CSCI-4320/6360 – Assignment 2: Parallel C Program for a Carry-Lookahead Adder Using MPI

1 Overview

Here, you are to re-use your serial Carry-Lookahead Adder, adapt it to use 16-bit blocks, and extend it to work in parallel as a 1M-bit (i.e., 1,048,576-bit) CLA adder using up to 16 MPI ranks on the class server, kratos.cs.rpi.edu.

This CLA adder can be constructed from the following steps. Note, the description has been updated so that the i-th carry at any level is the carry-out from that bit position, and i − 1 is the carry-in to the i-th bit, group, section, etc.

Note, to make the CLA program run in serial and in parallel using 2, 4, 8 and 16 MPI ranks, you can think of “slicing” the CLA adder into 16 equal-size chunks. So, you will need to exchange messages at the s^3cm level. That is, at 16 ranks each rank will have only one “bit” of carry information at that level.

1. Calculate gi and pi for all 1,048,576 i.

2. Calculate ggj and gpj for all 65,536 j using gi and pi.

3. Calculate sgk and spk for all 4,096 k using ggj and gpj.

4. Calculate s^2gl and s^2pl for all 256 l using sgk and spk.

5. Calculate s^3gm and s^3pm for all 16 m using s^2gl and s^2pl.

6. Calculate s^3cm using s^3gm and s^3pm for all m and the correct s^3c0. Note, for 16 MPI ranks, rank 0 will send carry message updates to rank 1, rank 1 to rank 2, and so on up to rank 15. If you have fewer than 16 ranks, then each rank will perform the necessary calculations for multiple s^3cm. For example, at 8 ranks each rank computes 2 s^3cm, at 4 ranks each rank computes 4 s^3cm, at 2 ranks each rank computes 8 s^3cm, and the serial case computes all 16 s^3cm.

7. Calculate s^2cl using s^2gl and s^2pl for all l, replacing each s^2cl−1 when l mod 16 = 0 with the correct s^3cm−1, m = l/16, as the super-sectional carry-in for all l in that block of 16.

8. Calculate sck using sgk and spk for all k, replacing each sck−1 when k mod 16 = 0 with the correct s^2cl−1, l = k/16, as the sectional carry-in for all k in that block of 16. Note, make sure you set sck−1 = s^2cl−1 when k mod 16 = 0 and l = k/16.

9. Calculate gcj using ggj and gpj for all j, replacing each gcj−1 when j mod 16 = 0 with the correct sck−1, k = j/16, as the group carry-in for all j in that block of 16. Note, make sure you set gcj−1 = sck−1 when j mod 16 = 0 and k = j/16.

10. Calculate ci using gi and pi for all i, replacing each ci−1 when i mod 16 = 0 with the correct gcj−1. Note, make sure you set ci−1 = gcj−1 when i mod 16 = 0 and j = i/16 for the final sum step below.

11. Calculate sumi using ai ⊕ bi ⊕ ci−1 for all i.

More specifically, you will do the following:

1. Like before, represent the 1,048,576-bit input numbers as 262,144 hex digits.

2. Have MPI Rank 0 perform the conversion from hex to binary and reverse the binary input numbers such that the most significant bit is in the highest position within the binary array.

3. Have MPI Rank 0 distribute the input binary arrays to each rank in the correct order, where (for a 16-rank configuration) MPI rank 1 has bits 65,536 thru 131,071, MPI rank 2 has bits 131,072 thru 196,607, and so on. This should be done using MPI Scatter.

4. Like before, write functions for each step in the above algorithm. You can do it using for loops and do not have to perform the equation substitutions by hand as we did in class.

5. Between each algorithm step, perform an MPI Barrier collective operation to keep all ranks in step with each other. Make the barrier operation conditional, as you will want to turn it on and off as part of your performance study.

6. Use MPI Isend and MPI Irecv routines to exchange the nearest-neighbor carry bit information at the s^3cm level. Again, rank 0 will send to rank 1, rank 1 will receive from rank 0 and send to rank 2, and so on. Note that rank 0 will only send and rank 15 will only receive carry information. Note, all ranks except rank 0 should post an MPI Irecv message before the CLA calculations start, well in advance of any MPI Isend messages being scheduled.

7. A master “cla” routine will run thru all the steps.


8. Your main routine will invoke your input-reading function, followed by your master “cla” function. You can check your answer by having Rank 0 read in the full hex input data set, convert it to binary, and, like before, use the ripple-carry adder based on computing ci = gi + pi ∗ ci−1 for all i and then computing the final sum, sumi = ai ⊕ bi ⊕ ci−1.

9. Have each rank send its part of the final sumi solution to Rank 0. This should be done using MPI Gather. Rank 0 will then re-reverse the sum and output the final result. (Yes, this output will consume several pages of text.)

10. For the performance study, measure the execution time of serial and parallel runs using MPI Wtime, which returns a double. Here you’ll record a start time and a finish time; their difference is the execution time. Performance testing should be done on the class server, kratos.cs.rpi.edu.

11. Next, create the following three graphs. First, plot the execution time of the 2, 4, 8 and 16 rank runs as a function of the number of ranks. Next, plot the speedup of the 2, 4, 8 and 16 rank runs relative to the execution time of the serial MPI CLA adder. Finally, plot the speedup of the MPI CLA adder running in parallel on 2 thru 16 ranks relative to the execution time of the serial ripple-carry adder. Each graph should have two plot lines – one with and one without the MPI Barrier operations between each step. Explain why your performance graphs turned out the way they did. This report must be written using LaTeX, MS Word, or other document-preparation software and handed in in PDF format using Submitty along with your code.

12. Test cases will be posted on the website and run using file redirection to standard input. E.g., the command line that will be used is:

mpirun -np X ./cla < test1.txt

Note that X is the number of ranks that will be used. We will run our tests on 4 or 8 ranks, depending on the test, but again your performance tests should run on 2, 4, 8 and 16 MPI ranks using kratos.cs.rpi.edu.

2 HAND-IN INSTRUCTIONS

Submit both your code and PDF report to submitty.cs.rpi.edu.

3 LATE DAYS

This assignment is eligible for your “late days”. Each student has up to 3 late days they can use on individual programming assignments for the semester.

