## Description

1. What are the two main styles of parallelism? Explain.

2. What are the three parallel programming paradigms? Explain.

3. Discuss the difference between shared address space machines and distributed

address space machines. And discuss the advantages and disadvantages of both

architectures.

4. Give an example of anti dependence and give a corresponding solution to remove

the dependence.

5. Consider the search tree shown in the following figure, in which the dark node

represents the solution.

(a) If a sequential search of the tree is performed using the standard depth-first

search (DFS) algorithm, how much time does it take to find the solution if

traversing each arc of the tree takes one unit of time? Note: DFS begins by

expanding the initial node and generating its successors. In each subsequent

step, DFS expands one of the most recently generated nodes. If this node has

no successors (or cannot lead to any solutions), then DFS backtracks and

expands a different node.

(b) Assume that the tree is partitioned between two processing elements that are

assigned to do the search job, as shown in figure b. If both processing

elements perform a DFS on their respective halves of the tree, how much time

does it take for the solution to be found? What is the speedup? Is there a

speedup anomaly? If so can you explain the anomaly?

2

6. Derive the formula for calculating the average access time for a word in a system

with three levels of cache. Assume the following values for a theoretical system

containing an L1, L2, and L3 cache.

Location Latency

——— ————-

L1 2 ns

L2 8 ns

L3 30 ns

Main Memory 100 ns

If an application has following hit rates, what is the average memory access time

for a memory word?

Location Hit rate

———— ———-

L1 40%

L2 70%

L3 90%

Main Memory 100%

7. Consider a parallel matrix vector multiplication algorithm:

C=A*b

14 1 2 3 1

32 4 5 6 2

50 7 8 9 3

é ù é ù éù

ê ú ê ú êú = ´

ë û ë û ëû

14 1 2 3

32 4 1 5 2 6 3

50 7 8 9

é ù éù éù éù

ê ú êú êú êú = ´ + ´ + ´ ê ú êú êú êú

ë û ëû ëû ëû

Sequential code:

matvec() {

int i,j;

for (j=0;j<n;j++){

for (i=0;i<m;i++) {

c[i]=c[i]+a[i][j]*b[j];

}

}

}

3

Using fixed-size speedup and scalability analysis techniques similar to those

employed in the class notes, answer the following.

(a) Analyze the running time of the serial algorithm as a function of the matrix

dimension n and m, you may assume all operations take unit time. (More

appropriate runtime order techniques will be presented later in the course –

“big O” notation.)

(b) Analyze the running time of the parallel algorithm as a function of n, m and p.

You may also assume n and m are evenly divisible by p.

(c) Obtain expression for the speedup, Sp, and the Amdahl’s fraction a.

(d) Determine if the algorithm is effective. Briefly explain.

Note on cheating: There are penalties for cheating. Don’t find out the hard way.

Working in groups is fine for discussing approaches and techniques. Copying

problem solutions or code is cheating. Both the person copying and the person

giving the answers will be equally penalized. Make sure you do your own work.

Parallel code:

void matvec() {

int i,j,nprocs,myid;

float tmp[max];

for (i=0;i<m;i++){

tmp[i]=0;

}

nprocs= no. of processors used;

myid=process id;

for (j=myid;j<n;j+=nprocs){

for (i=0;i<m;i++)

tmp[i]=tmp[i]+a[i][j]*b[j];

}

lock();

for (i=0;i<m;i++)

c[i]=c[i]+tmp[i];

unlock();

}