Parallel Matrix Multiplication [50 points]
• The matrix size is 4K × 4K. Print out the execution time and the value of C
in your program.
• Report the execution time for different numbers of threads: p = 1 (the serial version
you did in PA1), 2, and 4. Draw a diagram to show the execution time under various p.
An example diagram is shown in Figure 1. Briefly discuss your observations.
(a) Parallelize the naive matrix multiplication that you implemented in PA1 using OpenMP.
(b) Parallelize the naive matrix multiplication that you implemented in PA1 using Pthreads.
Figure 1: Example diagram (x-axis: # of threads, with ticks at 1, 4, 16, 64, 256; y-axis: execution time (s)).
• Pthread example code, “print_msg_with_join.c”, which prints a hello message in each
thread. To compile it, type: “gcc -O3 -o run print_msg_with_join.c -lpthread”.
• OpenMP example code, “openmp_example.c”, which implements parallel matrix-vector
multiplication. To compile it, type: “gcc -O3 -o run openmp_example.c -fopenmp”.
• To run your parallel programs on multiple CPUs, submit them with this Slurm
command: “srun -n1 -c8 ./run”. Here -c specifies the number of CPUs (threads) you
will be using.
• Naive matrix-matrix multiplication code from your PA1.
3 Submission Instructions
• Two .c/.cpp files: ‘problem1a.c’ for OpenMP and ‘problem1b.c’ for Pthreads. Make sure
your programs compile and run. (10+20 pts)
• Report. Explain clearly how to compile and run your code, and include a screenshot of
the execution time and performance on the HPC cluster. (20 pts)
You may discuss the algorithms with others. However, the programs must be written individually.
Submit the code and the report via email@example.com. Please make sure to include
your name, student ID, and the homework number in the PDF, and name your PDF file
lastname_firstname_pa#.