# CSE 6220 Programming Assignment 2: Jacobi’s Method

## Description

Jacobi’s method is an iterative, numerical method for solving a system of linear equations. Formally, given a full-rank matrix $A \in \mathbb{R}^{n \times n}$ and a vector $b \in \mathbb{R}^n$, Jacobi’s method iteratively approximates $x \in \mathbb{R}^n$ for:

$$Ax = b$$
Given the matrix $A$,

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

we first separate $A$ into its diagonal elements $D$ and the remaining elements $R$, such that $A = D + R$, with:

$$D = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}, \qquad R = \begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ a_{21} & 0 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 0 \end{pmatrix}$$
Jacobi’s method then repeats the following steps until convergence or termination:

1. Initialize $x$: $x \leftarrow \begin{pmatrix} 0 & 0 & \dots & 0 \end{pmatrix}^T$
2. $D = \mathrm{diag}(A)$
3. $R = A - D$
4. **while** $\|Ax - b\| > \ell$ **do**
   (a) update $x \leftarrow D^{-1}(b - Rx)$

where $\|\cdot\|$ is the L2-norm, and $\ell$ is a parameter for the termination accuracy.
A matrix $A$ is diagonally dominant if, in each row, the absolute value of the diagonal element is larger than the sum of the absolute values of the remaining elements in that row, i.e., iff:

$$|a_{ii}| > \sum_{j \neq i} |a_{ij}|.$$

It can be shown that for diagonally dominant matrices, Jacobi’s method is guaranteed to converge.
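The steps above translate directly into a sequential implementation. The following is a minimal sketch under some assumptions: the function name is hypothetical (the assignment’s actual declarations are in jacobi.h), and $A$ is stored row-major in a flat array.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sequential Jacobi sketch (illustrative name; see jacobi.h for the
// required declarations). A is n x n, stored row-major. Iterates until
// ||Ax - b|| <= l or max_iter iterations have been performed.
std::vector<double> jacobi_seq(const std::vector<double>& A,
                               const std::vector<double>& b,
                               int n, double l, int max_iter) {
    std::vector<double> x(n, 0.0);              // step 1: x <- [0 ... 0]
    for (int iter = 0; iter < max_iter; ++iter) {
        // Termination check: is ||Ax - b|| still above l?
        double norm_sq = 0.0;
        for (int i = 0; i < n; ++i) {
            double Axi = 0.0;
            for (int j = 0; j < n; ++j)
                Axi += A[i * n + j] * x[j];
            norm_sq += (Axi - b[i]) * (Axi - b[i]);
        }
        if (std::sqrt(norm_sq) <= l) break;
        // Update step: x <- D^{-1} (b - Rx), with R = A - D.
        std::vector<double> x_new(n);
        for (int i = 0; i < n; ++i) {
            double Rxi = 0.0;
            for (int j = 0; j < n; ++j)
                if (j != i) Rxi += A[i * n + j] * x[j];
            x_new[i] = (b[i] - Rxi) / A[i * n + i];
        }
        x = x_new;
    }
    return x;
}
```

For a diagonally dominant system such as $A = \begin{pmatrix} 4 & 1 \\ 2 & 5 \end{pmatrix}$, $b = (9, 12)^T$, this converges to $x = (11/6, 5/3)^T$.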
## Parallel Algorithm
**Data distribution** We use a 2-dimensional grid as the communication network for this problem. We assume that the number of processors is a perfect square: $p = q \times q$ with $q = \sqrt{p}$, arranged into a mesh of size $q \times q$.
The inputs are distributed on the grid as follows:

- The $n$-by-$n$ matrix $A$ is block-distributed onto the grid. The rows of $A$ are distributed onto the rows of the processor grid such that either $\lceil n/q \rceil$ or $\lfloor n/q \rfloor$ rows of $A$ are distributed onto each row of the grid. More specifically, the first $(n \bmod q)$ rows of the grid contain $\lceil n/q \rceil$ rows of $A$, and the remaining rows of the grid contain $\lfloor n/q \rfloor$ rows of $A$. The same applies to the distribution of columns. A processor with coordinates $(i, j)$ thus has a local matrix of the following size:

$$\begin{cases} \lceil n/q \rceil \times \lceil n/q \rceil & \text{if } i < (n \bmod q) \text{ and } j < (n \bmod q) \\ \lceil n/q \rceil \times \lfloor n/q \rfloor & \text{if } i < (n \bmod q) \text{ and } j \geq (n \bmod q) \\ \lfloor n/q \rfloor \times \lceil n/q \rceil & \text{if } i \geq (n \bmod q) \text{ and } j < (n \bmod q) \\ \lfloor n/q \rfloor \times \lfloor n/q \rfloor & \text{if } i \geq (n \bmod q) \text{ and } j \geq (n \bmod q) \end{cases}$$
- The size-$n$ vectors $b$ and $x$ are equally block-distributed only along the first column of the processor grid, i.e., only among processors with indexes $(i, 0)$. Processor $(i, 0)$ will thus have the following local size:

$$\begin{cases} \lceil n/q \rceil & \text{if } i < (n \bmod q) \\ \lfloor n/q \rfloor & \text{if } i \geq (n \bmod q) \end{cases}$$
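This ceiling/floor rule is easy to express as a pair of helpers. The names below are illustrative (the framework ships its own versions in utils.h):

```cpp
#include <cassert>

// Number of elements (or matrix rows/columns) owned by grid row i when
// n elements are block-distributed over q rows: the first (n mod q) rows
// get ceil(n/q) elements, the rest get floor(n/q).
int block_size(int n, int q, int i) {
    return n / q + (i < n % q ? 1 : 0);
}

// Global index of the first element owned by grid row i: every preceding
// row contributes floor(n/q), plus one extra for each of the first
// (n mod q) rows.
int block_offset(int n, int q, int i) {
    int extra = (i < n % q) ? i : n % q;
    return i * (n / q) + extra;
}
```

For example, with $n = 10$ and $q = 4$ the local sizes are $3, 3, 2, 2$ and the offsets are $0, 3, 6, 8$.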
**Parallel Matrix-Vector Multiplication** Let $A$ be an $n$-by-$n$ matrix and let $x$ and $y$ be $n$-dimensional vectors. We wish to compute $y = Ax$. Assume that the square matrix $A$ and the vector $x$ are distributed among the processor grid as explained above. The final answer $y$ will again be distributed in the same fashion as the vector $x$ was. To do so, we will first “transpose” the vector $x$ on the grid, such that a processor with index $(i, j)$ will end up with the elements that were on processor $(j, 0)$ according to the above distribution. We do this by first sending the elements from a processor $(i, 0)$ to its corresponding diagonal processor $(i, i)$, using a single `MPI_Send` and the sub-communicator for the row of processors. We then broadcast the received elements from $(i, i)$ along each column of the grid using a sub-communicator for the column of processors.

Now each processor has the elements it needs for its local matrix-vector multiplication. We multiply the local vector with the local matrix and then use a parallel reduction to sum up the resulting vectors along the rows of the grid back onto the processors of the first column. The final result of the multiplication thus ends up distributed among the first column in the same way that the input $x$ was.
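The send-to-diagonal and column-broadcast steps can be sketched as follows. This is not the required implementation (your code belongs in mpi_jacobi.cpp per mpi_jacobi.h); the function name is illustrative, and it assumes `grid_comm` is a $q \times q$ Cartesian communicator and that `local_x` has already been resized to the block length this processor needs (the block of column index $j$, per the distribution above).

```cpp
#include <mpi.h>
#include <vector>

// Sketch: move the blocks of x from the first column (i, 0) onto the whole
// grid so that processor (i, j) ends up with the block of (j, 0).
void transpose_bcast_vector(MPI_Comm grid_comm, std::vector<double>& local_x) {
    int dims[2], periods[2], coords[2];
    MPI_Cart_get(grid_comm, 2, dims, periods, coords);

    // Sub-communicators: one per grid row and one per grid column.
    MPI_Comm row_comm, col_comm;
    int keep_cols[2] = {0, 1};  // vary column index -> row communicator
    int keep_rows[2] = {1, 0};  // vary row index    -> column communicator
    MPI_Cart_sub(grid_comm, keep_cols, &row_comm);
    MPI_Cart_sub(grid_comm, keep_rows, &col_comm);

    // Step 1: (i, 0) sends its block to the diagonal processor (i, i).
    // Within row i's sub-communicator, (i, 0) has rank 0 and (i, i) rank i.
    if (coords[0] == 0 && coords[1] == 0) {
        // (0, 0) is its own diagonal processor; nothing to send.
    } else if (coords[1] == 0) {
        MPI_Send(local_x.data(), (int)local_x.size(), MPI_DOUBLE,
                 coords[0], 0, row_comm);
    } else if (coords[0] == coords[1]) {
        MPI_Recv(local_x.data(), (int)local_x.size(), MPI_DOUBLE,
                 0, 0, row_comm, MPI_STATUS_IGNORE);
    }

    // Step 2: broadcast from the diagonal (j, j) down each column j.
    // Within column j's sub-communicator, (j, j) has rank j = coords[1].
    MPI_Bcast(local_x.data(), (int)local_x.size(), MPI_DOUBLE,
              coords[1], col_comm);

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
}
```

After the local multiply, the row-wise reduction back onto the first column is a single `MPI_Reduce` with `MPI_SUM` over `row_comm`, rooted at rank 0 (the processor $(i, 0)$).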
**Parallel Jacobi** For parallelizing Jacobi’s method, we distribute $A$ and $R$ as described above and use parallel matrix-vector multiplication for calculating $Rx$. The vectors $x$ and $b$, and the diagonal elements $D$ of $A$, are distributed among the first column of the processor grid. We can thus update $x$ by

$$x_i \leftarrow \frac{1}{d_i} \left( b_i - (Rx)_i \right)$$

using only local operations.
In order to detect termination, we calculate $Ax$ using parallel matrix-vector multiplication, and then subtract $b$ locally on the first column of processors (where the result of the multiplication and $b$ are located). Next, we calculate the L2-norm $\|Ax - b\|$ by first computing the sum of squares locally and then performing a parallel reduction along the first column of processors. We can now determine whether to terminate by checking the L2-norm against the termination criterion and then broadcasting the result along the rows. Now every processor knows whether or not to continue with further iterations. In your implementation, you should not only check the termination criterion, but also terminate after a pre-set maximum number of iterations.
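The arithmetic behind this check can be prototyped without MPI. In the sketch below (function name illustrative), the loop over the residual entries stands in for the per-processor local sums of squares combined by the reduction (e.g. `MPI_Allreduce`, or `MPI_Reduce` followed by a broadcast, in the real code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sequential stand-in for the parallel termination check. `residual` holds
// the entries of Ax - b; in the parallel version each first-column
// processor sums the squares of its own block, and the partial sums are
// combined with a reduction before taking the square root.
bool should_terminate(const std::vector<double>& residual, double l) {
    double sum_sq = 0.0;
    for (double r : residual)
        sum_sq += r * r;               // per-processor part + reduction
    return std::sqrt(sum_sq) <= l;     // stop once ||Ax - b|| <= l
}
```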
## Assignment
In this assignment, you will implement Jacobi’s method using MPI. A framework is provided
with this programming assignment, which declares all functions that you should implement. It
furthermore provides a testing framework, which you can use to test your code for correctness. See
the README file in the framework for more details regarding the framework.
The framework is available on Georgia Tech’s GitHub installation at: https://github.gatech.edu/pflick3/cse6220-prog2
First, you should implement matrix-vector multiplication and Jacobi’s method sequentially. You will later use this as the comparison for measuring speedup and as a reference implementation. Implement the sequential code in the file `jacobi.cpp` according to the function declarations in `jacobi.h`. Unit tests for the sequential code are implemented in `seq_tests.cpp`. Add your own test cases in this file. To compile and run the tests, run `make test` either locally on your own machine, or in an interactive job. DO NOT run the tests on the login node of the cluster.
Next, you should implement the parallel algorithm as described above. For this, you’ll have to implement functions to distribute the data among the grid of processors and gather results back to the processor with rank (0, 0). Then, implement the parallel algorithms for matrix-vector multiplication and Jacobi’s method. Implement your parallel code in the file `mpi_jacobi.cpp` according to the function declarations in `mpi_jacobi.h`. A few test cases are already provided in the file `mpi_tests.cpp`. You should add your own test cases in this file as well. Some utility functions for block distributions are provided in `utils.h` and `utils.cpp`. You can make use of these functions, and implement your own utility functions in these files.
Finally, test the performance of your parallel implementation against your sequential implementation for various input sizes. Use the executable `jacobi` for this and generate the input on-the-fly via the `-n` and `-d` parameters. Report your achieved speedups for different input sizes and difficulties. Increased difficulty increases the number of iterations needed for Jacobi’s method to converge.
## Submission guidelines
The assignment is due on March 31st, 2015 on T-Square. A framework is provided along with
the assignment for you to work with. Please refer to the README file provided in the framework
for information on how to use it. You are expected to work in teams of two. One member from
each team should submit a zip/tar file containing the following:
1. A text file containing the names of all team members and their contribution in terms of
percentage of work done.
2. All source files. Your program should be well commented and easy to read.
3. A report in PDF format containing the following:
• Short design description of your algorithms.
• Runtime and speedup plots by varying problem size. Give observations on how your
algorithm behaves on varying these parameters.