## Description

In this assignment, we will study a real network. The data is available on this Dropbox Link 1

. We are going to study the properties of

this real network. The data is given as a directed edge list format,

where each line has three items: node1, node2 and the weight of the

edge from node1 to node2.

1. Is this network connected? If not, find out the giant connected

component. And in the following, we will deal with this giant

connected component.

2. Measure the degree distribution of in-degree and out-degree of

the nodes.

3. We would like to measure the community structure of the network. First, we need to convert it into an undirected network.

Then we use fastgreedy.community and label.propagation.community

to measure the community structure. Are the results of these two

methods similar or not?

1https://www.dropbox.com/s/oybns7fvztfy76q/sorted_directed_

net.txt?dl=0

1

Note that this network has edge weights, so it is not trivial to

convert it from directed to undirected, especially if there are two

directed edges between node i and node j. We at least have two

options.

In option 1, we keep the number of edges unchanged, and just

remove the directions. The resulting undirected network is not

simple. label.propagation.community command can be used

to compute a weighted, undirected, non-simple network’s community structure.

In option 2, we merge the two directed edges between i and j so

that the resulting network is simple. Suppose the weights are wij

and wji respectively, and we set the weight for the merged undirected edge to be √wijwji. We can use both fastgreedy.community

and label.propagation.community to measure the community structure.

4. Find the largest community computed from fastgreedy.community.

Isolate the community from other parts of the network to form a

new network, and then find the community structure of this new

network. This is the sub-community structure of the largest community.

5. Find all the sub-community structures of the communities with

sizes larger than 100.

6. Both fastgreedy.community and label.propagation.community

assume that each node belongs to only one community. In practice, a node can belong to two or more communities at the same

time. There is no command in igraph that can detect overlapped

communities. Here we are going to use personalized PageRank to

study the overlapped communities structures.

Suppose that we start the random walk from a node i with the

damping parameter 0.85 in the original directed network. We will

obtain the visiting probabilities of all nodes in the network. These

visiting probabilities reflect the relation between the other nodes

and node i. On the other hand, we already have some knowledge of the community structure obtained from the above ques2

tion. Therefore, we can compute

M~

i =

X

j

vj ~mj

,

where vj is the visiting probability of node j and ~mj is its community membership computed from 3. Here, the community membership is expressed as a vector so that it can be adapted to multiple memberships. Suppose you have found n communities in 3,

and ~mj is a n dimensional vector with only one element being 1.

The resulting M~

i is a measurement of the multiple memberships.

To include all nodes in the network in the sum might make the

computation amount too large. So you can take the largest 30 vj

to include in the sum.

A threshold is needed to remove the memberships that have very

small values in M~

i

. Choose a proper threshold.

Note that visit probabilities of the random walk started from a

node can be obtained by a personalized pagerank with a teleportation probability distribution that is 1 to the node under examination and 0 to every other node.

Try this method to see whether you can find examples of nodes

that belong to multiple communities. List at least 3 examples if

you find such nodes.

3