Description
In this project we will study a social network and graphs of users’ personal friendship
network. We will explore community structures in the friendship network and their
interpretation and applications. All datasets are available online 1
.
Submission: Please submit a zip file containing your codes and report to
ee232e.spring2016@gmail.com . The zip file should be named as
“Project1_UID1_UID2_…_UIDn.zip” where UIDx are student ID numbers of team
members. If you had any questions you can send an email to the same address.
1. Download the Facebook graph edgelist file facebook_combined.txt 2
. Is
the network connected? Measure the diameter of the network. Plot the degree
distribution and try to fit a curve on it. What is your curve’s total mean squared
error? What is the average degree?
2. Take the first node in the graph (The node whose ID is 1) and find its neighbors.
Create a graph that consists of node 1 and its neighbors and the edges that have
both ends within this set of nodes. We will call this the personal network of node
1. Note that the common characteristic among the nodes in this graph -except
for node 1- is that they are all friends of node 1, or equivalently, node 1 is a
mutual friend of all of them. How many nodes and edges does this graph have?
3. Find nodes in the graph that have more than 200 neighbors (we will call these
core nodes). How many core nodes do you find in the network? What is the
average degree of these core nodes? For one of these nodes, find the community
structure of the core’s personal network. Plot the network and try to distinguish
communities with color. Use Fast-Greedy, Edge-Betweenness, and Infomap community detection algorithms in igraph and compare results.
1 http://snap.stanford.edu/data
2http://snap.stanford.edu/data/egonets-Facebook.html
1
An example of a personal network. The core node is shown in black.
4. Try removing the core node itself from its personal network and running the
above community detection algorithms again. Are there any differences in the
results?
5. Find dispersion and embeddedness for all nodes in the personal network. Embeddedness is the number of mutual friends a node shares with the core node
and dispersion is the sum of distances between every pair of the mutual friends
a node shares with core node (These distances should be calculated in a modified graph where the said node and the core node are removed. In other words,
between every pair of mutual friends, we consider the shortest path that does
not pass through the said node and the core node. Read details in the paper referred 3
). Plot the distribution of embeddedness and dispersion over all personal
networks created with core nodes from part 3. Plot 3 personal networks showing
their community structure with colors and highlight the node with maximum
dispersion in each network. Highlight the edges incident to this node as well.
On each network, do the same thing for the node with maximum embeddedness
and the node with maximum dispersion
embeddedness. Can you show and explain what
characteristic of a node is revealed by each of these measures?
3
“Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship
Status on Facebook”, Lars Backstrom, Jon Kleinberg.
http://arxiv.org/abs/1310.6753
2
6. The communities in personal network can translate into different aspects of
one’s life. These can be friends from high school, college friends, colleagues,
etc. Certain types of communities are present in almost every individual’s personal network. Can you find structural features for each community that help
you map communities across different people’s personal network? That is, come
up with a way that determines a community in one user’s personal network is
equivalent to another community in another users’s personal network. You may
not know the nature of the community in both networks (it could for example
be “college friends” of each of the two users), but try to show similarities among
both communities in the two networks that back up your decision. Run your
algorithm across all personal networks you extracted and specify two types of
communities you believe are recurring across all of them (along with the features you have calculated for each). Then, on each network you identify a community of type 1 and a community of type 2. You can only consider communities
with size larger than 10 nodes from each personal network and ignore the rest.
You may consider features like Modularity Index, Clustering Coefficient, Density, Community size, or any other statistical feature of the community.
7. Now we try to run the same kind of analysis on another real social network with
tagged relationships. Download the Google+ ego networks file gplus.tar.gz
4
. Create personal network for users who have more than 2 circles (which is the
default number). Extract the community structure of each personal network using both Walktrap and Infomap algorithms and show how communities overlap
with the user’s circles. How do these overlaps vary across users? How does this
relate to a user’s habit on tagging relationships with circles?
Notes: Google+ ,unlike Facebook, has a directed network structure, where you
can have someone in your circles regardless of whether they have you in their
circles or they don’t. Circles are tags you put on your relationships when you
add people. E.g. you can have two circles, one named “friends” and the other one
named “family”, and when you add someone you can put them in one or both of
these circles. You can refer to the dataset’s readme file to see how it is organized.
Notice that the ego node (core node) is not available in the edgelist stored for its
personal network and you should manually add it.
4http://snap.stanford.edu/data/egonets-Gplus.html
3