In this notebook we will cover some community detection algorithms implemented in the community module of NetworKit. Community detection is concerned with identifying groups of nodes which are significantly more densely connected to each other than to the rest of the network. As a first step we import NetworKit:
[1]:
import networkit as nk
The community module provides a top-level function, detectCommunities(G, algo=None, inspect=True) to perform community detection of a given graph with a suitable algorithm, and print some statistics about the result. If no algorithm is specified via the algo parameter, community detection is performed using the
PLM algorithm.
This function can be used as follows:
[2]:
# Read graph
G = nk.readGraph("../input/karate.graph", nk.Format.METIS)
communities = nk.community.detectCommunities(G)
Communities detected in 0.00044 [s]
solution properties:
------------------- ---------
# communities 4
min community size 5
max community size 12
avg. community size 8.5
imbalance 1.33333
edge cut 21
edge cut (portion) 0.269231
modularity 0.41979
------------------- ---------
The following sections cover two popular community detection algorithms, PLM and PLP, and will illustrate how to use them.
NetworKit provides a parallel implementation of the well-known Louvain method, which can be found in the PLM class. It yields a high-quality solution at reasonably fast running times. The constructor PLM(Graph, refine=False, gamma=0.1, par='balance', maxIter=32, turbo=True, recurse=True) expects a
networkit.Graph as a mandatory parameter. If the parameter refine is set to true, the algorithm performs a second move phase to refine the communities. The parameter gamma defines the multi-resolution modularity parameter. The string par defines the openmp parallelization strategy. maxIter is the maximum number of iterations for move phase. When turbo is set to true, the
algorithm is faster but uses O(n) additional memory per thread. Set recurseto true in order to use recursive coarsening. Refer to this for more details on recursive coarsening.
In the example below we run PLM with refine set to true while leaving the rest of the parameters to their default values.”
[3]:
# Choose and initialize algorithm
plmCommunities = nk.community.detectCommunities(G, algo=nk.community.PLM(G, True))
Communities detected in 0.00042 [s]
solution properties:
------------------- ---------
# communities 4
min community size 5
max community size 12
avg. community size 8.5
imbalance 1.33333
edge cut 21
edge cut (portion) 0.269231
modularity 0.41979
------------------- ---------
The output of the detectCommunities function is a partition of the nodes of the graph. It is represented by the Partition data structure, which provides several methods for inspecting and manipulating a partition of a set of elements.
[4]:
print("{0} elements assigned to {1} subsets".format(plmCommunities.numberOfElements(),
plmCommunities.numberOfSubsets()))
34 elements assigned to 4 subsets
[5]:
print("the biggest subset has size {0}".format(max(plmCommunities.subsetSizes())))
the biggest subset has size 12
The contents of a partition object can be written to file in a simple format, in which the i-th line contains an integer representing the subset id of node i.
[6]:
nk.community.writeCommunities(plmCommunities, "output/communtiesPLM.partition")
wrote communities to: output/communtiesPLM.partition
The Label Propagation algorithm is an algorithm for finding communities in a graph. NetworKit provides a parallel implementation, PLP(G, updateThreshold=none, maxIterations=none). The constructor expects a networkit.Graph as a mandatory parameter. The parameter updateThreshold dictates
the number of nodes that have to be changed in each iteration so that a new iteration starts, and maxIterations is the maximum number of iterations. none is NetworKit constant set to the maximum value of a 64-bit integer.
[7]:
# Read graph
G = nk.readGraph("../input/jazz.graph", nk.Format.METIS)
[8]:
# Choose and initialize algorithm
plpCommunities = nk.community.detectCommunities(G, algo=nk.community.PLP(G))
Communities detected in 0.00327 [s]
solution properties:
------------------- -----------
# communities 2
min community size 57
max community size 141
avg. community size 99
imbalance 1.42424
edge cut 148
edge cut (portion) 0.0539752
modularity 0.278746
------------------- -----------
[9]:
print("{0} elements assigned to {1} subsets".format(plpCommunities.numberOfElements(),
plpCommunities.numberOfSubsets()))
198 elements assigned to 2 subsets
[10]:
print("the biggest subset has size {0}".format(max(plpCommunities.subsetSizes())))
the biggest subset has size 141
[11]:
nk.community.writeCommunities(plpCommunities, "output/communtiesPLP.partition")
wrote communities to: output/communtiesPLP.partition