Community Detection with NetworKit

In this notebook we will cover some community detection algorithms implemented in the community module of NetworKit. Community detection is concerned with identifying groups of nodes which are significantly more densely connected to each other than to the rest of the network. As a first step we import NetworKit:

[1]:
import networkit as nk

The community module provides a top-level function, detectCommunities(G, algo=None, inspect=True) to perform community detection of a given graph with a suitable algorithm, and print some statistics about the result. If no algorithm is specified via the algo parameter, community detection is performed using the PLM algorithm.

This function can be used as follows:

[2]:
# Read graph
G = nk.readGraph("../input/karate.graph", nk.Format.METIS)
communities = nk.community.detectCommunities(G)
Communities detected in 0.00047 [s]
solution properties:
-------------------  ---------
# communities         4
min community size    5
max community size   12
avg. community size   8.5
imbalance             1.33333
edge cut             21
edge cut (portion)    0.269231
modularity            0.41979
-------------------  ---------

The following sections cover two popular community detection algorithms, PLM and PLP, and will illustrate how to use them.

PLM

NetworKit provides a parallel implementation of the well-known Louvain method, which can be found in the PLM class. It yields a high-quality solution at reasonably fast running times. The constructor PLM(Graph, refine=False, gamma=0.1, par='balance', maxIter=32, turbo=True, recurse=True) expects a networkit.Graph as a mandatory parameter. If the parameter refine is set to true, the algorithm performs a second move phase to refine the communities. The parameter gamma defines the multi-resolution modularity parameter. The string par defines the openmp parallelization strategy. maxIter is the maximum number of iterations for move phase. When turbo is set to true, the algorithm is faster but uses O(n) additional memory per thread. Set recurseto true in order to use recursive coarsening. Refer to this for more details on recursive coarsening.

In the example below we run PLM with refine set to true while leaving the rest of the parameters to their default values.”

[3]:
# Choose and initialize algorithm
plmCommunities = nk.community.detectCommunities(G, algo=nk.community.PLM(G, True))
Communities detected in 0.00037 [s]
solution properties:
-------------------  ---------
# communities         4
min community size    5
max community size   12
avg. community size   8.5
imbalance             1.33333
edge cut             21
edge cut (portion)    0.269231
modularity            0.41979
-------------------  ---------

The output of the detectCommunities function is a partition of the nodes of the graph. It is represented by the Partition data structure, which provides several methods for inspecting and manipulating a partition of a set of elements.

[4]:
print("{0} elements assigned to {1} subsets".format(plmCommunities.numberOfElements(),
                                                    plmCommunities.numberOfSubsets()))
34 elements assigned to 4 subsets
[5]:
print("the biggest subset has size {0}".format(max(plmCommunities.subsetSizes())))
the biggest subset has size 12

The contents of a partition object can be written to file in a simple format, in which the i-th line contains an integer representing the subset id of node i.

[6]:
nk.community.writeCommunities(plmCommunities, "output/communtiesPLM.partition")
wrote communities to: output/communtiesPLM.partition

PLP

The Label Propagation algorithm is an algorithm for finding communities in a graph. NetworKit provides a parallel implementation, PLP(G, updateThreshold=none, maxIterations=none). The constructor expects a networkit.Graph as a mandatory parameter. The parameter updateThreshold dictates the number of nodes that have to be changed in each iteration so that a new iteration starts, and maxIterations is the maximum number of iterations. none is NetworKit constant set to the maximum value of a 64-bit integer.

[7]:
# Read graph
G = nk.readGraph("../input/jazz.graph", nk.Format.METIS)
[8]:
# Choose and initialize algorithm
plpCommunities = nk.community.detectCommunities(G, algo=nk.community.PLP(G))
Communities detected in 0.00296 [s]
solution properties:
-------------------  -----------
# communities          2
min community size    59
max community size   139
avg. community size   99
imbalance              1.40404
edge cut             146
edge cut (portion)     0.0532458
modularity             0.281997
-------------------  -----------
[9]:
print("{0} elements assigned to {1} subsets".format(plpCommunities.numberOfElements(),
                                                    plpCommunities.numberOfSubsets()))
198 elements assigned to 2 subsets
[10]:
print("the biggest subset has size {0}".format(max(plpCommunities.subsetSizes())))
the biggest subset has size 139
[11]:
nk.community.writeCommunities(plpCommunities, "output/communtiesPLP.partition")
wrote communities to: output/communtiesPLP.partition