In this notebook we will cover some community detection algorithms implemented in the community
module of NetworKit. Community detection is concerned with identifying groups of nodes which are significantly more densely connected to each other than to the rest of the network. As a first step we import NetworKit:
[1]:
import networkit as nk
The community
module provides a top-level function, detectCommunities(G, algo=None, inspect=True) to perform community detection of a given graph with a suitable algorithm, and print some statistics about the result. If no algorithm is specified via the algo
parameter, community detection is performed using the
PLM algorithm.
This function can be used as follows:
[2]:
# Read graph
G = nk.readGraph("../input/karate.graph", nk.Format.METIS)
communities = nk.community.detectCommunities(G)
Communities detected in 0.00057 [s]
solution properties:
------------------- ---------
# communities 4
min community size 5
max community size 12
avg. community size 8.5
imbalance 1.33333
edge cut 21
edge cut (portion) 0.269231
modularity 0.41979
------------------- ---------
The following sections cover two popular community detection algorithms, PLM
and PLP
, and will illustrate how to use them.
NetworKit provides a parallel implementation of the well-known Louvain method, which can be found in the PLM class. It yields a high-quality solution at reasonably fast running times. The constructor PLM(Graph, refine=False, gamma=0.1, par='balance', maxIter=32, turbo=True, recurse=True)
expects a
networkit.Graph as a mandatory parameter. If the parameter refine
is set to true, the algorithm performs a second move phase to refine the communities. The parameter gamma
defines the multi-resolution modularity parameter. The string par
defines the openmp parallelization strategy. maxIter
is the maximum number of iterations for move phase. When turbo
is set to true, the
algorithm is faster but uses O(n) additional memory per thread. Set recurse
to true in order to use recursive coarsening. Refer to this for more details on recursive coarsening.
In the example below we run PLM with refine
set to true while leaving the rest of the parameters to their default values.”
[3]:
# Choose and initialize algorithm
plmCommunities = nk.community.detectCommunities(G, algo=nk.community.PLM(G, True))
Communities detected in 0.00043 [s]
solution properties:
------------------- ---------
# communities 4
min community size 5
max community size 12
avg. community size 8.5
imbalance 1.33333
edge cut 21
edge cut (portion) 0.269231
modularity 0.41979
------------------- ---------
The output of the detectCommunities
function is a partition of the nodes of the graph. It is represented by the Partition data structure, which provides several methods for inspecting and manipulating a partition of a set of elements.
[4]:
print("{0} elements assigned to {1} subsets".format(plmCommunities.numberOfElements(),
plmCommunities.numberOfSubsets()))
34 elements assigned to 4 subsets
[5]:
print("the biggest subset has size {0}".format(max(plmCommunities.subsetSizes())))
the biggest subset has size 12
The contents of a partition object can be written to file in a simple format, in which the i
-th line contains an integer representing the subset id of node i
.
[6]:
nk.community.writeCommunities(plmCommunities, "output/communtiesPLM.partition")
wrote communities to: output/communtiesPLM.partition
The Label Propagation algorithm is an algorithm for finding communities in a graph. NetworKit provides a parallel implementation, PLP(G, updateThreshold=none, maxIterations=none). The constructor expects a networkit.Graph as a mandatory parameter. The parameter updateThreshold
dictates
the number of nodes that have to be changed in each iteration so that a new iteration starts, and maxIterations
is the maximum number of iterations. none
is NetworKit constant set to the maximum value of a 64-bit integer.
[7]:
# Read graph
G = nk.readGraph("../input/jazz.graph", nk.Format.METIS)
[8]:
# Choose and initialize algorithm
plpCommunities = nk.community.detectCommunities(G, algo=nk.community.PLP(G))
Communities detected in 0.00331 [s]
solution properties:
------------------- -----------
# communities 2
min community size 57
max community size 141
avg. community size 99
imbalance 1.42424
edge cut 148
edge cut (portion) 0.0539752
modularity 0.278746
------------------- -----------
[9]:
print("{0} elements assigned to {1} subsets".format(plpCommunities.numberOfElements(),
plpCommunities.numberOfSubsets()))
198 elements assigned to 2 subsets
[10]:
print("the biggest subset has size {0}".format(max(plpCommunities.subsetSizes())))
the biggest subset has size 141
[11]:
nk.community.writeCommunities(plpCommunities, "output/communtiesPLP.partition")
wrote communities to: output/communtiesPLP.partition