This notebook covers some of the centrality algorithms implemented in NetworKit. Centrality measures the relative importance of a node within a graph. Code for centrality analysis in NetworKit is grouped into the centrality module.
As always, the first step is to import NetworKit.
[1]:
import networkit as nk
Every algorithm within the centrality module in NetworKit has a run() function that must be called after initialization in order for the algorithm to be executed.
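For illustration, here is a minimal sketch of this pattern, using the Betweenness class that is introduced below (the input file is the one used throughout this notebook):
[ ]:
# Common usage pattern for all centrality classes:
# construct with a graph, call run(), then query the results
G = nk.readGraph("../input/power.graph", nk.Format.METIS)
algo = nk.centrality.Betweenness(G)
algo.run()                 # executes the computation
algo.ranking()[:3]         # (node, score) pairs, highest score first
algo.scores()[:3]          # raw scores, indexed by node id
algo.score(0)              # score of a single node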
Betweenness centrality measures the extent to which a vertex lies on shortest paths between other vertices. Vertices with high betweenness may have considerable influence within a network.
The fastest algorithm for the exact computation of betweenness centrality is Brandes' algorithm, and it is implemented in the Betweenness class. The constructor Betweenness(G, normalized=False, computeEdgeCentrality=False) expects a mandatory networkit.Graph and two optional parameters, normalized and computeEdgeCentrality. If normalized is set to true, the centrality scores will be normalized. The parameter computeEdgeCentrality should be set to true if edge betweenness should be computed as well.
We start by reading a graph and then initializing the algorithm with the parameters we want. Since we use the default parameters here, i.e., centrality scores are not normalized and edge betweenness is not computed, we only need to pass the graph.
[2]:
# Read graph
G = nk.readGraph("../input/power.graph", nk.Format.METIS)
# Initalize algorithm
btwn = nk.centrality.Betweenness(G)
[3]:
# Run
btwn.run()
[3]:
<networkit.centrality.Betweenness at 0x7fc8656a71f0>
[4]:
# The 10 most central nodes according to betweenness are then
btwn.ranking()[:10]
[4]:
[(4164, 7036954.687164464),
(2543, 6873056.733431248),
(1243, 6824187.83796623),
(4219, 6774285.210945546),
(2528, 6521871.002117607),
(1267, 6057480.123171425),
(1308, 5770690.329252721),
(1244, 5007410.432800253),
(426, 5000602.048322913),
(2606, 4955764.6544836275)]
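If edge betweenness is needed as well, the optional parameters can be enabled. A minimal sketch, assuming that per-edge scores require the graph's edges to be indexed first:
[ ]:
# Betweenness with normalized scores and edge betweenness
G.indexEdges()  # assumed prerequisite for per-edge scores
btwnEdge = nk.centrality.Betweenness(G, normalized=True, computeEdgeCentrality=True)
btwnEdge.run()
btwnEdge.edgeScores()[:5]  # edge betweenness, indexed by edge id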
Since exact calculation of betweenness scores is often out of reach, NetworKit provides an approximation algorithm based on path sampling, implemented in the ApproxBetweenness(G, epsilon=0.01, delta=0.1, universalConstant=1.0) class. It expects a mandatory undirected networkit.Graph. Here we estimate betweenness centrality in PGPgiantcompo, with a guarantee that the error is no larger than an additive constant epsilon with a probability of at least 1 - delta. The universalConstant is used to compute the sample size.
[5]:
# Read graph
G = nk.readGraph("../input/PGPgiantcompo.graph", nk.Format.METIS)
[6]:
ab = nk.centrality.ApproxBetweenness(G, epsilon=0.1)
ab.run()
[6]:
<networkit.centrality.ApproxBetweenness at 0x7fc864ad5480>
[7]:
# The 10 most central nodes according to betweenness are then
ab.ranking()[:10]
[7]:
[(1143, 0.12996389891696727),
(6655, 0.09265944645006005),
(6555, 0.09025270758122733),
(7297, 0.08784596871239461),
(6932, 0.07581227436823101),
(3156, 0.07099879663056557),
(6744, 0.06016847172081833),
(604, 0.054151624548736496),
(7324, 0.054151624548736496),
(6098, 0.05174488567990376)]
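Besides ranking(), the individual values are also accessible directly:
[ ]:
ab.scores()[:5]   # approximate scores, indexed by node id
ab.score(1143)    # approximate score of a single node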
The EstimateBetweenness class estimates betweenness centrality according to the algorithm described in Sanders, Geisberger, Schultes: Better Approximation of Betweenness Centrality. Although the algorithm performs well in practice, it provides no guarantee. If a theoretical guarantee is required, use the ApproxBetweenness algorithm instead.
The constructor EstimateBetweenness(Graph G, count nSamples, bool normalized=False, bool parallel_flag=False) expects a networkit.Graph and the number of samples as mandatory parameters. The centrality values can optionally be normalized by setting normalized to True; by default they are not normalized. The algorithm can be run in parallel by setting parallel_flag to true. Running in parallel, however, comes with an additional cost in memory of z + 3z + t.
[8]:
# Read a graph
G = nk.readGraph("../input/PGPgiantcompo.graph", nk.Format.METIS)
[9]:
# Initialize algorithm
est = nk.centrality.EstimateBetweenness(G, 50, True, False)
est.run()
[9]:
<networkit.centrality.EstimateBetweenness at 0x7fc864ad5a80>
[10]:
# The 10 most central nodes according to betweenness are then
est.ranking()[:10]
[10]:
[(1143, 0.12347160379472265),
(6655, 0.09764491017533651),
(6555, 0.09357648028116852),
(7297, 0.09144160950529033),
(6932, 0.07746154348643391),
(3156, 0.07724545651472135),
(5848, 0.06793704108797427),
(6744, 0.06591125501443688),
(5165, 0.06286833450206764),
(7324, 0.05228211250197769)]
KadabraBetweenness is an ADaptive Algorithm for Betweenness via Random Approximation presented by Borassi M. and Natale E. NetworKit provides variants of the algorithm that either reduce the memory consumption to O(1) or ensure that deterministic results are obtained. For more details about the implementation, see this paper by Van der Grinten A., Angriman E., and Meyerhenke H. (2019).
The constructor KadabraBetweenness(G, err=0.01, delta=0.1, deterministic=False, k=0, unionSample=0, startFactor=100) only expects a networkit.Graph as a mandatory parameter. err is the maximum additive error guaranteed when approximating the betweenness centrality of all nodes, while delta is the probability that the betweenness centrality values are within the error guarantee. The parameter k expresses the number of top-k nodes to be computed; if set to zero, the approximate betweenness centrality of all nodes is computed. unionSample and startFactor are algorithm parameters that are chosen automatically.
Hence, computing the Kadabra betweenness of all nodes with a maximum additive error of 0.05 and a probability of 0.8 that the values are within that range can be done as follows:
[11]:
# Read a graph
G = nk.readGraph("../input/PGPgiantcompo.graph", nk.Format.METIS)
[12]:
# Initialize algorithm
kadabra = nk.centrality.KadabraBetweenness(G, 0.05, 0.8)
kadabra.run()
[12]:
<networkit.centrality.KadabraBetweenness at 0x7fc864bd31b0>
[13]:
# The 10 most central nodes according to betweenness are then
kadabra.ranking()[:10]
[13]:
[(1143, 0.2609571788413098),
(6655, 0.2005037783375315),
(6555, 0.19848866498740556),
(7297, 0.18841309823677582),
(6932, 0.1853904282115869),
(6744, 0.13702770780856424),
(6098, 0.13198992443324936),
(3156, 0.12896725440806045),
(2258, 0.11687657430730479),
(7324, 0.09874055415617128)]
Closeness centrality indicates how close a node is to all other nodes in the network. It is calculated as the inverse of the average shortest-path distance from the node to every other node in the network.
The NetworKit Closeness class computes the exact closeness centrality of all nodes of a graph. The constructor Closeness(G, normalized=True, variant=networkit.centrality.ClosenessVariant.STANDARD) expects a mandatory networkit.Graph and two optional parameters, normalized and variant. The centrality values can optionally be normalized for unweighted graphs by setting normalized to True; by default they are normalized. The parameter variant dictates the closeness variant to be used: set it to networkit.centrality.ClosenessVariant.STANDARD to use the standard definition of closeness, which is defined for connected graphs only, or to networkit.centrality.ClosenessVariant.GENERALIZED to use the generalized definition, which is also defined for disconnected graphs.
Since the graph we use here is weighted, the values should not be normalized, and since it is disconnected, we use the generalized closeness variant:
[14]:
# Read a graph
G = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
[15]:
# Initialize algorithm
close = nk.centrality.Closeness(G, False, nk.centrality.ClosenessVariant.GENERALIZED)
close.run()
[15]:
<networkit.centrality.Closeness at 0x7fc8656261d0>
[16]:
# The 10 most central nodes according to closeness are then
close.ranking()[:10]
[16]:
[(8, 0.962150639989747),
(64, 0.8484943683258478),
(57, 0.7951252523231831),
(55, 0.7405035792871096),
(65, 0.48508765673100607),
(66, 0.469459105637361),
(58, 0.46525538614449846),
(82, 0.46179282298925417),
(85, 0.46142579646975274),
(83, 0.46137631059745826)]
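For a connected, unweighted graph such as the karate club network used later in this notebook, one could instead keep the default normalization and the standard variant; a minimal sketch:
[ ]:
# Standard closeness on a connected, unweighted graph
K = nk.readGraph("../input/karate.graph", nk.Format.METIS)
closeStd = nk.centrality.Closeness(K, True, nk.centrality.ClosenessVariant.STANDARD)
closeStd.run()
closeStd.ranking()[:5]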
Since exact calculation of closeness scores is often out of reach, NetworKit provides an approximation of closeness centrality according to the algorithm described in Cohen et al., Computing Classic Closeness Centrality, at Scale.
The constructor ApproxCloseness(G, nSamples, epsilon=0.1, normalized=False, type=OUTBOUND) expects a mandatory connected networkit.Graph among other optional parameters. nSamples dictates the number of samples used when computing the approximate closeness; this value depends on the size of the graph, and the more samples are used, the better the results are likely to be. epsilon can be any value between 0 and \(\infty\)-1 and is used for the error guarantee. The centrality values can optionally be normalized by setting normalized to True; by default they are not normalized. If G is undirected, type can be ignored; otherwise use 0 for inbound centrality, 1 for outbound centrality, and 2 for the sum of both.
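For a directed graph, the direction can be selected through the type parameter; a minimal sketch using the integer codes listed above (0 = inbound, 1 = outbound, 2 = sum):
[ ]:
# Inbound vs. outbound approximate closeness on a directed graph
GD = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
acIn = nk.centrality.ApproxCloseness(GD, 100, 0.1, False, 0)   # inbound
acOut = nk.centrality.ApproxCloseness(GD, 100, 0.1, False, 1)  # outbound
acIn.run()
acOut.run()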
[17]:
# Read a graph
G = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
[18]:
ac = nk.centrality.ApproxCloseness(G, 100)
ac.run()
[18]:
<networkit.centrality.ApproxCloseness at 0x7fc864ad5840>
[19]:
# 10 most central nodes according to closeness are
ac.ranking()[:10]
[19]:
[(8, 1.069286691490214),
(64, 0.9877850936410332),
(57, 0.9279775954126851),
(55, 0.8560137741548538),
(65, 0.5639929408370854),
(66, 0.5464887035548639),
(58, 0.54131354674079),
(80, 0.5329391425619848),
(82, 0.5326317456825165),
(85, 0.5322358335237358)]
Harmonic centrality is a variant of closeness centrality that deals with the problem of disconnected graphs. The constructor HarmonicCloseness(G, normalized=True) expects a networkit.Graph and an optional parameter normalized, which is set to true by default. If normalized is true, the centrality scores are normalized to the interval [0, 1].
Computing the normalized harmonic closeness of a graph can be done as follows:
[20]:
# Read a graph
G = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
[21]:
harmonic = nk.centrality.HarmonicCloseness(G, True)
harmonic.run()
[21]:
<networkit.centrality.HarmonicCloseness at 0x7fc864ad6470>
[22]:
# 10 most central nodes according to closeness are
harmonic.ranking()[:10]
[22]:
[(54, 563818.6552013048),
(90, 334190.4744932203),
(86, 321218.72482692736),
(15, 296940.9093191127),
(16, 254101.51104106667),
(21, 226571.16590482905),
(51, 218140.5689005639),
(22, 211206.7510456232),
(78, 184407.6584481432),
(97, 173715.2379835851)]
The TopHarmonicCloseness(G, k=1, useNBbound=False) algorithm finds the exact top-k nodes with the highest harmonic closeness centrality. It is faster than computing the harmonic closeness of all nodes. The implementation is based on Computing Top-k Centrality Faster in Unweighted Graphs, Bergamini et al., ALENEX16. The parameter k expresses the number of nodes with the highest centrality scores that the algorithm must find. The algorithm is based on two heuristics; we recommend using useNBbound = False for complex networks (or networks with small diameter) and useNBbound = True for street networks (or networks with large diameter).
[23]:
# Read a graph
G = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
[24]:
# Initialize algorithm
topHarmClose = nk.centrality.TopHarmonicCloseness(G, 10, useNBbound=False)
topHarmClose.run()
[24]:
<networkit.centrality.TopHarmonicCloseness at 0x7fc864ad6b60>
Unlike the other algorithms we have seen so far, TopHarmonicCloseness does not have a ranking method storing a sorted list of all centrality scores. Instead, it has a topkNodesList(includeTrail=False) method, which returns the top-k central nodes. If includeTrail is true, nodes below the top k whose closeness centrality equals that of the k-th node are also included, so the resulting list may be longer than k.
[25]:
# k most central nodes according to closeness are
topHarmClose.topkNodesList()
[25]:
[54, 90, 86, 15, 16, 21, 51, 22, 78, 97]
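Passing includeTrail=True would additionally return any nodes below the top k whose score ties with the k-th node:
[ ]:
# May return more than k nodes if there are ties with the k-th score
topHarmClose.topkNodesList(includeTrail=True)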
Alternatively, the topkScoresList(includeTrail=False) method may be used.
[26]:
topHarmClose.topkScoresList()
[26]:
[71604969.2105657,
42442190.26063897,
40794778.05301977,
37711495.4835273,
32270891.902215466,
28774538.069913305,
27703852.250371616,
26823257.38279415,
23419772.622914188,
22061835.223915298]
The TopCloseness(G, k=1, first_heu=True, sec_heu=True) algorithm finds the exact top-k nodes with the highest closeness centrality. It is faster than computing the closeness of all nodes. The implementation is based on Computing Top-k Centrality Faster in Unweighted Graphs, Bergamini et al., ALENEX16, and uses the two independent heuristics described in that paper. The parameter k expresses the number of nodes with the highest centrality scores that the algorithm has to find. If first_heu is true, the neighborhood-based lower bound is computed and nodes are sorted according to it; if false, nodes are simply sorted by degree. If sec_heu is true, the BFSbound is re-computed at each iteration; if false, BFScut is used.
[27]:
# Read a graph
G = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
[28]:
# Initialize algorithm
topClose = nk.centrality.TopCloseness(G, 10, first_heu=True)
topClose.run()
[28]:
<networkit.centrality.TopCloseness at 0x7fc864ad6980>
[29]:
# k most central nodes according to closeness are
topClose.topkNodesList()
[29]:
[84, 83, 14, 20, 71, 87, 2, 80, 72, 81]
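The corresponding scores can be retrieved analogously; a minimal sketch, assuming TopCloseness provides the same topkScoresList accessor as TopHarmonicCloseness:
[ ]:
# Closeness scores of the k nodes found above
topClose.topkScoresList()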
Degree centrality is a centrality measure that counts how many neighbors a node has. It can be computed in NetworKit using the DegreeCentrality(G, normalized=False, outDeg=True, ignoreSelfLoops=True) class, which expects a networkit.Graph. Set normalized to True if the centrality scores should be normalized to values between 0 and 1. If the network is directed, there are two versions of the measure: the in-degree is the number of incoming links, and the out-degree is the number of outgoing links. By default, the out-degree is used.
For an undirected graph, using the default parameters, the degree centrality can be computed as follows:
[30]:
# Read graph
G = nk.readGraph("../input/wiki-Vote.txt", nk.Format.SNAP)
[31]:
# Initialize algorithm
deg = nk.centrality.DegreeCentrality(G)
deg.run()
[31]:
<networkit.centrality.DegreeCentrality at 0x7fc864ad7490>
[32]:
# 10 most central nodes according to degree centrality are
deg.ranking()[:10]
[32]:
[(699, 1065.0),
(2374, 773.0),
(408, 743.0),
(286, 740.0),
(2013, 732.0),
(1052, 688.0),
(3714, 618.0),
(517, 533.0),
(483, 517.0),
(1221, 495.0)]
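For a directed graph, in-degree centrality could be computed by setting outDeg to False; a minimal sketch using the directed food web graph from earlier (for undirected graphs the flag has no effect):
[ ]:
# In-degree centrality on a directed graph
GD = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
inDeg = nk.centrality.DegreeCentrality(GD, normalized=False, outDeg=False)
inDeg.run()
inDeg.ranking()[:5]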
Eigenvector centrality measures the influence of a node in a network by assigning relative scores to each node in the network.
Eigenvector centrality and its variant PageRank assign relative importance to nodes according to their connections, incorporating the idea that edges to high-scoring nodes contribute more. PageRank is a version of eigenvector centrality which introduces a damping factor, modeling a random web surfer which at some point stops following links and jumps to a random page. In PageRank theory, centrality is understood as the probability of such a web surfer to arrive on a certain page. Our implementation of both measures is based on parallel power iteration, a relatively simple eigensolver.
We demonstrate it here on the small Karate club graph.
[33]:
# Read graph
K = nk.readGraph("../input/karate.graph", nk.Format.METIS)
[34]:
# Eigenvector centrality
ec = nk.centrality.EigenvectorCentrality(K)
ec.run()
ec.ranking()[:10] # the 10 most central nodes
[34]:
[(33, 0.37335860763538437),
(0, 0.35548796275763045),
(2, 0.31719212126079693),
(32, 0.30864169996172613),
(1, 0.26595844854862444),
(8, 0.2274061452435449),
(13, 0.22647475684342064),
(3, 0.2111796960531623),
(31, 0.1910365857249303),
(30, 0.1747599501690216)]
[35]:
# PageRank
pr = nk.centrality.PageRank(K, damp=0.85, tol=1e-9)
pr.run()
pr.ranking()[:10] # the 10 most central nodes
[35]:
[(33, 0.10091918107710363),
(0, 0.09699728673391753),
(32, 0.07169322502169574),
(2, 0.05707850945632192),
(1, 0.05287692444772233),
(31, 0.03715808671239322),
(3, 0.035859858105808544),
(23, 0.031522514279355436),
(8, 0.02976605594632442),
(13, 0.029536456269206687)]
By default, PageRank uses the L2 norm as its convergence criterion. Alternatively, one can use the L1 norm and set the maximum number of iterations.
[36]:
# PageRank using the L1 norm and at most 100 iterations
pr = nk.centrality.PageRank(K, damp=0.85, tol=1e-9)
pr.norm = nk.centrality.Norm.L1_NORM
pr.maxIterations = 100
pr.run()
pr.ranking()[:10] # the 10 most central nodes
[36]:
[(33, 0.10091918205705729),
(0, 0.09699728568137082),
(32, 0.07169322578984548),
(2, 0.057078509481399156),
(1, 0.05287692414515951),
(31, 0.03715808699163245),
(3, 0.03585985785613518),
(23, 0.03152251466832272),
(8, 0.029766056051855488),
(13, 0.029536456177810026)]
Katz centrality computes the relative influence of a node within a network by counting its immediate neighbors as well as all other nodes that connect to it through those immediate neighbors. Connections to distant nodes are, however, penalized by an attenuation factor \(\alpha\): each path between a pair of nodes at distance \(d\) is assigned a weight \(\alpha^d\).
The KatzCentrality(G, alpha=5e-4, beta=0.1, tol=1e-8) constructor expects a networkit.Graph as a mandatory parameter. The parameter alpha is the damping of the matrix-vector result, while beta is a constant value added to the centrality of each vertex. The parameter tol dictates the tolerance for convergence. Computing the Katz centrality in NetworKit can be done as follows.
[37]:
G = nk.readGraph("../input/karate.graph", nk.Format.METIS)
[38]:
# Initialize algorithm
katz = nk.centrality.KatzCentrality(G, 1e-3)
katz.run()
[38]:
<networkit.centrality.KatzCentrality at 0x7fc864ad74c0>
[39]:
# 10 most central nodes
katz.ranking()[:10]
[39]:
[(33, 0.19367744613991514),
(0, 0.1918907986471833),
(32, 0.18470136309691013),
(2, 0.18112282552442752),
(1, 0.1793038647977577),
(31, 0.17392597722904418),
(3, 0.17391172244381983),
(8, 0.1721413052741721),
(13, 0.1721395007428994),
(23, 0.17210705607838117)]
The spanning centrality of an edge e in an undirected graph is the fraction of the graph's spanning trees that contain e. The SpanningEdgeCentrality(G, tol=0.1) constructor expects a networkit.Graph as a mandatory parameter. The tol parameter determines the tolerance used for the approximation: with probability at least 1 - 1/n, the approximated scores are within a factor 1 + tol of the exact scores.
Computing the spanning edge centrality in NetworKit is demonstrated below.
[40]:
# Read graph
G = nk.readGraph("../input/karate.graph", nk.Format.METIS)
[41]:
# Initialize algorithm
G.indexEdges()
span = nk.centrality.SpanningEdgeCentrality(G, 1e-6)
span.run()
[41]:
<networkit.centrality.SpanningEdgeCentrality at 0x7fc864ad62c0>
Using the scores() method we can get a vector containing the SEC score for each edge in the graph.
[42]:
span.scores()[:10]
[42]:
[0.19306324991625834,
0.20762324500950174,
0.2315296370329742,
0.25009766179369525,
0.277246201614483,
0.28002802664810306,
0.46491028639734855,
0.43859344555259383,
0.4385942727910844,
0.5175438525005359]
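SpanningEdgeCentrality also exposes approximation entry points as an alternative to the exact run(); a minimal sketch, assuming the runApproximation() and runParallelApproximation() methods populate scores() just like run():
[ ]:
# Approximate spanning edge centrality via the built-in approximation
spanApx = nk.centrality.SpanningEdgeCentrality(G, 0.1)
spanApx.runApproximation()  # or spanApx.runParallelApproximation()
spanApx.scores()[:5]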
The ApproxSpanningEdge(G, eps=0.1) algorithm, based on Hayashi et al., samples spanning trees uniformly at random to approximate the spanning edge centrality of each edge of a graph up to a maximum absolute error eps. It expects an undirected graph as input and requires the edges of the graph to be indexed.
[43]:
# Read a graph
G = nk.graphtools.toUndirected(nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT))
G.indexEdges()
apx_span = nk.centrality.ApproxSpanningEdge(G, 0.05)
_=apx_span.run()
[44]:
apx_span.scores()[:10]
[44]:
[0.11642017444065225,
0.10087220326128175,
0.13993174061433447,
0.12021236253318164,
0.10580204778156997,
0.11376564277588168,
0.12817595752749336,
0.1149032992036405,
0.11338642396662875,
0.11452408039438756]
The local clustering coefficient of a vertex in a graph quantifies how close its neighbours are to being a clique (complete graph).
The LocalClusteringCoefficient(G, turbo=False) constructor expects a networkit.Graph and a Boolean parameter turbo, which is false by default. turbo activates a (sequential, but fast) pre-processing step using ideas from this paper. This reduces the running time significantly for most graphs, but the turbo mode needs O(m) additional memory. It is particularly effective for graphs with nodes of very high degree and a very skewed degree distribution.
We demonstrate its use on the karate club graph:
[45]:
# Read graph
G = nk.readGraph("../input/karate.graph", nk.Format.METIS)
[46]:
# Initialize algorithm
lcc = nk.centrality.LocalClusteringCoefficient(G)
lcc.run()
[46]:
<networkit.centrality.LocalClusteringCoefficient at 0x7fc864ad7220>
[47]:
lcc.ranking()[:10]
[47]:
[(7, 1.0),
(12, 1.0),
(14, 1.0),
(15, 1.0),
(16, 1.0),
(17, 1.0),
(18, 1.0),
(20, 1.0),
(21, 1.0),
(22, 1.0)]
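With turbo mode enabled, only the constructor call changes; a minimal sketch (recall that turbo trades O(m) extra memory for speed):
[ ]:
# Local clustering coefficient with turbo pre-processing
lccTurbo = nk.centrality.LocalClusteringCoefficient(G, True)
lccTurbo.run()
lccTurbo.ranking()[:10]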