Centrality Tutorial

This notebook covers some of the centrality algorithms implemented in NetworKit. Centrality measures the relative importance of a node within a graph. Code for centrality analysis in NetworKit is grouped into the centrality module.

As always, the first step is to import NetworKit.

[1]:
import networkit as nk

Every algorithm within the centrality module in NetworKit has a run() function that must be called after initialization in order for the algorithm to be executed.
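
As a minimal sketch of this pattern (the small toy graph below is built here purely for illustration):

# Build a tiny path graph: 0 - 1 - 2 - 3
H = nk.Graph(4)
H.addEdge(0, 1)
H.addEdge(1, 2)
H.addEdge(2, 3)
# Initialize the algorithm, run it, then query the results
dc = nk.centrality.DegreeCentrality(H)
dc.run()            # must be called before accessing results
dc.scores()         # one score per node
dc.ranking()[:2]    # (node, score) pairs, sorted by decreasing score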

Betweenness Centrality

Betweenness centrality measures the extent to which a vertex lies on shortest paths between other vertices. Vertices with high betweenness may have considerable influence within a network.
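
Formally, the betweenness of a node \(v\) is \(b(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}\), where \(\sigma_{st}\) is the number of shortest paths between \(s\) and \(t\), and \(\sigma_{st}(v)\) is the number of those paths that pass through \(v\).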

Betweenness

The fastest algorithm for the exact computation of betweenness centrality is Brandes’ algorithm, and it is implemented in the Betweenness class. The constructor Betweenness(G, normalized=False, computeEdgeCentrality=False) expects a mandatory networkit.Graph and two optional parameters, normalized and computeEdgeCentrality. If normalized is set to true, the centrality scores will be normalized. The parameter computeEdgeCentrality should be set to true if edge betweenness should be computed as well; a short sketch of this follows the example below.

We start by reading a graph and then initializing the algorithm with the parameters we want. Since we use the default parameters, i.e., centrality scores are not normalized and edge betweenness is not computed, we only need to pass a graph to the constructor.

[2]:
# Read graph
G = nk.readGraph("../input/power.graph", nk.Format.METIS)
# Initalize algorithm
btwn = nk.centrality.Betweenness(G)
[3]:
# Run
btwn.run()
[3]:
<networkit.centrality.Betweenness at 0x7f6608357ac0>
[4]:
# The 10 most central nodes according to betweenness are then
btwn.ranking()[:10]
[4]:
[(4164, 7036954.687164465),
 (2543, 6873056.733431251),
 (1243, 6824187.837966225),
 (4219, 6774285.210945553),
 (2528, 6521871.002117601),
 (1267, 6057480.123171416),
 (1308, 5770690.329252719),
 (1244, 5007410.4328002585),
 (426, 5000602.04832292),
 (2606, 4955764.65448363)]
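
If edge betweenness is needed as well, a possible sketch looks as follows (it assumes the graph's edges have been indexed; the per-edge scores are then available through edgeScores()):

# Index edges, then recompute with edge centrality enabled
G.indexEdges()
ebtwn = nk.centrality.Betweenness(G, normalized=False, computeEdgeCentrality=True)
ebtwn.run()
ebtwn.edgeScores()[:10]  # betweenness score of the first 10 edges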

ApproxBetweenness

Since the exact calculation of betweenness scores is often out of reach, NetworKit provides an approximation algorithm based on path sampling. This functionality is provided by the ApproxBetweenness(G, epsilon=0.01, delta=0.1, universalConstant=1.0) class. It expects a mandatory undirected networkit.Graph. Here we estimate betweenness centrality in PGPgiantcompo, with a guarantee that the error is no larger than an additive constant epsilon with a probability of at least 1 - delta. The universalConstant is used to compute the sample size.

[5]:
# Read graph
G = nk.readGraph("../input/PGPgiantcompo.graph", nk.Format.METIS)
[6]:
ab = nk.centrality.ApproxBetweenness(G, epsilon=0.1)
ab.run()
[6]:
<networkit.centrality.ApproxBetweenness at 0x7f66079b2770>
[7]:
# The 10 most central nodes according to betweenness are then
ab.ranking()[:10]
[7]:
[(1143, 0.13237063778580002),
 (6655, 0.09386281588447641),
 (7297, 0.09025270758122733),
 (6555, 0.08303249097472917),
 (6932, 0.07581227436823101),
 (3156, 0.0673886883273165),
 (7369, 0.057761732851985596),
 (5165, 0.05655836341756923),
 (6744, 0.05535499398315286),
 (7155, 0.05535499398315286)]

EstimateBetweenness

The EstimateBetweenness class estimates betweenness centrality according to the algorithm described in Sanders, Geisberger, Schultes: Better Approximation of Betweenness Centrality. Although the algorithm performs well in practice, it provides no theoretical guarantee. If a theoretical guarantee is required, use the ApproxBetweenness algorithm.

The constructor EstimateBetweenness(G, nSamples, normalized=False, parallel_flag=False) expects a networkit.Graph and the number of samples as mandatory parameters. The centrality values can optionally be normalized by setting normalized to True; by default they are not normalized. The algorithm can be run in parallel by setting parallel_flag to True. Running in parallel, however, comes at an additional cost in memory, as each thread keeps its own data structures.

[8]:
# Read a graph
G = nk.readGraph("../input/PGPgiantcompo.graph", nk.Format.METIS)
[9]:
# Initialize algorithm
est = nk.centrality.EstimateBetweenness(G, 50, True, False)
est.run()
[9]:
<networkit.centrality.EstimateBetweenness at 0x7f66445f0a00>
[10]:
#The 10 most central nodes according to betweenness are then
est.ranking()[:10]
[10]:
[(1143, 0.14248173631752845),
 (6555, 0.1032657169519441),
 (6655, 0.08321720573310608),
 (7297, 0.07418833829542044),
 (6744, 0.0673798621157345),
 (6932, 0.06725360154876636),
 (2258, 0.06498091003198506),
 (6098, 0.05871431554615612),
 (5165, 0.057308324769363955),
 (7369, 0.05366326518913173)]

KadabraBetweenness

KadabraBetweenness is an ADaptive Algorithm for Betweenness via Random Approximation presented by Borassi M. and Natale E. NetworKit provides variants of the algorithm that either reduce the memory consumption to O(1) or ensure that deterministic results are obtained. For more details about the implementation, see this paper by Van der Grinten A., Angriman E., and Meyerhenke H. (2019).

The constructor KadabraBetweenness(G, err=0.01, delta=0.1, deterministic=False, k=0, unionSample=0, startFactor=100) only expects a networkit.Graph as a mandatory parameter. err is the maximum additive error guaranteed when approximating the betweenness centrality of all nodes, while delta is the probability that the betweenness centrality values are within the error guarantee. The parameter k expresses the number of top-k nodes to be computed; if set to zero, the approximate betweenness centrality of all nodes is computed (a sketch of the top-k variant follows the example below). unionSample and startFactor are algorithm parameters that are automatically chosen.

Hence, computing the KadabraBetweenness for all nodes, with a maximum additive error of 0.05 and a probability of 0.8 that the betweenness centrality values are within that range, can be done as follows:

[11]:
# Read a graph
G = nk.readGraph("../input/PGPgiantcompo.graph", nk.Format.METIS)
[12]:
# Initialize algorithm
kadabra = nk.centrality.KadabraBetweenness(G, 0.05, 0.8)
kadabra.run()
[12]:
<networkit.centrality.KadabraBetweenness at 0x7f6607adb150>
[13]:
#The 10 most central nodes according to betweenness are then
kadabra.ranking()[:10]
[13]:
[(1143, 0.2508816120906801),
 (6655, 0.1894206549118388),
 (6555, 0.18740554156171285),
 (7297, 0.16826196473551638),
 (6932, 0.16523929471032744),
 (6744, 0.15113350125944586),
 (6098, 0.13904282115869018),
 (3156, 0.13198992443324936),
 (2258, 0.11486146095717883),
 (5165, 0.10780856423173804)]
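
If only the top-k ranking is of interest, the k parameter could be set instead; with k > 0 the results are available through topkNodesList() and topkScoresList() (a hedged sketch):

# Approximate only the 10 most central nodes
kadabraTop = nk.centrality.KadabraBetweenness(G, 0.05, 0.8, k=10)
kadabraTop.run()
kadabraTop.topkNodesList()   # the top-10 nodes
kadabraTop.topkScoresList()  # their approximate betweenness scores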

Closeness Centrality

Closeness centrality indicates how close a node is to all other nodes in the network. It is calculated as the inverse of the average shortest-path distance from the node to every other node in the network.
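
For a connected graph with \(n\) nodes, the (normalized) closeness of a node \(v\) is \(c(v) = \frac{n - 1}{\sum_{u \neq v} d(v, u)}\), where \(d(v, u)\) is the shortest-path distance from \(v\) to \(u\).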

Closeness

The NetworKit Closeness class computes the exact closeness centrality of all the nodes of a graph. The constructor Closeness(G, normalized=True, variant=networkit.centrality.ClosenessVariant.STANDARD) expects a mandatory networkit.Graph and two optional parameters, normalized and variant. The centrality values can optionally be normalized for unweighted graphs by setting normalized to True; by default the centrality values are normalized. The parameter variant dictates the closeness variant to be used. Set variant to networkit.centrality.ClosenessVariant.STANDARD to use the standard definition of closeness, which is defined for connected graphs only, or to networkit.centrality.ClosenessVariant.GENERALIZED to use the generalized definition, which is also defined for non-connected graphs.

Since the graph we use here is weighted, the values should not be normalized; and because the graph is disconnected, we use the generalized closeness variant:

[14]:
# Read a graph
G = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
[15]:
# Initialize algorithm
close = nk.centrality.Closeness(G, False, nk.centrality.ClosenessVariant.GENERALIZED)
close.run()
[15]:
<networkit.centrality.Closeness at 0x7f66079b3400>
[16]:
#The 10 most central nodes according to closeness are then
close.ranking()[:10]
[16]:
[(8, 0.962150639989747),
 (64, 0.8484943683258478),
 (57, 0.7951252523231831),
 (55, 0.7405035792871096),
 (65, 0.48508765673100607),
 (66, 0.469459105637361),
 (58, 0.46525538614449846),
 (82, 0.46179282298925417),
 (85, 0.46142579646975274),
 (83, 0.46137631059745826)]

ApproxCloseness

Since the exact calculation of closeness scores is often out of reach, NetworKit provides an approximation of closeness centrality according to the algorithm described in Cohen et al., Computing Classic Closeness Centrality, at Scale.

The constructor expects a mandatory connected networkit.Graph along with several optional parameters: ApproxCloseness(G, nSamples, epsilon=0.1, normalized=False, type=OUTBOUND). nSamples dictates the number of samples to use when computing the approximate closeness; this value depends on the size of the graph, and the more samples are used, the better the results are likely to be. epsilon can be any value in \((0, \infty)\) and is used for the error guarantee. The centrality values can optionally be normalized by setting normalized to True; by default they are not normalized. If G is undirected, type can be ignored. Otherwise, use 0 for inbound centrality, 1 for outbound centrality, and 2 for the sum of both.

[17]:
# Read a graph
G = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
[18]:
ac = nk.centrality.ApproxCloseness(G, 100)
ac.run()
[18]:
<networkit.centrality.ApproxCloseness at 0x7f66079b2680>
[19]:
# 10 most central nodes according to closeness are
ac.ranking()[:10]
[19]:
[(8, 1.069693329861525),
 (64, 0.9967222347929672),
 (57, 0.9362751255220639),
 (55, 0.8560715342899561),
 (65, 0.5696782304524416),
 (66, 0.5518282176999724),
 (58, 0.5465291916742194),
 (75, 0.535175749910808),
 (62, 0.5345214029424143),
 (63, 0.5342288416962546)]

HarmonicCloseness

Harmonic centrality is a variant of closeness centrality that deals with the problem of unconnected graphs. The constructor, HarmonicCloseness(G, normalized=True), expects a networkit.Graph and an optional parameter normalized, which is set to true by default. If normalized is set to true, centrality scores are normalized to the interval [0, 1].
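
The (unnormalized) harmonic closeness of a node \(v\) is \(h(v) = \sum_{u \neq v} \frac{1}{d(v, u)}\), where the contribution of an unreachable node is taken to be 0; this makes the measure well-defined on disconnected graphs.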

Computing the harmonic closeness of a graph with normalized scores can be done as follows:

[20]:
# Read a graph
G = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
[21]:
harmonic = nk.centrality.HarmonicCloseness(G, True)
harmonic.run()
[21]:
<networkit.centrality.HarmonicCloseness at 0x7f66079b3d30>
[22]:
# 10 most central nodes according to closeness are
harmonic.ranking()[:10]
[22]:
[(54, 563818.6552013048),
 (90, 334190.4744932203),
 (86, 321218.72482692736),
 (15, 296940.9093191127),
 (16, 254101.51104106667),
 (21, 226571.16590482905),
 (51, 218140.5689005639),
 (22, 211206.7510456232),
 (78, 184407.6584481432),
 (97, 173715.2379835851)]

TopHarmonicCloseness

The TopHarmonicCloseness(G, k=1, useNBbound=False) algorithm finds the exact top k nodes with the highest harmonic closeness centrality. It is faster than computing the harmonic closeness for all nodes. The implementation is based on Computing Top-k Centrality Faster in Unweighted Graphs, Bergamini et al., ALENEX16. The parameter k expresses the number of nodes with the highest centrality scores that the algorithm must find. The algorithm is based on two heuristics: we recommend useNBbound = False for complex networks (or networks with small diameter) and useNBbound = True for street networks (or networks with large diameter).

[23]:
# Read a graph
G = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
[24]:
# Initialize algorithm
topHarmClose = nk.centrality.TopHarmonicCloseness(G, 10, useNBbound=False)
topHarmClose.run()
[24]:
<networkit.centrality.TopHarmonicCloseness at 0x7f66079b3c40>

Unlike all the other algorithms we have seen before, TopHarmonicCloseness does not have a ranking method that stores a sorted list of all centrality scores. Instead, we have the topkNodesList(includeTrail=False) method. By calling this method, we can print the top k central nodes. If includeTrail is true, nodes beyond the top k whose closeness equals that of the k-th node are also included, so the resulting vector may be longer than k, as the sketch below illustrates.

[25]:
# k most central nodes according to closeness are
topHarmClose.topkNodesList()
[25]:
[54, 90, 86, 15, 16, 21, 51, 22, 78, 97]
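
To also include trailing nodes whose closeness ties with that of the k-th node, includeTrail can be set; the returned list may then be longer than k:

topHarmClose.topkNodesList(includeTrail=True)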

Alternatively, the topkScoresList(includeTrail=False) method may be used.

[26]:
topHarmClose.topkScoresList()
[26]:
[71604969.2105657,
 42442190.26063897,
 40794778.05301977,
 37711495.4835273,
 32270891.902215466,
 28774538.069913305,
 27703852.250371616,
 26823257.38279415,
 23419772.622914188,
 22061835.223915298]

TopCloseness

The TopCloseness(G, k=1, first_heu=True, sec_heu=True) algorithm finds the exact top k nodes with the highest closeness centrality. It is faster than computing the closeness for all nodes. The implementation is based on Computing Top-k Centrality Faster in Unweighted Graphs, Bergamini et al., ALENEX16. The algorithm is based on two independent heuristics described in the above paper. The parameter k expresses the number of nodes with the highest centrality scores that have to be found by the algorithm. If first_heu is true, the neighborhood-based lower bound is computed and nodes are sorted according to it; if false, nodes are simply sorted by degree. If sec_heu is true, the BFSbound is re-computed at each iteration; if false, BFScut is used.

[27]:
# Read a graph
G = nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT)
[28]:
# Initialize algorithm
topClose = nk.centrality.TopCloseness(G, 10, first_heu=True)
topClose.run()
[28]:
<networkit.centrality.TopCloseness at 0x7f6607978a30>
[29]:
# k most central nodes according to closeness are
topClose.topkNodesList()
[29]:
[84, 83, 14, 20, 71, 87, 2, 80, 72, 81]

Degree Centrality

Degree is a centrality measure that counts how many neighbors a node has.

Degree Centrality

Degree centrality can be computed in NetworKit using the DegreeCentrality(G, normalized=False, outDeg=True, ignoreSelfLoops=True) class. It expects a networkit.Graph. Set normalized to True if the centrality scores should be normalized to values between 0 and 1. If the network is directed, we have two versions of the measure: in-degree is the number of incoming links, and out-degree is the number of outgoing links. By default, the out-degree is used; a sketch of the in-degree variant follows the example below.

For an undirected graph, using the default parameters, the degree centrality can be computed as follows:

[30]:
# Read graph
G = nk.readGraph("../input/wiki-Vote.txt", nk.Format.SNAP)
[31]:
# Initialize algorithm
deg = nk.centrality.DegreeCentrality(G)
deg.run()
[31]:
<networkit.centrality.DegreeCentrality at 0x7f66079d44c0>
[32]:
# 10 most central nodes according to degree centrality are
deg.ranking()[:10]
[32]:
[(699, 1065.0),
 (2374, 773.0),
 (408, 743.0),
 (286, 740.0),
 (2013, 732.0),
 (1052, 688.0),
 (3714, 618.0),
 (517, 533.0),
 (483, 517.0),
 (1221, 495.0)]
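
For a directed graph, the in-degree variant can be selected by setting outDeg=False. A minimal sketch, using a small directed toy graph built here for illustration:

# Two edges point into node 2
D = nk.Graph(3, directed=True)
D.addEdge(0, 2)
D.addEdge(1, 2)
inDeg = nk.centrality.DegreeCentrality(D, outDeg=False)
inDeg.run()
inDeg.score(2)  # in-degree of node 2, i.e. 2.0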

Eigenvector Centrality and PageRank

Eigenvector centrality measures the influence of a node in a network by assigning relative scores to all nodes in the network.

Eigenvector Centrality and PageRank

Eigenvector centrality and its variant PageRank assign relative importance to nodes according to their connections, incorporating the idea that edges to high-scoring nodes contribute more. PageRank is a version of eigenvector centrality which introduces a damping factor, modeling a random web surfer who at some point stops following links and jumps to a random page. In PageRank theory, centrality is understood as the probability of such a web surfer arriving on a certain page. Our implementation of both measures is based on parallel power iteration, a relatively simple eigensolver.
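
A common formulation: with damping factor \(d\) and \(n\) nodes, the PageRank values satisfy \(pr(v) = \frac{1 - d}{n} + d \sum_{u \in N^{-}(v)} \frac{pr(u)}{\deg^{+}(u)}\), where \(N^{-}(v)\) is the set of nodes with an edge to \(v\) and \(\deg^{+}(u)\) is the out-degree of \(u\).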

We demonstrate it here on the small Karate club graph.

[33]:
# Read graph
K = nk.readGraph("../input/karate.graph", nk.Format.METIS)
[34]:
# Eigenvector centrality
ec = nk.centrality.EigenvectorCentrality(K)
ec.run()
ec.ranking()[:10] # the 10 most central nodes
[34]:
[(33, 0.37335860763538437),
 (0, 0.35548796275763045),
 (2, 0.31719212126079693),
 (32, 0.30864169996172613),
 (1, 0.26595844854862444),
 (8, 0.2274061452435449),
 (13, 0.22647475684342064),
 (3, 0.2111796960531623),
 (31, 0.1910365857249303),
 (30, 0.1747599501690216)]
[35]:
# PageRank
pr = nk.centrality.PageRank(K, damp=0.85, tol=1e-9)
pr.run()
pr.ranking()[:10] # the 10 most central nodes
[35]:
[(33, 0.10091918107710363),
 (0, 0.09699728673391753),
 (32, 0.07169322502169574),
 (2, 0.05707850945632192),
 (1, 0.05287692444772233),
 (31, 0.03715808671239322),
 (3, 0.035859858105808544),
 (23, 0.031522514279355436),
 (8, 0.02976605594632442),
 (13, 0.029536456269206687)]

By default, PageRank uses the L2 norm as its convergence criterion. Alternatively, one can use the L1 norm. Additionally, one can set the maximum number of iterations.

[36]:
# PageRank using the L1 norm and at most 100 iterations
pr = nk.centrality.PageRank(K, damp=0.85, tol=1e-9)
pr.norm = nk.centrality.Norm.L1_NORM
pr.maxIterations = 100
pr.run()
pr.ranking()[:10] # the 10 most central nodes
[36]:
[(33, 0.10091918205705729),
 (0, 0.09699728568137082),
 (32, 0.07169322578984548),
 (2, 0.057078509481399156),
 (1, 0.05287692414515951),
 (31, 0.03715808699163245),
 (3, 0.03585985785613518),
 (23, 0.03152251466832272),
 (8, 0.029766056051855488),
 (13, 0.029536456177810026)]

Katz Centrality

Katz centrality computes the relative influence of a node within a network by measuring the number of the immediate neighbors, and also all other nodes in the network that connect to the node through these immediate neighbors. Connections made with distant neighbors are, however, penalized by an attenuation factor \(\alpha\). Each path or connection between a pair of nodes is assigned a weight determined by \(\alpha\) and the distance between nodes as \(\alpha ^d\).
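
In its classic matrix form (leaving aside the beta offset), Katz centrality can be written as \(x_v = \sum_{d=1}^{\infty} \sum_{u} \alpha^{d} (A^{d})_{uv}\) for the adjacency matrix \(A\); the series converges when \(\alpha\) is smaller than the reciprocal of the largest eigenvalue of \(A\).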

The KatzCentrality(G, alpha=5e-4, beta=0.1, tol=1e-8) constructor expects a networkit.Graph as a mandatory parameter. The parameter alpha is the damping of the matrix-vector result, while beta is a constant value added to the centrality of each vertex. The parameter tol dictates the tolerance for convergence. Computing the Katz centrality in NetworKit can be done as follows:

[37]:
G = nk.readGraph("../input/karate.graph", nk.Format.METIS)
[38]:
# Initialize algorithm
katz = nk.centrality.KatzCentrality(G, 1e-3)
katz.run()
[38]:
<networkit.centrality.KatzCentrality at 0x7f66079d4e50>
[39]:
# 10 most central nodes
katz.ranking()[:10]
[39]:
[(33, 0.19367744613991514),
 (0, 0.1918907986471833),
 (32, 0.18470136309691013),
 (2, 0.18112282552442752),
 (1, 0.1793038647977577),
 (31, 0.17392597722904418),
 (3, 0.17391172244381983),
 (8, 0.1721413052741721),
 (13, 0.1721395007428994),
 (23, 0.17210705607838117)]

Others

Spanning Edge Centrality

The spanning edge centrality of an edge e in an undirected graph is the fraction of the spanning trees of the graph that contain e. The SpanningEdgeCentrality(G, tol=0.1) constructor expects a networkit.Graph as a mandatory parameter. The tol parameter determines the tolerance used for the approximation: with probability at least 1 - 1/n, the approximated scores are within a factor 1 + tol of the exact scores.

Computing the spanning edge centrality in NetworKit is demonstrated below; note that the edges of the graph must be indexed beforehand.

[40]:
# Read graph
G = nk.readGraph("../input/karate.graph", nk.Format.METIS)
[41]:
# Initialize algorithm
G.indexEdges()
span = nk.centrality.SpanningEdgeCentrality(G, 1e-6)
span.run()
[41]:
<networkit.centrality.SpanningEdgeCentrality at 0x7f66079b3ac0>

Using the scores() method, we can get a vector containing the spanning edge centrality (SEC) score of each edge in the graph.

[42]:
span.scores()[:10]
[42]:
[0.19306324991625834,
 0.20762324500950174,
 0.2315296370329742,
 0.25009766179369525,
 0.277246201614483,
 0.28002802664810306,
 0.46491028639734855,
 0.43859344555259383,
 0.4385942727910844,
 0.5175438525005359]
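
Besides run(), the SpanningEdgeCentrality class also exposes approximation variants that use the tol parameter from the constructor (a hedged sketch, assuming these methods are available in your NetworKit version):

span.runApproximation()          # sequential approximation
span.runParallelApproximation()  # parallel approximation
span.scores()[:10]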

Approximate Spanning Edge Centrality

ApproxSpanningEdge approximates the spanning edge centrality of each edge of a graph up to a maximum absolute error, using an algorithm by Hayashi et al. based on sampling spanning trees uniformly at random. The constructor ApproxSpanningEdge(G, eps=0.1) expects an undirected graph and the maximum absolute error eps as input. The algorithm requires the edges of the graph to be indexed.

[43]:
# Read a graph
G = nk.graphtools.toUndirected(nk.readGraph("../input/foodweb-baydry.konect", nk.Format.KONECT))
G.indexEdges()
apx_span = nk.centrality.ApproxSpanningEdge(G, 0.05)
_=apx_span.run()
[44]:
apx_span.scores()[:10]
[44]:
[0.12514220705346984,
 0.1031475161167994,
 0.14524080394387562,
 0.12931361395525218,
 0.11566173682214638,
 0.1149032992036405,
 0.12097080015168753,
 0.11376564277588168,
 0.10656048540007584,
 0.1130072051573758]

Local Clustering Coefficient

The local clustering coefficient of a vertex in a graph quantifies how close its neighbours are to being a clique (complete graph).
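
For a node \(u\), it is defined as \(c(u) = \frac{2 \cdot m(u)}{\deg(u)\,(\deg(u) - 1)}\), where \(m(u)\) is the number of edges between the neighbors of \(u\).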

The LocalClusteringCoefficient(G, turbo=False) constructor expects a networkit.Graph and a Boolean parameter turbo which is by default initialized to false. turbo activates a (sequential, but fast) pre-processing step using ideas from this paper. This reduces the running time significantly for most graphs. However, the turbo mode needs O(m) additional memory. The turbo mode is particularly effective for graphs with nodes of very high degree and a very skewed degree distribution.

We demonstrate its use on the karate club graph:

[45]:
# Read graph
G = nk.readGraph("../input/karate.graph", nk.Format.METIS)
[46]:
# Initialize algorithm
lcc = nk.centrality.LocalClusteringCoefficient(G)
lcc.run()
[46]:
<networkit.centrality.LocalClusteringCoefficient at 0x7f66079d47f0>
[47]:
lcc.ranking()[:10]
[47]:
[(7, 1.0),
 (12, 1.0),
 (14, 1.0),
 (15, 1.0),
 (16, 1.0),
 (17, 1.0),
 (18, 1.0),
 (20, 1.0),
 (21, 1.0),
 (22, 1.0)]