NetworKit User Guide

About NetworKit

NetworKit is an open-source toolkit for high-performance network analysis. Its aim is to provide tools for the analysis of large networks in the size range from thousands to billions of edges. For this purpose, it implements efficient graph algorithms, many of them parallel to utilize multicore architectures. These are meant to compute standard measures of network analysis, such as degree sequences, clustering coefficients and centrality. In this respect, NetworKit is comparable to packages such as NetworkX, albeit with a focus on parallelism and scalability. NetworKit is also a testbed for algorithm engineering and contains a few novel algorithms from recently published research, especially in the area of community detection.

Introduction

This notebook provides an interactive introduction to the features of NetworKit, consisting of text and executable code. We assume that you have read the Readme and successfully built the core library and the Python module. Code cells can be run one by one (e.g. by selecting the cell and pressing shift+enter), or all at once (via the Cell->Run All command). Try running all cells now to verify that NetworKit has been properly built and installed.

Preparation

This notebook creates some plots. To show them in the notebook, matplotlib must be imported and we need to activate matplotlib’s inline mode:

[1]:
%matplotlib inline
import matplotlib.pyplot as plt

NetworKit is a hybrid built from C++ and Python code: Its core functionality is implemented in C++ for performance reasons, and then wrapped for Python using the Cython toolchain. This allows us to expose high-performance parallel code as a normal Python module. On the surface, NetworKit is just that and can be imported accordingly:

[2]:
import networkit as nk

Reading and Writing Graphs

Let us start by reading a network from a file on disk: PGPgiantcompo.graph network. In the course of this tutorial, we are going to work on the PGPgiantcompo network, a social network/web of trust in which nodes are PGP keys and an edge represents a signature from one key on another. It is distributed with NetworKit as a good starting point.

There is a convenient function in the top namespace which tries to guess the input format and select the appropriate reader:

[3]:
G = nk.readGraph("../input/PGPgiantcompo.graph", nk.Format.METIS)

There is a large variety of formats for storing graph data in files. For NetworKit, the currently best supported format is the METIS adjacency format. Various example graphs in this format can be found here. The readGraph function tries to be an intelligent wrapper for various reader classes. In this example, it uses the METISGraphReader which is located in the graphio submodule, alongside other readers. These classes can also be used explicitly:

[4]:
G = nk.graphio.METISGraphReader().read("../input/PGPgiantcompo.graph")
# is the same as: readGraph("input/PGPgiantcompo.graph", Format.METIS)

It is also possible to specify the format for readGraph() and writeGraph(). Supported formats can be found via [graphio.]Format. However, graph formats are most likely only supported as far as the NetworKit::Graph can hold and use the data. Please note, that not all graph formats are supported for reading and writing.

Thus, it is possible to use NetworKit to convert graphs between formats. Let’s say I need the previously read PGP graph in the Graphviz format:

[5]:
import os

if not os.path.isdir('./output/'):
    os.makedirs('./output')
nk.graphio.writeGraph(G,"output/PGPgiantcompo.graphviz", nk.Format.GraphViz)

NetworKit also provides a function to convert graphs directly:

[6]:
nk.graphio.convertGraph(nk.Format.LFR, nk.Format.GML, "../input/example.edgelist", "output/example.gml")
converted ../input/example.edgelist to output/example.gml

For an overview about supported graph formats and how to properly use NetworKit to read/write and convert see the IO-tutorial notebook. For all input examples available in the NetworKit repository, there exists also a table showing the format and useable reader.

The Graph Object

Graph is the central class of NetworKit. An object of this type represents an undirected, optionally weighted network. Let us inspect several of the methods which the class provides.

[7]:
print(G.numberOfNodes(), G.numberOfEdges())
10680 24316

Nodes are simply integer indices, and edges are pairs of such indices.

[8]:
for u in G.iterNodes():
    if u > 5:
        print('...')
        break
    print(u)
0
1
2
3
4
5
...
[9]:
i = 0
for u, v in G.iterEdges():
    if i > 5:
        print('...')
        break
    print(u, v)
    i += 1
0 141
1 3876
1 7328
1 7317
1 5760
2 1929
...
[10]:
i = 0
for u, v, w in G.iterEdgesWeights():
    if i > 5:
        print('...')
        break
    print(u, v, w)
    i += 1
0 141 1.0
1 3876 1.0
1 7328 1.0
1 7317 1.0
1 5760 1.0
2 1929 1.0
...

This network is unweighted, meaning that each edge has the default weight of 1.

[11]:
G.weight(42, 11)
[11]:
1.0

Connected Components

A connected component is a set of nodes in which each pair of nodes is connected by a path. The following function determines the connected components of a graph:

[12]:
cc = nk.components.ConnectedComponents(G)
cc.run()
print("number of components ", cc.numberOfComponents())
v = 0
print("component of node ", v , ": " , cc.componentOfNode(0))
print("map of component sizes: ", cc.getComponentSizes())
number of components  1
component of node  0 :  0
map of component sizes:  {0: 10680}

Degree Distribution

Node degree, the number of edges connected to a node, is one of the most studied properties of networks. Types of networks are often characterized in terms of their distribution of node degrees. We obtain and visualize the degree distribution of our example network as follows.

[13]:
import numpy
dd = sorted(nk.centrality.DegreeCentrality(G).run().scores(), reverse=True)
degrees, numberOfNodes = numpy.unique(dd, return_counts=True)
plt.xscale("log")
plt.xlabel("degree")
plt.yscale("log")
plt.ylabel("number of nodes")
plt.plot(degrees, numberOfNodes)
plt.show()
../_images/notebooks_User-Guide_34_0.png

We choose a logarithmic scale on both axes because a powerlaw degree distribution, a characteristic feature of complex networks, would show up as a straight line from the top left to the bottom right on such a plot. As we see, the degree distribution of the PGPgiantcompo network is definitely skewed, with few high-degree nodes and many low-degree nodes. But does the distribution actually obey a power law? In order to study this, we need to apply the powerlaw module. Call the following function:

[14]:
try:
    import powerlaw
    fit = powerlaw.Fit(dd)
except ImportError:
    print ("Module powerlaw could not be loaded")
Calculating best minimal value for power law fit
xmin progress: 00%

xmin progress: 01% xmin progress: 02% xmin progress: 03% xmin progress: 04% xmin progress: 06% xmin progress: 07% xmin progress: 08% xmin progress: 09% xmin progress: 10% xmin progress: 12% xmin progress: 13% xmin progress: 14% xmin progress: 15% xmin progress: 17% xmin progress: 18% xmin progress: 19% xmin progress: 20% xmin progress: 21% xmin progress: 23% xmin progress: 24% xmin progress: 25% xmin progress: 26% xmin progress: 28% xmin progress: 29% xmin progress: 30% xmin progress: 31% xmin progress: 32% xmin progress: 34% xmin progress: 35% xmin progress: 36% xmin progress: 37% xmin progress: 39% xmin progress: 40% xmin progress: 41% xmin progress: 42% xmin progress: 43% xmin progress: 45% xmin progress: 46% xmin progress: 47% xmin progress: 48% xmin progress: 50% xmin progress: 51% xmin progress: 52% xmin progress: 53% xmin progress: 54% xmin progress: 56% xmin progress: 57% xmin progress: 58% xmin progress: 59% xmin progress: 60% xmin progress: 62% xmin progress: 63% xmin progress: 64% xmin progress: 65% xmin progress: 67% xmin progress: 68% xmin progress: 69% xmin progress: 70% xmin progress: 71% xmin progress: 73% xmin progress: 74% xmin progress: 75% xmin progress: 76% xmin progress: 78% xmin progress: 79% xmin progress: 80% xmin progress: 81% xmin progress: 82% xmin progress: 84% xmin progress: 85% xmin progress: 86% xmin progress: 87% xmin progress: 89% xmin progress: 90% xmin progress: 91% xmin progress: 92% xmin progress: 93% xmin progress: 95% xmin progress: 96% xmin progress: 97% xmin progress: 98%

</pre>

xmin progress: 01% xmin progress: 02% xmin progress: 03% xmin progress: 04% xmin progress: 06% xmin progress: 07% xmin progress: 08% xmin progress: 09% xmin progress: 10% xmin progress: 12% xmin progress: 13% xmin progress: 14% xmin progress: 15% xmin progress: 17% xmin progress: 18% xmin progress: 19% xmin progress: 20% xmin progress: 21% xmin progress: 23% xmin progress: 24% xmin progress: 25% xmin progress: 26% xmin progress: 28% xmin progress: 29% xmin progress: 30% xmin progress: 31% xmin progress: 32% xmin progress: 34% xmin progress: 35% xmin progress: 36% xmin progress: 37% xmin progress: 39% xmin progress: 40% xmin progress: 41% xmin progress: 42% xmin progress: 43% xmin progress: 45% xmin progress: 46% xmin progress: 47% xmin progress: 48% xmin progress: 50% xmin progress: 51% xmin progress: 52% xmin progress: 53% xmin progress: 54% xmin progress: 56% xmin progress: 57% xmin progress: 58% xmin progress: 59% xmin progress: 60% xmin progress: 62% xmin progress: 63% xmin progress: 64% xmin progress: 65% xmin progress: 67% xmin progress: 68% xmin progress: 69% xmin progress: 70% xmin progress: 71% xmin progress: 73% xmin progress: 74% xmin progress: 75% xmin progress: 76% xmin progress: 78% xmin progress: 79% xmin progress: 80% xmin progress: 81% xmin progress: 82% xmin progress: 84% xmin progress: 85% xmin progress: 86% xmin progress: 87% xmin progress: 89% xmin progress: 90% xmin progress: 91% xmin progress: 92% xmin progress: 93% xmin progress: 95% xmin progress: 96% xmin progress: 97% xmin progress: 98%

end{sphinxVerbatim}

xmin progress: 01% xmin progress: 02% xmin progress: 03% xmin progress: 04% xmin progress: 06% xmin progress: 07% xmin progress: 08% xmin progress: 09% xmin progress: 10% xmin progress: 12% xmin progress: 13% xmin progress: 14% xmin progress: 15% xmin progress: 17% xmin progress: 18% xmin progress: 19% xmin progress: 20% xmin progress: 21% xmin progress: 23% xmin progress: 24% xmin progress: 25% xmin progress: 26% xmin progress: 28% xmin progress: 29% xmin progress: 30% xmin progress: 31% xmin progress: 32% xmin progress: 34% xmin progress: 35% xmin progress: 36% xmin progress: 37% xmin progress: 39% xmin progress: 40% xmin progress: 41% xmin progress: 42% xmin progress: 43% xmin progress: 45% xmin progress: 46% xmin progress: 47% xmin progress: 48% xmin progress: 50% xmin progress: 51% xmin progress: 52% xmin progress: 53% xmin progress: 54% xmin progress: 56% xmin progress: 57% xmin progress: 58% xmin progress: 59% xmin progress: 60% xmin progress: 62% xmin progress: 63% xmin progress: 64% xmin progress: 65% xmin progress: 67% xmin progress: 68% xmin progress: 69% xmin progress: 70% xmin progress: 71% xmin progress: 73% xmin progress: 74% xmin progress: 75% xmin progress: 76% xmin progress: 78% xmin progress: 79% xmin progress: 80% xmin progress: 81% xmin progress: 82% xmin progress: 84% xmin progress: 85% xmin progress: 86% xmin progress: 87% xmin progress: 89% xmin progress: 90% xmin progress: 91% xmin progress: 92% xmin progress: 93% xmin progress: 95% xmin progress: 96% xmin progress: 97% xmin progress: 98%

The powerlaw coefficient can then be retrieved via:

[15]:
try:
    import powerlaw
    fit.alpha
except ImportError:
      print ("Module powerlaw could not be loaded")

If you further want to know how “good” it fits the power law distribution, you can use the the distribution_compare-function.

[16]:
try:
    import powerlaw
    fit.distribution_compare('power_law','exponential')
except ImportError:
      print ("Module powerlaw could not be loaded")

Community Detection

This section demonstrates the community detection capabilities of NetworKit. Community detection is concerned with identifying groups of nodes which are significantly more densely connected to eachother than to the rest of the network.

Code for community detection is contained in the community module. The module provides a top-level function to quickly perform community detection with a suitable algorithm and print some stats about the result.

[17]:
nk.community.detectCommunities(G)
Communities detected in 0.03935 [s]
solution properties:
-------------------  ------------
# communities         104
min community size      6
max community size    683
avg. community size   102.692
imbalance               6.63107
edge cut             1843
edge cut (portion)      0.0757937
modularity              0.883157
-------------------  ------------
[17]:
<networkit.structures.Partition at 0x7f7145a08f30>

The function prints some statistics and returns the partition object representing the communities in the network as an assignment of node to community label. Let’s capture this result of the last function call.

[18]:
communities = nk.community.detectCommunities(G)
Communities detected in 0.04167 [s]
solution properties:
-------------------  ------------
# communities          96
min community size      6
max community size    672
avg. community size   111.25
imbalance               6
edge cut             1866
edge cut (portion)      0.0767396
modularity              0.882791
-------------------  ------------

Modularity is the primary measure for the quality of a community detection solution. The value is in the range [-0.5,1] and usually depends both on the performance of the algorithm and the presence of distinctive community structures in the network.

[19]:
nk.community.Modularity().getQuality(communities, G)
[19]:
0.8827913951405469

The Partition Data Structure

The result of community detection is a partition of the node set into disjoint subsets. It is represented by the Partition data structure, which provides several methods for inspecting and manipulating a partition of a set of elements (which need not be the nodes of a graph).

[20]:
type(communities)
[20]:
networkit.structures.Partition
[21]:
print("{0} elements assigned to {1} subsets".format(communities.numberOfElements(),
                                                    communities.numberOfSubsets()))
10680 elements assigned to 96 subsets
[22]:
print("the biggest subset has size {0}".format(max(communities.subsetSizes())))
the biggest subset has size 672

The contents of a partition object can be written to file in a simple format, in which each line i contains the subset id of node i.

[23]:
nk.community.writeCommunities(communities, "output/communties.partition")
wrote communities to: output/communties.partition

Choice of Algorithm

The community detection function used a good default choice for an algorithm: PLM, our parallel implementation of the well-known Louvain method. It yields a high-quality solution at reasonably fast running times. Let us now apply a variation of this algorithm.

[24]:
nk.community.detectCommunities(G, algo=nk.community.PLM(G, True))
Communities detected in 0.05999 [s]
solution properties:
-------------------  ------------
# communities         104
min community size      6
max community size    685
avg. community size   102.692
imbalance               6.65049
edge cut             1825
edge cut (portion)      0.0750535
modularity              0.883899
-------------------  ------------
[24]:
<networkit.structures.Partition at 0x7f7142594f90>

We have switched on refinement, and we can see how modularity is slightly improved. For a small network like this, this takes only marginally longer.

Visualizing the Result

We can easily plot the distribution of community sizes as follows. While the distribution is skewed, it does not seem to fit a power-law, as shown by a log-log plot.

[25]:
sizes = communities.subsetSizes()
sizes.sort(reverse=True)
ax1 = plt.subplot(2,1,1)
ax1.set_ylabel("size")
ax1.plot(sizes)

ax2 = plt.subplot(2,1,2)
ax2.set_xscale("log")
ax2.set_yscale("log")
ax2.set_ylabel("size")
ax2.plot(sizes)
plt.show()
../_images/notebooks_User-Guide_62_0.png

Search and Shortest Paths

A simple breadth-first search from a starting node can be performed as follows:

[26]:
v = 0
bfs = nk.distance.BFS(G, v)
bfs.run()

bfsdist = bfs.getDistances()

The return value is a list of distances from v to other nodes - indexed by node id. For example, we can now calculate the mean distance from the starting node to all other nodes:

[27]:
sum(bfsdist) / len(bfsdist)
[27]:
11.339044943820225

Similarly, Dijkstra’s algorithm yields shortest path distances from a starting node to all other nodes in a weighted graph. Because PGPgiantcompo is an unweighted graph, the result is the same here:

[28]:
dijkstra = nk.distance.Dijkstra(G, v)
dijkstra.run()
spdist = dijkstra.getDistances()
sum(spdist) / len(spdist)
[28]:
11.339044943820225

Centrality

Centrality measures the relative importance of a node within a graph. Code for centrality analysis is grouped into the centrality module.

Betweenness Centrality

We implement Brandes’ algorithm for the exact calculation of betweenness centrality. While the algorithm is efficient, it still needs to calculate shortest paths between all pairs of nodes, so its scalability is limited. We demonstrate it here on the small Karate club graph.

[29]:
K = nk.readGraph("../input/karate.graph", nk.Format.METIS)
[30]:
bc = nk.centrality.Betweenness(K)
bc.run()
[30]:
<networkit.centrality.Betweenness at 0x7f7145a2a740>

We have now calculated centrality values for the given graph, and can retrieve them either as an ordered ranking of nodes or as a list of values indexed by node id.

[31]:
bc.ranking()[:10] # the 10 most central nodes
[31]:
[(0, 462.1428571428572),
 (33, 321.10317460317464),
 (32, 153.38095238095238),
 (2, 151.7015873015873),
 (31, 146.0190476190476),
 (8, 59.058730158730164),
 (1, 56.95714285714285),
 (13, 48.43174603174603),
 (19, 34.29365079365079),
 (5, 31.666666666666664)]

Approximation of Betweenness

Since exact calculation of betweenness scores is often out of reach, NetworKit provides an approximation algorithm based on path sampling. Here we estimate betweenness centrality in PGPgiantcompo, with a probabilistic guarantee that the error is no larger than an additive constant \(\epsilon\).

[32]:
abc = nk.centrality.ApproxBetweenness(G, epsilon=0.1)
abc.run()
[32]:
<networkit.centrality.ApproxBetweenness at 0x7f714253dd20>

The 10 most central nodes according to betweenness are then

[33]:
abc.ranking()[:10]
[33]:
[(1143, 0.14560770156438013),
 (7297, 0.09747292418772549),
 (6655, 0.08423586040914553),
 (6555, 0.08303249097472917),
 (6098, 0.06859205776173286),
 (6744, 0.06859205776173286),
 (6932, 0.06859205776173286),
 (3156, 0.06618531889290014),
 (2258, 0.046931407942238296),
 (5165, 0.046931407942238296)]

Eigenvector Centrality and PageRank

Eigenvector centrality and its variant PageRank assign relative importance to nodes according to their connections, incorporating the idea that edges to high-scoring nodes contribute more. PageRank is a version of eigenvector centrality which introduces a damping factor, modeling a random web surfer which at some point stops following links and jumps to a random page. In PageRank theory, centrality is understood as the probability of such a web surfer to arrive on a certain page. Our implementation of both measures is based on parallel power iteration, a relatively simple eigensolver.

[34]:
# Eigenvector centrality
ec = nk.centrality.EigenvectorCentrality(K)
ec.run()
ec.ranking()[:10] # the 10 most central nodes
[34]:
[(33, 0.37335860763538437),
 (0, 0.35548796275763045),
 (2, 0.31719212126079693),
 (32, 0.30864169996172613),
 (1, 0.26595844854862444),
 (8, 0.2274061452435449),
 (13, 0.22647475684342064),
 (3, 0.2111796960531623),
 (31, 0.1910365857249303),
 (30, 0.1747599501690216)]
[35]:
# PageRank
pr = nk.centrality.PageRank(K, 1e-6)
pr.run()
pr.ranking()[:10] # the 10 most central nodes
[35]:
[(33, 0.02941190490185556),
 (0, 0.029411888071820155),
 (32, 0.02941184486730034),
 (1, 0.02941180477938106),
 (2, 0.02941179873364914),
 (3, 0.029411771282676906),
 (31, 0.029411770725212477),
 (5, 0.029411768995095993),
 (6, 0.029411768995095993),
 (23, 0.029411763985014328)]

Core Decomposition

A \(k\)-core decomposition of a graph is performed by successicely peeling away nodes with degree less than \(k\). The remaining nodes form the \(k\)-core of the graph.

[36]:
K = nk.readGraph("../input/karate.graph", nk.Format.METIS)
coreDec = nk.centrality.CoreDecomposition(K)
coreDec.run()
[36]:
<networkit.centrality.CoreDecomposition at 0x7f714254e7d0>

Core decomposition assigns a core number to each node, being the maximum \(k\) for which a node is contained in the \(k\)-core. For this small graph, core numbers have the following range:

[37]:
set(coreDec.scores())
[37]:
{1.0, 2.0, 3.0, 4.0}
[38]:
from networkit import vizbridges

nk.vizbridges.widgetFromGraph(K, dimension = nk.vizbridges.Dimension.Two, nodeScores = coreDec.scores())
[38]:

Subgraph

NetworKit supports the creation of Subgraphs depending on an original graph and a set of nodes. This might be useful in case you want to analyze certain communities of a graph. Let’s say that community 2 of the above result is of further interest, so we want a new graph that consists of nodes and intra cluster edges of community 2.

[39]:
c2 = communities.getMembers(2)
g2 = nk.graphtools.subgraphFromNodes(G, c2, compact=True)
[40]:
communities.subsetSizeMap()[2]
[40]:
164
[41]:
g2.numberOfNodes()
[41]:
164

As we can see, the number of nodes in our subgraph matches the number of nodes of community 2. The subgraph can be used like any other graph object, e.g. further community analysis:

[42]:
communities2 = nk.community.detectCommunities(g2)
Communities detected in 0.00074 [s]
solution properties:
-------------------  ---------
# communities        10
min community size    5
max community size   34
avg. community size  16.4
imbalance             2
edge cut             30
edge cut (portion)    0.114504
modularity            0.749374
-------------------  ---------
[43]:
nk.vizbridges.widgetFromGraph(g2, dimension = nk.vizbridges.Dimension.Two, nodePartition=communities2)
[43]:

NetworkX Compatibility

NetworkX is a popular Python package for network analysis. To let both packages complement each other, and to enable the adaptation of existing NetworkX-based code, we support the conversion of the respective graph data structures.

[44]:
import networkx as nx
nxG = nk.nxadapter.nk2nx(G) # convert from NetworKit.Graph to networkx.Graph
print(nx.degree_assortativity_coefficient(nxG))
0.23821137170816217

Generating Graphs

An important subfield of network science is the design and analysis of generative models. A variety of generative models have been proposed with the aim of reproducing one or several of the properties we find in real-world complex networks. NetworKit includes generator algorithms for several of them.

The Erdös-Renyi model is the most basic random graph model, in which each edge exists with the same uniform probability. NetworKit provides an efficient generator:

[45]:
ERD = nk.generators.ErdosRenyiGenerator(200, 0.2).generate()
print(ERD.numberOfNodes(), ERD.numberOfEdges())
200 3908

Transitivity / Clustering Coefficients

In the most general sense, transitivity measures quantify how likely it is that the relations out of which the network is built are transitive. The clustering coefficient is the most prominent of such measures. We need to distinguish between global and local clustering coefficient: The global clustering coefficient for a network gives the fraction of closed triads. The local clustering coefficient focuses on a single node and counts how many of the possible edges between neighbors of the node exist. The average of this value over all nodes is a good indicator for the degreee of transitivity and the presence of community structures in a network, and this is what the following function returns:

[46]:
nk.globals.clustering(G)
[46]:
0.4363762703031356

A simple way to generate a random graph with community structure is to use the ClusteredRandomGraphGenerator. It uses a simple variant of the Erdös-Renyi model: The node set is partitioned into a given number of subsets. Nodes within the same subset have a higher edge probability.

[47]:
CRG = nk.generators.ClusteredRandomGraphGenerator(200, 4, 0.2, 0.002).generate()
nk.community.detectCommunities(CRG)
Communities detected in 0.00151 [s]
solution properties:
-------------------  ----------
# communities         4
min community size   41
max community size   60
avg. community size  50
imbalance             1.2
edge cut             22
edge cut (portion)    0.0214216
modularity            0.698991
-------------------  ----------
[47]:
<networkit.structures.Partition at 0x7f714271ddd0>

The Chung-Lu model (also called configuration model) generates a random graph which corresponds to a given degree sequence, i.e. has the same expected degree sequence. It can therefore be used to replicate some of the properties of a given real networks, while others are not retained, such as high clustering and the specific community structure.

[48]:
degreeSequence = [CRG.degree(v) for v in CRG.iterNodes()]
clgen = nk.generators.ChungLuGenerator(degreeSequence)
CLG = clgen.generate()
nk.community.detectCommunities(CLG)
Communities detected in 0.00232 [s]
solution properties:
-------------------  ----------
# communities          8
min community size    16
max community size    34
avg. community size   25
imbalance              1.36
edge cut             632
edge cut (portion)     0.607109
modularity             0.262138
-------------------  ----------
[48]:
<networkit.structures.Partition at 0x7f714271fed0>

Settings

In this section we discuss global settings.

Logging

When using NetworKit from the command line, the verbosity of console output can be controlled via several loglevels, from least to most verbose: FATAL, ERROR, WARN, INFO, DEBUG and TRACE. (Currently, logging is only available on the console and not visible in the IPython Notebook).

[49]:
nk.getLogLevel() # the default loglevel
[49]:
'ERROR'
[50]:
nk.setLogLevel("TRACE") # set to most verbose mode
nk.setLogLevel("ERROR") # set back to default

Please note, that the default build setting is optimized (--optimize=Opt) and thus, every LOG statement below INFO is removed. If you need DEBUG and TRACE statements, please build the extension module by appending --optimize=Dbg when calling the setup script.

Parallelism

The degree of parallelism can be controlled and monitored in the following way:

[51]:
nk.setNumberOfThreads(4) # set the maximum number of available threads
[52]:
nk.getMaxNumberOfThreads() # see maximum number of available threads
[52]:
4
[53]:
nk.getCurrentNumberOfThreads() # the number of threads currently executing
[53]:
1

Profiling

The profiling module allows to get an overall picture of a network with a single line of code. Detailed statistics of the main properties of the network are shown both graphically and numerically.

[54]:
import warnings
warnings.filterwarnings('ignore')
nk.profiling.Profile.create(G).show()

Network Structural Profile


Navigating the profile
  • the profile includes the following sections by default:
    • global properties
    • overview of node centrality and partition distributions
    • detail views of node centrality distributions
    • node centrality correlations
    • detail views of partitions
  • click [+] for descriptions of measures
  • click on distribution thumbnail for detail view
  • hover over variable name of statistical figures for explanation
  • click on distribution plot for larger view

10680
24316
0.000426403
False
False
0
(23, 24)
N/A
1
Node centrality index which ranks nodes by their degree.
DegreeCentrality
Properties
$ n = $ 10680
$ f_{BC} = $ 1.00009
$ r = $ 0.239188
$ C = $ 0.0187764
Location
$ x_{(1)} = $ 1
$ x_{(n)} = $ 205
$ \widetilde{x}_{O_{min}} = $ 1
$ \widetilde{x}_{N_{min}} = $ 1
$ \widetilde{x}_{0.25} = $ 1
$ \widetilde{x}_{Med} = $ 2
$ \widetilde{x}_{0.75} = $ 4
$ \widetilde{x}_{N_{max}} = $ 8
$ \widetilde{x}_{O_{max}} = $ 13
$ \overline{x} = $ 4.55356
$ \overline{x}_{IQ} = $ 2.15281
$ \overline{x}_{R} = $ 103
$ \overline{x}_{H^{-1}} = $ 1.75165
$ \overline{x}_{H^{2}} = $ 9.27234
$ \overline{x}_{H^{3}} = $ 16.4642
Dispersion
$ s^{2}_{x} = $ 65.2474
$ s_{x} = $ 8.07759
$ v_{x} = $ 1.77391
$ R = $ 204
$ R_{IQ} = $ 3
Shape
$ S_{YP} = $ 0.948386
$ S_{M} = $ 6.59782
$ \gamma = $ 81.4638
Rank
$ s^{2}_{rk_{x}} = $ 8.83549e+06
$ s_{rk_{x}} = $ 2972.46
$ v_{rk_{x}} = $ 0.556588
Binning
$ k_{PDF} = $ 20
$ k_{CDF} = $ 83
$ x_{D_{PDF}} = $ 6.1
k-cores result from successively peeling away nodes of degree k. It also categorizes nodes according to the highest-order core in which they are contained, assigning a core number to each node.
CoreDecomposition
Properties
$ n = $ 10680
$ f_{BC} = $ 1.00009
$ r = $ 0.813431
$ C = $ 0.00263954
Location
$ x_{(1)} = $ 1
$ x_{(n)} = $ 31
$ \widetilde{x}_{O_{min}} = $ 1
$ \widetilde{x}_{N_{min}} = $ 1
$ \widetilde{x}_{0.25} = $ 1
$ \widetilde{x}_{Med} = $ 2
$ \widetilde{x}_{0.75} = $ 3
$ \widetilde{x}_{N_{max}} = $ 6
$ \widetilde{x}_{O_{max}} = $ 9
$ \overline{x} = $ 2.81976
$ \overline{x}_{IQ} = $ 1.57772
$ \overline{x}_{R} = $ 16
$ \overline{x}_{H^{-1}} = $ 1.50009
$ \overline{x}_{H^{2}} = $ 4.83988
$ \overline{x}_{H^{3}} = $ 7.49951
Dispersion
$ s^{2}_{x} = $ 15.4749
$ s_{x} = $ 3.93381
$ v_{x} = $ 1.39509
$ R = $ 30
$ R_{IQ} = $ 2
Shape
$ S_{YP} = $ 0.625162
$ S_{M} = $ 4.4103
$ \gamma = $ 23.5992
Rank
$ s^{2}_{rk_{x}} = $ 8.2564e+06
$ s_{rk_{x}} = $ 2873.4
$ v_{rk_{x}} = $ 0.538039
Binning
$ k_{PDF} = $ 20
$ k_{CDF} = $ 26
$ x_{D_{PDF}} = $ 1.75
The local clustering coefficient of a node is the fraction of edges that exist between neighbors of that node (or, equivalently, closed triangles centered on that node)
LocalClusteringCoefficient
Properties
$ n = $ 10680
$ f_{BC} = $ 1.00009
$ r = $ 0.497186
$ C = $ 1
Location
$ x_{(1)} = $ 0
$ x_{(n)} = $ 1
$ \widetilde{x}_{O_{min}} = $ 0
$ \widetilde{x}_{N_{min}} = $ 0
$ \widetilde{x}_{0.25} = $ 0
$ \widetilde{x}_{Med} = $ 0
$ \widetilde{x}_{0.75} = $ 0.5
$ \widetilde{x}_{N_{max}} = $ 1
$ \widetilde{x}_{O_{max}} = $ 1
$ \overline{x} = $ 0.265945
$ \overline{x}_{IQ} = $ 0.0996067
$ \overline{x}_{R} = $ 0.5
$ \overline{x}_{H^{-1}} = $ nan
$ \overline{x}_{H^{2}} = $ 0.458205
$ \overline{x}_{H^{3}} = $ 0.569829
Dispersion
$ s^{2}_{x} = $ 0.139238
$ s_{x} = $ 0.373146
$ v_{x} = $ 1.4031
$ R = $ 1
$ R_{IQ} = $ 0.5
Shape
$ S_{YP} = $ 2.13813
$ S_{M} = $ 1.06123
$ \gamma = $ -0.487303
Rank
$ s^{2}_{rk_{x}} = $ 7.8358e+06
$ s_{rk_{x}} = $ 2799.25
$ v_{rk_{x}} = $ 0.524155
Binning
$ k_{PDF} = $ 20
$ k_{CDF} = $ 246
$ x_{D_{PDF}} = $ 0.025
PageRank assigns relative importance to nodes according to their connections, incorporating the idea that edges to high-scoring nodes contribute more.
PageRank
Properties
$ n = $ 10680
$ f_{BC} = $ 1.00009
$ r = $ -0.0159402
$ C = $ 1
Location
$ x_{(1)} = $ 1.883e-05
$ x_{(n)} = $ 0.00344352
$ \widetilde{x}_{O_{min}} = $ 1.883e-05
$ \widetilde{x}_{N_{min}} = $ 1.883e-05
$ \widetilde{x}_{0.25} = $ 4.23401e-05
$ \widetilde{x}_{Med} = $ 6.3501e-05
$ \widetilde{x}_{0.75} = $ 0.00010991
$ \widetilde{x}_{N_{max}} = $ 0.0002111
$ \widetilde{x}_{O_{max}} = $ 0.000311606
$ \overline{x} = $ 9.3633e-05
$ \overline{x}_{IQ} = $ 6.7286e-05
$ \overline{x}_{R} = $ 0.00173118
$ \overline{x}_{H^{-1}} = $ 5.81456e-05
$ \overline{x}_{H^{2}} = $ 0.000142813
$ \overline{x}_{H^{3}} = $ 0.000254285
Dispersion
$ s^{2}_{x} = $ 1.16294e-08
$ s_{x} = $ 0.00010784
$ v_{x} = $ 1.15173
$ R = $ 0.00342469
$ R_{IQ} = $ 6.75699e-05
Shape
$ S_{YP} = $ 0.838245
$ S_{M} = $ 9.85161
$ \gamma = $ 202.172
Rank
$ s^{2}_{rk_{x}} = $ 9.50609e+06
$ s_{rk_{x}} = $ 3083.19
$ v_{rk_{x}} = $ 0.577323
Binning
$ k_{PDF} = $ 20
$ k_{CDF} = $ 81
$ x_{D_{PDF}} = $ 0.000104447
KatzCentrality
Properties
$ n = $ 10680
$ f_{BC} = $ 1.00009
$ r = $ 0.299514
$ C = $ nan
Location
$ x_{(1)} = $ 0.00746174
$ x_{(n)} = $ 0.0990854
$ \widetilde{x}_{O_{min}} = $ 0.00746174
$ \widetilde{x}_{N_{min}} = $ 0.00746174
$ \widetilde{x}_{0.25} = $ 0.00747556
$ \widetilde{x}_{Med} = $ 0.00786213
$ \widetilde{x}_{0.75} = $ 0.00877612
$ \widetilde{x}_{N_{max}} = $ 0.0107234
$ \widetilde{x}_{O_{max}} = $ 0.0126709
$ \overline{x} = $ 0.00898401
$ \overline{x}_{IQ} = $ 0.00794757
$ \overline{x}_{R} = $ 0.0532736
$ \overline{x}_{H^{-1}} = $ 0.00842803
$ \overline{x}_{H^{2}} = $ 0.00967641
$ \overline{x}_{H^{3}} = $ 0.0111483
Dispersion
$ s^{2}_{x} = $ 1.29218e-05
$ s_{x} = $ 0.00359468
$ v_{x} = $ 0.40012
$ R = $ 0.0916237
$ R_{IQ} = $ 0.00130056
Shape
$ S_{YP} = $ 0.936282
$ S_{M} = $ 6.72171
$ \gamma = $ 81.8143
Rank
$ s^{2}_{rk_{x}} = $ 9.50609e+06
$ s_{rk_{x}} = $ 3083.19
$ v_{rk_{x}} = $ 0.577323
Binning
$ k_{PDF} = $ 20
$ k_{CDF} = $ 98
$ x_{D_{PDF}} = $ 0.00975234
Betweenness centrality is the fraction of shortest paths between any pair of nodes that passes through a node.
EstimateBetweenness
Properties
$ n = $ 10680
$ f_{BC} = $ 1.00009
$ r = $ 0.049914
$ C = $ nan
Location
$ x_{(1)} = $ 0
$ x_{(n)} = $ 0.127821
$ \widetilde{x}_{O_{min}} = $ 0
$ \widetilde{x}_{N_{min}} = $ 0
$ \widetilde{x}_{0.25} = $ 0
$ \widetilde{x}_{Med} = $ 0
$ \widetilde{x}_{0.75} = $ 0.000237096
$ \widetilde{x}_{N_{max}} = $ 0.000592196
$ \widetilde{x}_{O_{max}} = $ 0.000947117
$ \overline{x} = $ 0.000676837
$ \overline{x}_{IQ} = $ 4.16767e-05
$ \overline{x}_{R} = $ 0.0639104
$ \overline{x}_{H^{-1}} = $ nan
$ \overline{x}_{H^{2}} = $ 0.00434282
$ \overline{x}_{H^{3}} = $ 0.010911
Dispersion
$ s^{2}_{x} = $ 1.84037e-05
$ s_{x} = $ 0.00428995
$ v_{x} = $ 6.33824
$ R = $ 0.127821
$ R_{IQ} = $ 0.000237096
Shape
$ S_{YP} = $ 0.473318
$ S_{M} = $ 15.9753
$ \gamma = $ 322.52
Rank
$ s^{2}_{rk_{x}} = $ 7.66156e+06
$ s_{rk_{x}} = $ 2767.95
$ v_{rk_{x}} = $ 0.518295
Binning
$ k_{PDF} = $ 20
$ k_{CDF} = $ 92
$ x_{D_{PDF}} = $ 0.00319552
Degree

k-Core Decomposition

Local Clustering Coefficient

PageRank

Katz Centrality

Betweenness

Community detection is the task of identifying groups of nodes in the network which are significantly more densely connected among each other than to the rest of nodes.
PLM
Properties
$ n = $ 95
$ f_{BC} = $ 1.01064
$ r = $ nan
$ C = $ nan
Location
$ x_{(1)} = $ 6
$ x_{(n)} = $ 672
$ \widetilde{x}_{O_{min}} = $ 6
$ \widetilde{x}_{N_{min}} = $ 6
$ \widetilde{x}_{0.25} = $ 16
$ \widetilde{x}_{Med} = $ 43
$ \widetilde{x}_{0.75} = $ 190
$ \widetilde{x}_{N_{max}} = $ 429
$ \widetilde{x}_{O_{max}} = $ 672
$ \overline{x} = $ 112.421
$ \overline{x}_{IQ} = $ 62.4082
$ \overline{x}_{R} = $ 339
$ \overline{x}_{H^{-1}} = $ 24.9003
$ \overline{x}_{H^{2}} = $ 179.827
$ \overline{x}_{H^{3}} = $ 235.958
Dispersion
$ s^{2}_{x} = $ 19908.7
$ s_{x} = $ 141.098
$ v_{x} = $ 1.25509
$ R = $ 666
$ R_{IQ} = $ 174
Shape
$ S_{YP} = $ 1.47602
$ S_{M} = $ 1.80581
$ \gamma = $ 3.22916
Rank
$ s^{2}_{rk_{x}} = $ 759.729
$ s_{rk_{x}} = $ 27.5632
$ v_{rk_{x}} = $ 0.574233
Binning
$ k_{PDF} = $ 9
$ k_{CDF} = $ 55
$ x_{D_{PDF}} = $ 43
All nodes in a connected component are reachable from each other.
ConnectedComponents
Properties
$ n = $ 1
$ f_{BC} = $ nan
$ r = $ nan
$ C = $ nan
Location
$ x_{(1)} = $ 10680
$ x_{(n)} = $ 10680
$ \widetilde{x}_{O_{min}} = $ 10680
$ \widetilde{x}_{N_{min}} = $ 10680
$ \widetilde{x}_{0.25} = $ 10680
$ \widetilde{x}_{Med} = $ 10680
$ \widetilde{x}_{0.75} = $ 10680
$ \widetilde{x}_{N_{max}} = $ 10680
$ \widetilde{x}_{O_{max}} = $ 10680
$ \overline{x} = $ 10680
$ \overline{x}_{IQ} = $ 10680
$ \overline{x}_{R} = $ 10680
$ \overline{x}_{H^{-1}} = $ 10680
$ \overline{x}_{H^{2}} = $ 10680
$ \overline{x}_{H^{3}} = $ 10680
Dispersion
$ s^{2}_{x} = $ nan
$ s_{x} = $ nan
$ v_{x} = $ nan
$ R = $ 0
$ R_{IQ} = $ 0
Shape
$ S_{YP} = $ nan
$ S_{M} = $ nan
$ \gamma = $ nan
Rank
$ s^{2}_{rk_{x}} = $ nan
$ s_{rk_{x}} = $ nan
$ v_{rk_{x}} = $ nan
Binning
$ k_{PDF} = $ 1
$ k_{CDF} = $ 1
$ x_{D_{PDF}} = $ 10680
k-cores result from successively peeling away nodes of degree k. It also categorizes nodes according to the highest-order core in which they are contained, assigning a core number to each node.
CoreDecomposition
Properties
$ n = $ 26
$ f_{BC} = $ 1.04
$ r = $ nan
$ C = $ nan
Location
$ x_{(1)} = $ 1
$ x_{(n)} = $ 5246
$ \widetilde{x}_{O_{min}} = $ 1
$ \widetilde{x}_{N_{min}} = $ 1
$ \widetilde{x}_{0.25} = $ 19
$ \widetilde{x}_{Med} = $ 46
$ \widetilde{x}_{0.75} = $ 148
$ \widetilde{x}_{N_{max}} = $ 236
$ \widetilde{x}_{O_{max}} = $ 463
$ \overline{x} = $ 410.769
$ \overline{x}_{IQ} = $ 63.7857
$ \overline{x}_{R} = $ 2623.5
$ \overline{x}_{H^{-1}} = $ 6.70519
$ \overline{x}_{H^{2}} = $ 1160.2
$ \overline{x}_{H^{3}} = $ 1833.07
Dispersion
$ s^{2}_{x} = $ 1.22442e+06
$ s_{x} = $ 1106.54
$ v_{x} = $ 2.69381
$ R = $ 5245
$ R_{IQ} = $ 129
Shape
$ S_{YP} = $ 0.98895
$ S_{M} = $ 3.42416
$ \gamma = $ 11.4736
Rank
$ s^{2}_{rk_{x}} = $ 58.44
$ s_{rk_{x}} = $ 7.64461
$ v_{rk_{x}} = $ 0.566267
Binning
$ k_{PDF} = $ 5
$ k_{CDF} = $ 13
$ x_{D_{PDF}} = $ 525.5

Support

NetworKit is an open-source project that improves with suggestions and contributions from its users. The mailing list is the place for general discussion and questions.