NetworKit Randomization Tutorial

The randomization module implements algorithms to perturb existing graphs. This is commonly used to obtain a null-model for hypothesis testing in network analysis (see below for an example). Intuitively one tests whether an observation in original also appears in similar graphs. By doing so one can quantify the statistical significance of the observation. The ensemble of similar networks is commonly chosen as the set of (simple) graphs that have the same degree sequence.

Edge Switching is a commonly used Markov Chain approach for this purpose. In every step it selects two edges uniformly at random and exchanges one endpoint of each. While no practical rigorous bounds on the mixing time are known, in literature typically 10 to 100 times the number of edges is used.

The Curveball algorithms creates a random sample with the same degree sequence, by repeatedly selecting two random nodes and randomly exchanging their neighbourhoods (this is referred to as a trade). This corresponds to a random walk on the space of all graphs in the ensemble. The more trades are carried out, the less correlated input and output are. While there a no practical lower bounds, one typically choose 10 to 100 times the number n of nodes.

GlobalCurveball is a related algorithm that is typically faster than Curveball. Also it’s implementation is more versatile (e.g., it support directed and undirected graphs). In each step it carries out n/2 trades involving all nodes. A typical choice of the number of GlobalTrades is 10 to 100. We recommend the usage of GlobalCurveball over Curveball for performance reasons.

[1]:
import networkit as nk

Global Curveball

The Global Curveball class is an implementation of EM-GCB proposed in Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades", Carstens et al., ESA 2018. The algorithm perturbs an input graph, by iteratively randomizing the neighbourhoods of node pairs. For a large number of global trades this process is shown to produce a uniform sample from the set of all graphs with the same degree sequence as the input graph.

The GlobalCurveball(G, number_of_global_rounds=20, allowSelfLoops=False, degreePreservingShufflePreprocessing=True) constructor expecs a graph, followed by the parameter number_of_global_rounds which dictactes the number of global rounds to carry out. To accelerate the perturbation process we recommend to set degreePreservingShufflePreprocessing to true. This jump-starts the randomization process and yields faster convergence of the algorithm (see “Smaller Universes for Uniform Sampling of 0,1-matrices with fixed row and column sums” * by Annabell Berger, Corrie Jacobien Carstens [https://arxiv.org/abs/1803.02624]). For directed simple graphs the algorithm will not yield a uniform sample unless the preprocessing is enabled.

[2]:
# Read graph
G = nk.graphio.readGraph("../input/karate.graph", nk.Format.METIS)
print(G.numberOfNodes(), G.numberOfEdges())
34 78
[3]:
# Initialize algorithm
globalCurve = nk.randomization.GlobalCurveball(G)
[4]:
# Run algorithm
globalCurve.run()
[4]:
<networkit.randomization.GlobalCurveball at 0x7f78fccaacf0>
[5]:
# Get randomized graph
randomGlobalG = globalCurve.getGraph()
[6]:
# Verify
print(randomGlobalG.numberOfNodes(), randomGlobalG.numberOfEdges())
assert(randomGlobalG.numberOfNodes() == G.numberOfNodes())
for u in range (randomGlobalG.upperNodeIdBound()):
    assert(randomGlobalG.degree(u) == G.degree(u))
34 78

Application example

Our hypothesis is that Hyperbolic Graphs (RHGs) have a high clustering. To test the hypothesis we first obtain an RHG and compute its local clustering coefficient. Then, we compute five random graphs with the same degree sequence and observe that their mean LCC score is much lower. This gives empirical support towards our hypothesis.

[7]:
def llcScore(G):
    """Compute average local clustering coefficient"""
    return sum(nk.centrality.LocalClusteringCoefficient(G).run().scores()) / G.numberOfNodes()

# Hyperbolic graphs are known to have an above average clustering
G = nk.generators.HyperbolicGenerator(1000, 10).generate()
print("Avg. clustering of input:         %.3f" % llcScore(G))

for i in range(5):
    # Take 5 random graphs with the same degree sequence
    sampledGraph = nk.randomization.GlobalCurveball(G).run().getGraph()
    print("Avg. clustering of random sample: %.3f" % llcScore(sampledGraph))
Avg. clustering of input:         0.790
Avg. clustering of random sample: 0.097
Avg. clustering of random sample: 0.100
Avg. clustering of random sample: 0.098
Avg. clustering of random sample: 0.101
Avg. clustering of random sample: 0.095

Curveball

The Curveball class in an implementation of IM-CB proposed in Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades", Carstens et al., ESA 2018. The algorithm perturbs an undirected and unweighted input graph, by iteratively randomizing the neighbourhoods of node pairs.

The Curveball(G) constructor expects an undirected, unweighted graph. This class does not support the run() method; it instead provides the run(trades) method. The parameter trades is a vector of pairs of nodes. The run method can be called multiple times.

Following is an example:

[8]:
# Read graph
G = nk.graphio.readGraph("../input/celegans_metabolic.thrill", nk.Format.ThrillBinary)
print(G.numberOfNodes(), G.numberOfEdges())
453 2025
[9]:
# Initialize algorithm
curve = nk.randomization.Curveball(G)

The CurveballUniformTradeGenerator(num_trades, num_nodes) class can be used to generate trades. num_trades is the number of trades to generate while num_nodes is the number of node indices to draw from. It generates a trade sequence consisting of num_trades many single trades. Each trade contains two different node indices drawn uniformly at random from the interval [0, num_nodes).

[10]:
# Generate trade sequence
utg = nk.randomization.CurveballUniformTradeGenerator(G.numberOfNodes(), G.numberOfNodes())
trades = utg.generate()
[11]:
# Run algorithm
curve.run(trades)
[11]:
<networkit.randomization.Curveball at 0x7f78c72d6070>
[12]:
# Get randomized graph
randomCurveG = curve.getGraph()
[13]:
# Verify
print(randomCurveG.numberOfNodes(), randomCurveG.numberOfEdges())
assert(randomCurveG.numberOfNodes() == G.numberOfNodes())
for u in range (randomCurveG.upperNodeIdBound()):
    assert(randomCurveG.degree(u) == G.degree(u))
453 2025

DegreePreservingShuffle

DegreePreservingShuffle is available as a standalone module. It will relabel nodes while keeping their degrees but wont change the topology of the graph. The constructor DegreePreservingShuffle(G) expects a directed or undirected graph.

[14]:
# Generate graph
G = nk.generators.ErdosRenyiGenerator(200, 0.2).generate()
[15]:
# Initialze algorithm
dps = nk.randomization.DegreePreservingShuffle(G)
[16]:
# Run algorithm
dps.run()
[16]:
<networkit.randomization.DegreePreservingShuffle at 0x7f78c71a21b0>
[17]:
randomG = dps.getGraph()
[18]:
# Verify
for u in range(G.upperNodeIdBound()):
    assert(G.degree(u) == randomG.degree(u))

Edge Switching Markov Chain

Edge Switching Markov Chain [“The markov chain simulation method for generating connected power law random graphs”, Mihail and Zegura] perturbs simple directed or undirected graphs while preserving the node degrees. In each step, two edges are selected uniformly at random, and two endpoints exchanged. Swaps that introduce multi-edges or self-loops are rejected without replacement – this is necessary to allow uniform sampling [see “Switching edges to randomize networks: what goes wrong and how to fix it”, Carstens and Horadam].

In general, simple edge switching does not yield a uniform distribution on simple directed graphs because the orientation of a directed triangles cannot be changed. Using DegreePreservingShuffle as a preprocessing step overcomes this limitation. For convenience this is done by default for any graph to heuristically accelerate mixing.

[19]:
# Generate graph
G = nk.generators.ErdosRenyiGenerator(200, 0.2).generate()
[20]:
# Initialze algorithm
es = nk.randomization.EdgeSwitching(G)
[21]:
# Run algorithm
es.run()
[22]:
randomG = es.getGraph()
[23]:
# Verify
for u in range(G.upperNodeIdBound()):
    assert(G.degree(u) == randomG.degree(u))

Inplace Edge Switching

We offer an in-place variant of nk.randomization.EdgeSwitching which is more efficient if the original graph is not needed anymore. The wrapper also takes ownership of the graph, so it’s safe to let the original graph name go out of scope.

In the following we pick up the previous Local Clustering Coefficient example. This time, however, we are not interested in the average value, but rather in the progress the made as we carry out an increasing number of Markov Chain steps.

[24]:
G = nk.generators.HyperbolicGenerator(1000, 10).generate()
alg = nk.randomization.EdgeSwitchingInPlace(G, 0.1)

for i in range(10):
    if i > 0:
        # do a few switches
        alg.run() # this will update G directly

    score =  sum(nk.centrality.LocalClusteringCoefficient(G).run().scores()) / G.numberOfNodes()
    print("After % 5d switches the llc score is %.3f" % (500 * i, score))

After     0 switches the llc score is 0.779
After   500 switches the llc score is 0.447
After  1000 switches the llc score is 0.264
After  1500 switches the llc score is 0.162
After  2000 switches the llc score is 0.096
After  2500 switches the llc score is 0.067
After  3000 switches the llc score is 0.053
After  3500 switches the llc score is 0.045
After  4000 switches the llc score is 0.039
After  4500 switches the llc score is 0.035
[ ]: