The randomization module implements algorithms to perturb existing graphs. This is commonly used to obtain a null model for hypothesis testing in network analysis (see below for an example). Intuitively, one tests whether an observation made on the original graph also appears in similar graphs. By doing so, one can quantify the statistical significance of the observation. The ensemble of similar networks is commonly chosen as the set of (simple) graphs with the same degree sequence.
Edge Switching is a commonly used Markov chain approach for this purpose. In every step it selects two edges uniformly at random and exchanges one endpoint of each. While no practically useful rigorous bounds on the mixing time are known, the literature typically uses 10 to 100 times the number of edges.
The Curveball algorithm creates a random sample with the same degree sequence by repeatedly selecting two random nodes and randomly exchanging their neighbourhoods (this is referred to as a trade). This corresponds to a random walk on the space of all graphs in the ensemble. The more trades are carried out, the less correlated input and output are. While there are no practical lower bounds, one typically chooses 10 to 100 times the number n of nodes.
GlobalCurveball is a related algorithm that is typically faster than Curveball. Its implementation is also more versatile (e.g., it supports directed and undirected graphs). In each step it carries out n/2 trades involving all nodes. A typical choice for the number of global trades is 10 to 100. We recommend GlobalCurveball over Curveball for performance reasons.
[1]:
import networkit as nk
The GlobalCurveball class is an implementation of EM-GCB proposed in "Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades", Carstens et al., ESA 2018. The algorithm perturbs an input graph by iteratively randomizing the neighbourhoods of node pairs. For a large number of global trades this process is shown to produce a uniform sample from the set of all graphs with the same degree sequence as the input graph.
The GlobalCurveball(G, number_of_global_rounds=20, allowSelfLoops=False, degreePreservingShufflePreprocessing=True)
constructor expects a graph, followed by the parameter number_of_global_rounds
which dictates the number of global rounds to carry out. To accelerate the perturbation process we recommend setting degreePreservingShufflePreprocessing to True. This jump-starts the randomization process and yields faster convergence of the algorithm (see "Smaller Universes for Uniform Sampling of 0,1-matrices with fixed row and column sums" by Annabell Berger and Corrie Jacobien Carstens [https://arxiv.org/abs/1803.02624]). For directed simple graphs the algorithm will not yield a uniform sample unless the preprocessing is enabled.
[2]:
# Read graph
G = nk.graphio.readGraph("../input/karate.graph", nk.Format.METIS)
print(G.numberOfNodes(), G.numberOfEdges())
34 78
[3]:
# Initialize algorithm
globalCurve = nk.randomization.GlobalCurveball(G)
[4]:
# Run algorithm
globalCurve.run()
[4]:
<networkit.randomization.GlobalCurveball at 0x7f72146a74d0>
[5]:
# Get randomized graph
randomGlobalG = globalCurve.getGraph()
[6]:
# Verify
print(randomGlobalG.numberOfNodes(), randomGlobalG.numberOfEdges())
assert(randomGlobalG.numberOfNodes() == G.numberOfNodes())
for u in range(randomGlobalG.upperNodeIdBound()):
    assert(randomGlobalG.degree(u) == G.degree(u))
34 78
Our hypothesis is that Random Hyperbolic Graphs (RHGs) have a high clustering coefficient. To test this hypothesis we first generate an RHG and compute its average local clustering coefficient (LCC). Then, we generate five random graphs with the same degree sequence and observe that their mean LCC score is much lower. This gives empirical support for our hypothesis.
[7]:
def llcScore(G):
    """Compute the average local clustering coefficient"""
    return sum(nk.centrality.LocalClusteringCoefficient(G).run().scores()) / G.numberOfNodes()

# Hyperbolic graphs are known to have an above-average clustering
G = nk.generators.HyperbolicGenerator(1000, 10).generate()
print("Avg. clustering of input: %.3f" % llcScore(G))

for i in range(5):
    # Take 5 random graphs with the same degree sequence
    sampledGraph = nk.randomization.GlobalCurveball(G).run().getGraph()
    print("Avg. clustering of random sample: %.3f" % llcScore(sampledGraph))
Avg. clustering of input: 0.774
Avg. clustering of random sample: 0.065
Avg. clustering of random sample: 0.065
Avg. clustering of random sample: 0.064
Avg. clustering of random sample: 0.061
Avg. clustering of random sample: 0.063
The Curveball class is an implementation of IM-CB proposed in "Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades", Carstens et al., ESA 2018. The algorithm perturbs an undirected and unweighted input graph by iteratively randomizing the neighbourhoods of node pairs.
The Curveball(G)
constructor expects an undirected, unweighted graph. This class does not support the run()
method; it instead provides the run(trades)
method. The parameter trades
is a vector of pairs of nodes. The run
method can be called multiple times.
Following is an example:
[8]:
# Read graph
G = nk.graphio.readGraph("../input/celegans_metabolic.thrill", nk.Format.ThrillBinary)
print(G.numberOfNodes(), G.numberOfEdges())
453 2025
[9]:
# Initialize algorithm
curve = nk.randomization.Curveball(G)
The CurveballUniformTradeGenerator(num_trades, num_nodes)
class can be used to generate trades. num_trades
is the number of trades to generate while num_nodes
is the number of node indices to draw from. It generates a trade sequence consisting of num_trades
many single trades. Each trade contains two different node indices drawn uniformly at random from the interval [0, num_nodes).
[10]:
# Generate trade sequence
utg = nk.randomization.CurveballUniformTradeGenerator(G.numberOfNodes(), G.numberOfNodes())
trades = utg.generate()
[11]:
# Run algorithm
curve.run(trades)
[11]:
<networkit.randomization.Curveball at 0x7f71d720da30>
[12]:
# Get randomized graph
randomCurveG = curve.getGraph()
[13]:
# Verify
print(randomCurveG.numberOfNodes(), randomCurveG.numberOfEdges())
assert(randomCurveG.numberOfNodes() == G.numberOfNodes())
for u in range(randomCurveG.upperNodeIdBound()):
    assert(randomCurveG.degree(u) == G.degree(u))
453 2025
DegreePreservingShuffle is available as a standalone module. It relabels nodes while preserving their degrees but won't change the topology of the graph. The constructor DegreePreservingShuffle(G)
expects a directed or undirected graph.
[14]:
# Generate graph
G = nk.generators.ErdosRenyiGenerator(200, 0.2).generate()
[15]:
# Initialize algorithm
dps = nk.randomization.DegreePreservingShuffle(G)
[16]:
# Run algorithm
dps.run()
[16]:
<networkit.randomization.DegreePreservingShuffle at 0x7f71d7130d70>
[17]:
randomG = dps.getGraph()
[18]:
# Verify
for u in range(G.upperNodeIdBound()):
    assert(G.degree(u) == randomG.degree(u))
Edge Switching Markov Chain ["The Markov chain simulation method for generating connected power law random graphs", Mihail and Zegura] perturbs simple directed or undirected graphs while preserving the node degrees. In each step, two edges are selected uniformly at random and one endpoint of each is exchanged. Swaps that would introduce multi-edges or self-loops are rejected without replacement – this is necessary to allow uniform sampling [see "Switching edges to randomize networks: what goes wrong and how to fix it", Carstens and Horadam].
In general, simple edge switching does not yield a uniform distribution on simple directed graphs because the orientation of a directed triangle cannot be changed. Using DegreePreservingShuffle as a preprocessing step overcomes this limitation. For convenience it is enabled by default for any graph, as it heuristically accelerates mixing.
[19]:
# Generate graph
G = nk.generators.ErdosRenyiGenerator(200, 0.2).generate()
[20]:
# Initialize algorithm
es = nk.randomization.EdgeSwitching(G)
[21]:
# Run algorithm
es.run()
[22]:
randomG = es.getGraph()
[23]:
# Verify
for u in range(G.upperNodeIdBound()):
    assert(G.degree(u) == randomG.degree(u))
We offer an in-place variant of nk.randomization.EdgeSwitching
which is more efficient if the original graph is not needed anymore. The wrapper also takes ownership of the graph, so it’s safe to let the original graph name go out of scope.
In the following we pick up the previous local clustering coefficient example. This time, however, we are not interested in the average value, but rather in the progress made as we carry out an increasing number of Markov chain steps.
[24]:
G = nk.generators.HyperbolicGenerator(1000, 10).generate()
alg = nk.randomization.EdgeSwitchingInPlace(G, 0.1)
for i in range(10):
    if i > 0:
        # do a few switches
        alg.run()  # this will update G directly

    score = sum(nk.centrality.LocalClusteringCoefficient(G).run().scores()) / G.numberOfNodes()
    print("After % 5d switches the llc score is %.3f" % (500 * i, score))
After 0 switches the llc score is 0.789
After 500 switches the llc score is 0.480
After 1000 switches the llc score is 0.313
After 1500 switches the llc score is 0.206
After 2000 switches the llc score is 0.146
After 2500 switches the llc score is 0.112
After 3000 switches the llc score is 0.090
After 3500 switches the llc score is 0.078
After 4000 switches the llc score is 0.074
After 4500 switches the llc score is 0.070