The randomization module implements algorithms to perturb existing graphs. This is commonly used to obtain a null model for hypothesis testing in network analysis (see below for an example). Intuitively, one tests whether an observation made on the original graph also appears in *similar* graphs; doing so quantifies the statistical significance of the observation. The ensemble of *similar* networks is commonly chosen as the set of (simple) graphs that have the same degree sequence.

Edge Switching is a commonly used Markov chain approach for this purpose. In every step it selects two edges uniformly at random and exchanges one endpoint of each. While no practical rigorous bounds on the mixing time are known, the literature typically uses 10 to 100 times the number of edges.
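To make the mechanics concrete, here is a minimal pure-Python sketch of the edge-switching step (illustrative only, not NetworKit's implementation): steps that would create a self-loop or multi-edge are skipped, so the graph stays simple and all degrees are preserved.

```python
import random

def edge_switch(edges, num_steps, seed=0):
    """Perform edge-switching steps on an undirected simple graph.

    edges is a list of (u, v) pairs; steps that would create a
    self-loop or multi-edge are rejected and simply skipped.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(num_steps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        if i == j:
            continue
        (a, b), (c, d) = edges[i], edges[j]
        # exchange one endpoint of each edge: (a,b), (c,d) -> (a,d), (c,b)
        if a == d or c == b:
            continue  # would create a self-loop
        e1, e2 = frozenset((a, d)), frozenset((c, b))
        if e1 in present or e2 in present:
            continue  # would create a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {e1, e2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges
```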

The Curveball algorithm creates a random sample with the same degree sequence by repeatedly selecting two random nodes and randomly exchanging their neighbourhoods (this is referred to as a trade). This corresponds to a random walk on the space of all graphs in the ensemble. The more trades are carried out, the less correlated input and output are. While there are no practical lower bounds, one typically chooses 10 to 100 times the number n of nodes.
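The following sketch (plain Python with illustrative names, not NetworKit's implementation) shows what a single trade does: the two nodes keep their common neighbours and randomly redistribute their exclusive neighbours, so both degrees stay fixed.

```python
import random

def curveball_trade(adj, u, v, rng):
    """One Curveball trade on an undirected simple graph.

    adj maps each node to a set of neighbours. Nodes u and v pool
    their non-shared neighbours and redistribute them at random.
    """
    tradable_u = adj[u] - adj[v] - {v}   # exclusive neighbours of u
    tradable_v = adj[v] - adj[u] - {u}   # exclusive neighbours of v
    pool = list(tradable_u | tradable_v)
    rng.shuffle(pool)
    new_u, new_v = pool[:len(tradable_u)], pool[len(tradable_u):]
    # rewire both directions of the undirected adjacency
    for x in tradable_u:
        adj[u].remove(x); adj[x].remove(u)
    for x in tradable_v:
        adj[v].remove(x); adj[x].remove(v)
    for x in new_u:
        adj[u].add(x); adj[x].add(u)
    for x in new_v:
        adj[v].add(x); adj[x].add(v)
```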

GlobalCurveball is a related algorithm that is typically faster than Curveball. Its implementation is also more versatile (e.g., it supports directed and undirected graphs). In each step it carries out n/2 trades involving all nodes. A typical choice for the number of global trades is 10 to 100. **We recommend GlobalCurveball over Curveball for performance reasons**.

```
[1]:
```

```
import networkit as nk
```

The GlobalCurveball class is an implementation of EM-GCB proposed in `"Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades", Carstens et al., ESA 2018`. The algorithm perturbs an input graph by iteratively randomizing the neighbourhoods of node pairs. For a large number of global trades this process is shown to produce a uniform sample from the set of all graphs with the same degree sequence as the input graph.

The `GlobalCurveball(G, number_of_global_rounds=20, allowSelfLoops=False, degreePreservingShufflePreprocessing=True)` constructor expects a graph, followed by the parameter `number_of_global_rounds`, which dictates the number of global rounds to carry out. To accelerate the perturbation process we recommend setting `degreePreservingShufflePreprocessing` to `True`. This jump-starts the randomization process and yields faster convergence of the algorithm (see "Smaller Universes for Uniform Sampling of 0,1-matrices with fixed row and column sums" by Annabell Berger and Corrie Jacobien Carstens [https://arxiv.org/abs/1803.02624]). For directed simple graphs the algorithm will not yield a uniform sample unless the preprocessing is enabled.

```
[2]:
```

```
# Read graph
G = nk.graphio.readGraph("../input/karate.graph", nk.Format.METIS)
print(G.numberOfNodes(), G.numberOfEdges())
```

```
34 78
```

```
[3]:
```

```
# Initialize algorithm
globalCurve = nk.randomization.GlobalCurveball(G)
```

```
[4]:
```

```
# Run algorithm
globalCurve.run()
```

```
[4]:
```

```
<networkit.randomization.GlobalCurveball at 0x7ff5544cf050>
```

```
[5]:
```

```
# Get randomized graph
randomGlobalG = globalCurve.getGraph()
```

```
[6]:
```

```
# Verify
print(randomGlobalG.numberOfNodes(), randomGlobalG.numberOfEdges())
assert(randomGlobalG.numberOfNodes() == G.numberOfNodes())
for u in range(randomGlobalG.upperNodeIdBound()):
    assert(randomGlobalG.degree(u) == G.degree(u))
```

```
34 78
```

Our hypothesis is that Random Hyperbolic Graphs (RHGs) have a high clustering. To test the hypothesis we first obtain an RHG and compute its average local clustering coefficient (LCC). Then, we compute five random graphs with the same degree sequence and observe that their mean LCC score is much lower. This gives empirical support for our hypothesis.

```
[7]:
```

```
def llcScore(G):
    """Compute average local clustering coefficient"""
    return sum(nk.centrality.LocalClusteringCoefficient(G).run().scores()) / G.numberOfNodes()

# Hyperbolic graphs are known to have an above average clustering
G = nk.generators.HyperbolicGenerator(1000, 10).generate()
print("Avg. clustering of input: %.3f" % llcScore(G))

for i in range(5):
    # Take 5 random graphs with the same degree sequence
    sampledGraph = nk.randomization.GlobalCurveball(G).run().getGraph()
    print("Avg. clustering of random sample: %.3f" % llcScore(sampledGraph))
```

```
Avg. clustering of input: 0.762
Avg. clustering of random sample: 0.033
Avg. clustering of random sample: 0.030
Avg. clustering of random sample: 0.033
Avg. clustering of random sample: 0.034
Avg. clustering of random sample: 0.032
```

The Curveball class is an implementation of IM-CB proposed in `"Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades", Carstens et al., ESA 2018`. The algorithm perturbs an undirected and unweighted input graph by iteratively randomizing the neighbourhoods of node pairs.

The `Curveball(G)` constructor expects an undirected, unweighted graph. This class does not support the parameterless `run()` method; it instead provides a `run(trades)` method, where the parameter `trades` is a vector of pairs of nodes. The `run` method can be called multiple times.

Following is an example:

```
[8]:
```

```
# Read graph
G = nk.graphio.readGraph("../input/celegans_metabolic.thrill", nk.Format.ThrillBinary)
print(G.numberOfNodes(), G.numberOfEdges())
```

```
453 2025
```

```
[9]:
```

```
# Initialize algorithm
curve = nk.randomization.Curveball(G)
```

The `CurveballUniformTradeGenerator(num_trades, num_nodes)` class can be used to generate trades. `num_trades` is the number of trades to generate, while `num_nodes` is the number of node indices to draw from. It generates a trade sequence consisting of `num_trades` single trades. Each trade contains two different node indices drawn uniformly at random from the interval [0, num_nodes).

```
[10]:
```

```
# Generate trade sequence
utg = nk.randomization.CurveballUniformTradeGenerator(G.numberOfNodes(), G.numberOfNodes())
trades = utg.generate()
```

```
[11]:
```

```
# Run algorithm
curve.run(trades)
```

```
[11]:
```

```
<networkit.randomization.Curveball at 0x7ff5544cf0f0>
```

```
[12]:
```

```
# Get randomized graph
randomCurveG = curve.getGraph()
```

```
[13]:
```

```
# Verify
print(randomCurveG.numberOfNodes(), randomCurveG.numberOfEdges())
assert(randomCurveG.numberOfNodes() == G.numberOfNodes())
for u in range(randomCurveG.upperNodeIdBound()):
    assert(randomCurveG.degree(u) == G.degree(u))
```

```
453 2025
```

DegreePreservingShuffle is available as a standalone module. It relabels nodes while keeping their degrees, but won't change the topology of the graph. The `DegreePreservingShuffle(G)` constructor expects a directed or undirected graph.

```
[14]:
```

```
# Generate graph
G = nk.generators.ErdosRenyiGenerator(200, 0.2).generate()
```

```
[15]:
```

```
# Initialize algorithm
dps = nk.randomization.DegreePreservingShuffle(G)
```

```
[16]:
```

```
# Run algorithm
dps.run()
```

```
[16]:
```

```
<networkit.randomization.DegreePreservingShuffle at 0x7ff5162e5190>
```

```
[17]:
```

```
randomG = dps.getGraph()
```

```
[18]:
```

```
# Verify
for u in range(G.upperNodeIdBound()):
    assert(G.degree(u) == randomG.degree(u))
```

Edge Switching Markov Chain ["The Markov chain simulation method for generating connected power law random graphs", Mihail and Zegura] perturbs simple directed or undirected graphs while preserving the node degrees. In each step, two edges are selected uniformly at random and one endpoint of each is exchanged. Swaps that would introduce multi-edges or self-loops are rejected *without* replacement – this is necessary to allow uniform sampling [see "Switching edges to randomize networks: what goes wrong and how to fix it", Carstens and Horadam].

In general, simple edge switching does not yield a uniform distribution on simple **directed** graphs because the orientation of directed triangles cannot be changed. Using DegreePreservingShuffle as a preprocessing step overcomes this limitation. For convenience this is done by default for any graph to heuristically accelerate mixing.
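The directed-triangle obstruction can be verified by brute force. The sketch below (plain Python, illustrative names) enumerates every switch (a,b),(c,d) → (a,d),(c,b) on a directed 3-cycle and shows that all of them are rejected, so without the shuffle the chain can never reach the reversed orientation, even though it has the same in- and out-degrees.

```python
from itertools import permutations

def legal_switches(edges):
    """All edge sets reachable by one simple-graph-preserving switch
    (a,b),(c,d) -> (a,d),(c,b) on a directed graph given as a set of arcs."""
    reachable = set()
    for (a, b), (c, d) in permutations(edges, 2):
        new = (edges - {(a, b), (c, d)}) | {(a, d), (c, b)}
        if any(u == v for u, v in new):      # self-loop introduced
            continue
        if len(new) < len(edges):            # parallel arc introduced
            continue
        reachable.add(frozenset(new))
    return reachable

triangle = {(0, 1), (1, 2), (2, 0)}
# every switch on the directed 3-cycle creates a self-loop, so the chain
# is stuck in this state and never reaches the reversed cycle
assert legal_switches(triangle) == set()
```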

```
[19]:
```

```
# Generate graph
G = nk.generators.ErdosRenyiGenerator(200, 0.2).generate()
```

```
[20]:
```

```
# Initialize algorithm
es = nk.randomization.EdgeSwitching(G)
```

```
[21]:
```

```
# Run algorithm
es.run()
```

```
[22]:
```

```
randomG = es.getGraph()
```

```
[23]:
```

```
# Verify
for u in range(G.upperNodeIdBound()):
    assert(G.degree(u) == randomG.degree(u))
```

We offer an in-place variant of `nk.randomization.EdgeSwitching` which is more efficient if the original graph is not needed anymore. The wrapper also takes ownership of the graph, so it's safe to let the original graph name go out of scope.

In the following we pick up the previous local clustering coefficient example. This time, however, we are not interested in the average value, but rather in the progress made as we carry out an increasing number of Markov chain steps.

```
[24]:
```

```
G = nk.generators.HyperbolicGenerator(1000, 10).generate()
alg = nk.randomization.EdgeSwitchingInPlace(G, 0.1)

for i in range(10):
    if i > 0:
        # do a few switches
        alg.run()  # this will update G directly

    score = sum(nk.centrality.LocalClusteringCoefficient(G).run().scores()) / G.numberOfNodes()
    print("After % 5d switches the llc score is %.3f" % (500 * i, score))
```

```
After 0 switches the llc score is 0.780
After 500 switches the llc score is 0.460
After 1000 switches the llc score is 0.281
After 1500 switches the llc score is 0.176
After 2000 switches the llc score is 0.119
After 2500 switches the llc score is 0.084
After 3000 switches the llc score is 0.061
After 3500 switches the llc score is 0.049
After 4000 switches the llc score is 0.041
After 4500 switches the llc score is 0.035
```
