# NetworKit Sparsification Tutorial¶

This notebook covers the NetworKit sparsification module, which provides algorithms to compute edge scores and to sparsify an input graph.

Sparsification algorithms rely on edge scores, therefore graph edges must be indexed. Call the indexEdges() function to do so.

Every sparsification algorithm computing edge scores in NetworKit provides a scores() function that returns the edge attributes maximum parameter value such that the edge is contained in the sparsified graph.

The getSparsifiedGraph(G, parameter, attribute) or getSparsifiedGraphOfSize(G, edgeRatio, attribute) functions return the sparsified graph. parameter determines the degree of sparsification, while edgeRatio is the target edge ratio of the specified graph. attribute is an optional parameter representing a previously calculated edge attribute.

[1]:

import networkit as nk

[2]:

G = nk.readGraph("../input/jazz.graph", nk.Format.METIS)
G.indexEdges()
G.numberOfNodes(), G.numberOfEdges()

[2]:

(198, 2742)


All sparsification algorithms need an edgeRatio parameter. We use the same edgeRatio in all our examples.

[3]:

targetRatio = 0.2


## Forest Fire¶

The Forest Fire sparsifier implements a variant of the Forest Fire sparsification approach that is based on random walks.

### Edge Scores¶

The ForestFireScore(G, pf, tebr) <>__ constructor expects as inputs a graph, the probability pf that the neighbor nodes will burn as well, and the target burn ratio which states that forest fire will burn until tebr * m edges have been burnt (where m is the number of edges of G).

[4]:

# Initialize the algorithm
ffs = nk.sparsification.ForestFireScore(G, 0.6, 5.0)

# Run
ffs.run()

# Get edge scores
attributes = ffs.scores()
for attribute in attributes[:5]:
print("{:.3f}".format(attribute))

0.135
0.216
0.135
0.081
0.108


### Sparsification¶

The ForestFireSparsifier(burnProbability, targetBurntRatio) <>__ constructor expects as inputs the probability burnProbability that the neighbor nodes will burn as well, and the target burn ratio which states that forest fire will burn until targetBurntRatio * m edges have been burnt.

[5]:

# Initialize the algorithm
fireSparsifier = nk.sparsification.ForestFireSparsifier(0.6, 5.0)

# Get sparsified graph
fireGraph = fireSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), fireGraph.numberOfEdges()

[5]:

(2742, 512)


## Global Threshold Filter¶

The Global Threshold Filter calculates a sparsified graph by filtering globally using a constant threshold value and a given edge attribute.

The GlobalThresholdFilter(G, attribute, e, above) constructor expects as inputs a graph, a list of edge attributes, a threshold value e and a boolean value above. If above is set to True, all edges with an attribute value greater than or equal e will be kept in the sparsified graph. The calculate method returns the sparsified graph.

### Sparsification¶

[6]:

# Initialize the algorithm
gtf = nk.sparsification.GlobalThresholdFilter(G, attributes, 0.2, False)

# Run
newG = gtf.calculate()
G.numberOfEdges(), newG.numberOfEdges()

[6]:

(2742, 2215)


## Local Degree¶

The local degree sparsification strategy is based on the idea of hub nodes. For each edge of the graph, it determines the maximum parameter value such that the edge is still contained in the sparsified graph.

### Edge Scores¶

The LocalDegreeScore(G) constructor expects a graph as input.

[7]:

# Initialize the algorithm
lds = nk.sparsification.LocalDegreeScore(G)

# Run
lds.run()

# Get edge scores
ldsScores = lds.scores()
for score in ldsScores[:5]:
print("{:.3f}".format(score))

1.000
0.061
0.579
0.779
0.171


### Sparsification¶

[8]:

# Initialize the algorithm
localDegSparsifier = nk.sparsification.LocalDegreeSparsifier()

# Get sparsified graph
localDegGraph = localDegSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), localDegGraph.numberOfEdges()

[8]:

(2742, 547)


## Local Similarity¶

### Edge Scores¶

The LocalSimilarityScore(G, triangles) constructor expects a graph and previously calculated edge triangle counts of the graph.

The edge triangles can be computed using the TriangleEdgeScore(G) algorithm.

[9]:

# Compute triangles in G
e_triangles = nk.sparsification.TriangleEdgeScore(G)
e_triangles.run()
triangles = e_triangles.scores()

[10]:

# Initialize the algorithm
lss = nk.sparsification.LocalSimilarityScore(G, triangles)

# Run
lss.run()

# Get edge scores
scores = lss.scores()
for score in scores[:5]:
print("{:.3f}".format(score))

0.159
1.000
0.512
0.061
0.379


### Sparsification¶

[11]:

# Initialize the algorithm
similaritySparsifier = nk.sparsification.LocalSimilaritySparsifier()

# Get sparsified graph
similarityGraph = similaritySparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), similarityGraph.numberOfEdges()

[11]:

(2742, 547)


## Random Edge Score¶

This strategy assigns to each edge a random value in [0,1].

### Edge Scores¶

The RandomEdgeScore(G) constructor expects a graph as input.

[12]:

# Initialize
res = nk.sparsification.RandomEdgeScore(G)

# Run
res.run()

# Get edge scores
randomEdgeScores = res.scores()
for score in randomEdgeScores[:5]:
print("{:.3f}".format(score))

0.706
0.474
0.479
0.016
0.756


### Sparsification¶

[13]:

# Initialize the algorithm
randomEdgeSparsifier = nk.sparsification.RandomEdgeSparsifier()

# Get sparsified graph
randomGraph = randomEdgeSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), randomGraph.numberOfEdges()

[13]:

(2742, 561)


## Random Node Edge Score¶

This attributizer returns edge attributes where each value is selected uniformly at random from [0,1].

### Edge Scores¶

The RandomNodeEdgeScore(G) constructor expects a graph as input.

[14]:

# Initialize
rn = nk.sparsification.RandomNodeEdgeScore(G)

# Run
rn.run()

# Get edge scores
randomNodeScores = rn.scores()
for score in randomNodeScores[:5]:
print("{:.3f}".format(score))

0.441
0.129
0.619
0.701
0.482


### Sparsification¶

[15]:

# Initialize the algorithm
randomNodeEdgeSparsifier = nk.sparsification.RandomNodeEdgeSparsifier()

# Get sparsified graph
randomNodeGraph = randomNodeEdgeSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), randomNodeGraph.numberOfEdges()

[15]:

(2742, 548)


## SCAN Structural Similarity Score¶

This algorithm is a Structural Clustering Algorithm for Networks (SCAN) whose goal is to find clusters, hubs, and outliers in large networks.

### Edge Scores¶

The SCANStructuralSimilarityScore(G, triangles) constructor expects as inputs a graph and previously calculated edge triangle counts of the graph.

[16]:

# Initialize the algorithm
scan = nk.sparsification.SCANStructuralSimilarityScore(G, triangles)

# Run
scan.run()

# Get edge scores
scanScores = scan.scores()
for score in scanScores[:5]:
print("{:.3f}".format(score))

0.566
0.609
0.670
0.348
0.490


### Sparsification¶

[17]:

# Initialize
scanSparsifier = nk.sparsification.SCANSparsifier()

# Get sparsified graph
scanGraph = scanSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), scanGraph.numberOfEdges()

[17]:

(2742, 548)


## Simmelian Overlap Score¶

This is an implementation of the parametric variant of Simmelian Backbones. It calculates for each edge the minimum parameter value such that the edge is still contained in the sparsified graph.

### Edge Scores¶

The SimmelianOverlapScore(G, triangles, maxRank) constructor expects as inputs a graph, triangles and the maximum rank that is considered for overlap calculation.

[18]:

# Initialize the algorithm
sos = nk.sparsification.SimmelianOverlapScore(G, triangles, 5)

# Run
sos.run()

# Get edge scores
sosScores = sos.scores()
for score in sosScores[:5]:
print("{:.3f}".format(score))

3.000
2.000
2.000
2.000
0.000


### Sparsification¶

[19]:

# Initialize the algorithm
simmelianSparsifier = nk.sparsification.SimmelianSparsifierNonParametric()

# Get sparsified graph
simmelieanGraph = simmelianSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), simmelieanGraph.numberOfEdges()

[19]:

(2742, 548)