NetworKit Sparsification Tutorial

This notebook covers the NetworKit sparsification module, which provides algorithms to compute edge scores and to sparsify an input graph.

Sparsification algorithms rely on edge scores, therefore graph edges must be indexed. Call the indexEdges() function to do so.

Every sparsification algorithm computing edge scores in NetworKit provides a scores() function that returns the edge attributes maximum parameter value such that the edge is contained in the sparsified graph.

The getSparsifiedGraph(G, parameter, attribute) or getSparsifiedGraphOfSize(G, edgeRatio, attribute) functions return the sparsified graph. parameter determines the degree of sparsification, while edgeRatio is the target edge ratio of the specified graph. attribute is an optional parameter representing a previously calculated edge attribute.

[1]:
import networkit as nk
[2]:
G = nk.readGraph("../input/jazz.graph", nk.Format.METIS)
G.indexEdges()
G.numberOfNodes(), G.numberOfEdges()
[2]:
(198, 2742)

All sparsification algorithms need an edgeRatio parameter. We use the same edgeRatio in all our examples.

[3]:
targetRatio = 0.2

Forest Fire

The Forest Fire sparsifier implements a variant of the Forest Fire sparsification approach that is based on random walks.

Edge Scores

The `ForestFireScore(G, pf, tebr) <>`__ constructor expects as inputs a graph, the probability pf that the neighbor nodes will burn as well, and the target burn ratio which states that forest fire will burn until tebr * m edges have been burnt (where m is the number of edges of G).

[4]:
# Initialize the algorithm
ffs = nk.sparsification.ForestFireScore(G, 0.6, 5.0)

# Run
ffs.run()

# Get edge scores
attributes = ffs.scores()
for attribute in attributes[:5]:
    print("{:.3f}".format(attribute))
0.054
0.324
0.081
0.108
0.297

Sparsification

The `ForestFireSparsifier(burnProbability, targetBurntRatio) <>`__ constructor expects as inputs the probability burnProbability that the neighbor nodes will burn as well, and the target burn ratio which states that forest fire will burn until targetBurntRatio * m edges have been burnt.

[5]:
# Initialize the algorithm
fireSparsifier = nk.sparsification.ForestFireSparsifier(0.6, 5.0)

# Get sparsified graph
fireGraph = fireSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), fireGraph.numberOfEdges()
[5]:
(2742, 519)

Global Threshold Filter

The Global Threshold Filter calculates a sparsified graph by filtering globally using a constant threshold value and a given edge attribute.

The GlobalThresholdFilter(G, attribute, e, above) constructor expects as inputs a graph, a list of edge attributes, a threshold value e and a boolean value above. If above is set to True, all edges with an attribute value greater than or equal e will be kept in the sparsified graph. The calculate method returns the sparsified graph.

Sparsification

[6]:
# Initialize the algorithm
gtf = nk.sparsification.GlobalThresholdFilter(G, attributes, 0.2, False)

# Run
newG = gtf.calculate()
G.numberOfEdges(), newG.numberOfEdges()
[6]:
(2742, 2212)

Local Degree

The local degree sparsification strategy is based on the idea of hub nodes. For each edge of the graph, it determines the maximum parameter value such that the edge is still contained in the sparsified graph.

Edge Scores

The LocalDegreeScore(G) constructor expects a graph as input.

[7]:
# Initialize the algorithm
lds = nk.sparsification.LocalDegreeScore(G)

# Run
lds.run()

# Get edge scores
ldsScores = lds.scores()
for score in ldsScores[:5]:
    print("{:.3f}".format(score))
1.000
0.061
0.579
0.779
0.171

Sparsification

[8]:
# Initialize the algorithm
localDegSparsifier = nk.sparsification.LocalDegreeSparsifier()

# Get sparsified graph
localDegGraph = localDegSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), localDegGraph.numberOfEdges()
[8]:
(2742, 547)

Local Similarity

Edge Scores

The LocalSimilarityScore(G, triangles) constructor expects a graph and previously calculated edge triangle counts of the graph.

The edge triangles can be computed using the TriangleEdgeScore(G) algorithm.

[9]:
# Compute triangles in G
e_triangles = nk.sparsification.TriangleEdgeScore(G)
e_triangles.run()
triangles = e_triangles.scores()
[10]:
# Initialize the algorithm
lss = nk.sparsification.LocalSimilarityScore(G, triangles)

# Run
lss.run()

# Get edge scores
scores = lss.scores()
for score in scores[:5]:
    print("{:.3f}".format(score))
0.159
1.000
0.512
0.061
0.379

Sparsification

[11]:
# Initialize the algorithm
similaritySparsifier = nk.sparsification.LocalSimilaritySparsifier()

# Get sparsified graph
similarityGraph = similaritySparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), similarityGraph.numberOfEdges()
[11]:
(2742, 547)

Random Edge Score

This strategy assigns to each edge a random value in [0,1].

Edge Scores

The RandomEdgeScore(G) constructor expects a graph as input.

[12]:
# Initialize
res = nk.sparsification.RandomEdgeScore(G)

# Run
res.run()

# Get edge scores
randomEdgeScores = res.scores()
for score in randomEdgeScores[:5]:
    print("{:.3f}".format(score))
0.973
0.015
0.958
0.733
0.530

Sparsification

[13]:
# Initialize the algorithm
randomEdgeSparsifier = nk.sparsification.RandomEdgeSparsifier()

# Get sparsified graph
randomGraph = randomEdgeSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), randomGraph.numberOfEdges()
[13]:
(2742, 583)

Random Node Edge Score

This attributizer returns edge attributes where each value is selected uniformly at random from [0,1].

Edge Scores

The RandomNodeEdgeScore(G) constructor expects a graph as input.

[14]:
# Initialize
rn = nk.sparsification.RandomNodeEdgeScore(G)

# Run
rn.run()

# Get edge scores
randomNodeScores = rn.scores()
for score in randomNodeScores[:5]:
    print("{:.3f}".format(score))
0.434
0.073
0.922
0.865
0.055

Sparsification

[15]:
# Initialize the algorithm
randomNodeEdgeSparsifier = nk.sparsification.RandomNodeEdgeSparsifier()

# Get sparsified graph
randomNodeGraph = randomNodeEdgeSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), randomNodeGraph.numberOfEdges()
[15]:
(2742, 548)

SCAN Structural Similarity Score

This algorithm is a Structural Clustering Algorithm for Networks (SCAN) whose goal is to find clusters, hubs, and outliers in large networks.

Edge Scores

The SCANStructuralSimilarityScore(G, triangles) constructor expects as inputs a graph and previously calculated edge triangle counts of the graph.

[16]:
# Initialize the algorithm
scan = nk.sparsification.SCANStructuralSimilarityScore(G, triangles)

# Run
scan.run()

# Get edge scores
scanScores = scan.scores()
for score in scanScores[:5]:
    print("{:.3f}".format(score))
0.566
0.609
0.670
0.348
0.490

Sparsification

[17]:
# Initialize
scanSparsifier = nk.sparsification.SCANSparsifier()

# Get sparsified graph
scanGraph = scanSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), scanGraph.numberOfEdges()
[17]:
(2742, 548)

Simmelian Overlap Score

This is an implementation of the parametric variant of Simmelian Backbones. It calculates for each edge the minimum parameter value such that the edge is still contained in the sparsified graph.

Edge Scores

The SimmelianOverlapScore(G, triangles, maxRank) constructor expects as inputs a graph, triangles and the maximum rank that is considered for overlap calculation.

[18]:
# Initialize the algorithm
sos = nk.sparsification.SimmelianOverlapScore(G, triangles, 5)

# Run
sos.run()

# Get edge scores
sosScores = sos.scores()
for score in sosScores[:5]:
    print("{:.3f}".format(score))
3.000
2.000
2.000
2.000
0.000

Sparsification

[19]:
# Initialize the algorithm
simmelianSparsifier = nk.sparsification.SimmelianSparsifierNonParametric()

# Get sparsified graph
simmelieanGraph = simmelianSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), simmelieanGraph.numberOfEdges()
[19]:
(2742, 548)