This notebook covers the NetworKit sparsification module, which provides algorithms to compute edge scores and to sparsify an input graph.
Sparsification algorithms rely on edge scores, therefore graph edges must be indexed. Call the indexEdges() function to do so.
Every sparsification algorithm computing edge scores in NetworKit provides a scores()
function that returns the edge attributes maximum parameter value such that the edge is contained in the sparsified graph.
The getSparsifiedGraph(G, parameter, attribute) or getSparsifiedGraphOfSize(G, edgeRatio, attribute) functions return the sparsified graph. parameter
determines the degree of sparsification,
while edgeRatio
is the target edge ratio of the specified graph. attribute
is an optional parameter representing a previously calculated edge attribute.
[1]:
import networkit as nk
[2]:
G = nk.readGraph("../input/jazz.graph", nk.Format.METIS)
G.indexEdges()
G.numberOfNodes(), G.numberOfEdges()
[2]:
(198, 2742)
All sparsification algorithms need an edgeRatio
parameter. We use the same edgeRatio
in all our examples.
[3]:
targetRatio = 0.2
The Forest Fire sparsifier implements a variant of the Forest Fire sparsification approach that is based on random walks.
The `ForestFireScore(G, pf, tebr) <>`__ constructor expects as inputs a graph, the probability pf
that the neighbor nodes will burn as well, and the target burn ratio which states that forest fire will burn until tebr * m
edges have been burnt (where m
is the number of edges of G
).
[4]:
# Initialize the algorithm
ffs = nk.sparsification.ForestFireScore(G, 0.6, 5.0)
# Run
ffs.run()
# Get edge scores
attributes = ffs.scores()
for attribute in attributes[:5]:
print("{:.3f}".format(attribute))
0.070
0.395
0.093
0.070
0.186
The `ForestFireSparsifier(burnProbability, targetBurntRatio) <>`__ constructor expects as inputs the probability burnProbability
that the neighbor nodes will burn as well, and the target burn ratio which states that forest fire will burn until targetBurntRatio * m
edges have been burnt.
[5]:
# Initialize the algorithm
fireSparsifier = nk.sparsification.ForestFireSparsifier(0.6, 5.0)
# Get sparsified graph
fireGraph = fireSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), fireGraph.numberOfEdges()
[5]:
(2742, 499)
The Global Threshold Filter calculates a sparsified graph by filtering globally using a constant threshold value and a given edge attribute.
The GlobalThresholdFilter(G, attribute, e, above) constructor expects as inputs a graph, a list of edge attributes, a threshold value e
and a boolean value above
. If above
is set to True
, all edges with an attribute value greater than or equal e
will be kept in the sparsified graph. The calculate
method returns the sparsified graph.
[6]:
# Initialize the algorithm
gtf = nk.sparsification.GlobalThresholdFilter(G, attributes, 0.2, False)
# Run
newG = gtf.calculate()
G.numberOfEdges(), newG.numberOfEdges()
[6]:
(2742, 2343)
The local degree sparsification strategy is based on the idea of hub nodes. For each edge of the graph, it determines the maximum parameter value such that the edge is still contained in the sparsified graph.
The LocalDegreeScore(G) constructor expects a graph as input.
[7]:
# Initialize the algorithm
lds = nk.sparsification.LocalDegreeScore(G)
# Run
lds.run()
# Get edge scores
ldsScores = lds.scores()
for score in ldsScores[:5]:
print("{:.3f}".format(score))
1.000
0.061
0.579
0.779
0.171
[8]:
# Initialize the algorithm
localDegSparsifier = nk.sparsification.LocalDegreeSparsifier()
# Get sparsified graph
localDegGraph = localDegSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), localDegGraph.numberOfEdges()
[8]:
(2742, 547)
The LocalSimilarityScore(G, triangles) constructor expects a graph and previously calculated edge triangle counts of the graph.
The edge triangles can be computed using the TriangleEdgeScore(G) algorithm.
[9]:
# Compute triangles in G
e_triangles = nk.sparsification.TriangleEdgeScore(G)
e_triangles.run()
triangles = e_triangles.scores()
[10]:
# Initialize the algorithm
lss = nk.sparsification.LocalSimilarityScore(G, triangles)
# Run
lss.run()
# Get edge scores
scores = lss.scores()
for score in scores[:5]:
print("{:.3f}".format(score))
0.159
1.000
0.512
0.061
0.379
[11]:
# Initialize the algorithm
similaritySparsifier = nk.sparsification.LocalSimilaritySparsifier()
# Get sparsified graph
similarityGraph = similaritySparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), similarityGraph.numberOfEdges()
[11]:
(2742, 547)
This strategy assigns to each edge a random value in [0,1].
The RandomEdgeScore(G) constructor expects a graph as input.
[12]:
# Initialize
res = nk.sparsification.RandomEdgeScore(G)
# Run
res.run()
# Get edge scores
randomEdgeScores = res.scores()
for score in randomEdgeScores[:5]:
print("{:.3f}".format(score))
0.662
0.805
0.365
0.547
0.416
[13]:
# Initialize the algorithm
randomEdgeSparsifier = nk.sparsification.RandomEdgeSparsifier()
# Get sparsified graph
randomGraph = randomEdgeSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), randomGraph.numberOfEdges()
[13]:
(2742, 546)
This attributizer returns edge attributes where each value is selected uniformly at random from [0,1].
The RandomNodeEdgeScore(G) constructor expects a graph as input.
[14]:
# Initialize
rn = nk.sparsification.RandomNodeEdgeScore(G)
# Run
rn.run()
# Get edge scores
randomNodeScores = rn.scores()
for score in randomNodeScores[:5]:
print("{:.3f}".format(score))
0.234
0.491
0.655
0.969
0.231
[15]:
# Initialize the algorithm
randomNodeEdgeSparsifier = nk.sparsification.RandomNodeEdgeSparsifier()
# Get sparsified graph
randomNodeGraph = randomNodeEdgeSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), randomNodeGraph.numberOfEdges()
[15]:
(2742, 548)
This algorithm is a Structural Clustering Algorithm for Networks (SCAN) whose goal is to find clusters, hubs, and outliers in large networks.
The SCANStructuralSimilarityScore(G, triangles) constructor expects as inputs a graph and previously calculated edge triangle counts of the graph.
[16]:
# Initialize the algorithm
scan = nk.sparsification.SCANStructuralSimilarityScore(G, triangles)
# Run
scan.run()
# Get edge scores
scanScores = scan.scores()
for score in scanScores[:5]:
print("{:.3f}".format(score))
0.566
0.609
0.670
0.348
0.490
[17]:
# Initialize
scanSparsifier = nk.sparsification.SCANSparsifier()
# Get sparsified graph
scanGraph = scanSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), scanGraph.numberOfEdges()
[17]:
(2742, 548)
This is an implementation of the parametric variant of Simmelian Backbones. It calculates for each edge the minimum parameter value such that the edge is still contained in the sparsified graph.
The SimmelianOverlapScore(G, triangles, maxRank) constructor expects as inputs a graph, triangles and the maximum rank that is considered for overlap calculation.
[18]:
# Initialize the algorithm
sos = nk.sparsification.SimmelianOverlapScore(G, triangles, 5)
# Run
sos.run()
# Get edge scores
sosScores = sos.scores()
for score in sosScores[:5]:
print("{:.3f}".format(score))
3.000
2.000
2.000
2.000
0.000
[19]:
# Initialize the algorithm
simmelianSparsifier = nk.sparsification.SimmelianSparsifierNonParametric()
# Get sparsified graph
simmelieanGraph = simmelianSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
G.numberOfEdges(), simmelieanGraph.numberOfEdges()
[19]:
(2742, 548)