Link prediction is concerned with estimating the probability of the existence of edges between nodes in a graph. The linkprediction
module in NetworKit provides sampling algorithms as well link prediction algorithms.
This notebook introduces a several link prediction algorithms available in NetworKit. It shows how to calculate link prediction measures, and how to use the sampling algorithms in combination with link prediction algorithms.
[1]:
import networkit as nk
The Adamic/Adar Index predicts links in a graph according to the amount of shared links between two nodes. The index sums up the reciprocals of the logarithm of the degree of all common neighbors between two nodes u
and v
.
The constructor, AdamicAdarIndex(G) expects a graph as input. The run(u, v)
method takes a pair of nodes (u, v)
as input and returns the Adamic/Adar Index of the given pair of nodes.
[2]:
# Read graph
G = nk.graphio.readGraph("../input/karate.graph", nk.Format.METIS)
# Initialize algorithm
aai = nk.linkprediction.AdamicAdarIndex(G)
# Get Adamic/Adar Index of two nodes
aai.run(14, 32)
[2]:
0.35295612386476116
The Algebraic Distance Index assigns a distance value to pairs of nodes according to their structural closeness in the graph.
The constructor AlgebraicDistanceIndex(G, numberSystems, numberIterations, omega=0.5, norm= 2) expects as inputs a graph, the number of systems to use for algebraic iteration, and the number of iterations in each system. omega
is the over-relaxation parameter and norm
is the norm factor of the extended algebraic distance. Maximum norm is realized by
setting norm
to 0.
After initialization, call the preprocess()
method. Afterwards, call the run
method: it takes a pair of nodes (u, v)
as input and returns the Algebraic Distance Index of the given pair of nodes.
[3]:
# Initialize the algorithm
adi = nk.linkprediction.AlgebraicDistanceIndex(G, 30, 200)
adi.preprocess()
# Get the algebraic distance index of two nodes
adi.run(1, 32)
[3]:
4.364710511232489e-07
The Common Neighbors Index calculates the number of common neighbors of a pair of nodes in a graph.
The constructor CommonNeighborsIndex(G), expects a graph as input. The run(u, v)
method takes as input a pair of nodes (u, v)
and returns the number of common neighbors between u
and v
.
[4]:
# Initialize the algorithm
cni = nk.linkprediction.CommonNeighborsIndex(G)
# Calculate common neighbors between two nodes
cni.run(14, 15)
[4]:
2.0
The Neighbors Measure Index returns the number of connections between neighbors of the given nodes u
and v
.
The constructor NeighborsMeasureIndex(G) expects a graph as input. The run(u, v)
takes a pair of nodes (u, v)
as input and returns the neighbors measure index between u
and v
.
[5]:
# Initialize the algorithm
nmi = nk.linkprediction.NeighborsMeasureIndex(G)
# Calculate the neighbors measure index between two nodes
nmi.run(14, 32)
[5]:
23.0
The Preferential Attachment Index suggests that the more connected a node is, the more likely it is to receive new links.
The constructor PreferentialAttachmentIndex(G) expects a graph as input. The run(u, v)
method takes a pair of nodes (u, v)
as input and returns the product of the cardinalities of the neighborhoods of nodes u
and v
.
[6]:
# Initialize the algorithm
pai = nk.linkprediction.PreferentialAttachmentIndex(G)
# Calculate the preferential attachment index between two nodes
pai.run(14, 32)
[6]:
24.0
The constructor ResourceAllocationIndex(G) expects a graph as input. The run(u, v)
method takes a pair of nodes (u, v)
as input and returns the Resource Allocation Index between u
and v
.
[7]:
# Initialize the algorithm
rai = nk.linkprediction.ResourceAllocationIndex(G)
# Calculate the resource allocation index between two nodes
rai.run(14, 32)
[7]:
0.058823529411764705
The Same Community Index determines whether two nodes u
and v
are in the same community.
The constructor SameCommunityIndex(G) expects a graph as input. The run(u, v)
method takes a pair of nodes (u, v)
as input and returns 1
if u
and v
are in the same community, 0
otherwise.
[8]:
# Initialize the algorithm
sni = nk.linkprediction.SameCommunityIndex(G)
# Compute the Same Community Index between two pairs of nodes
print(sni.run(14, 32))
print(sni.run(0, 32))
1.0
0.0
The Total Neighbors Index returns the number of nodes in the neighborhood-union of nodes u
and v
.
The constructor TotalNeighborsIndex(G) expects a graph as input. The run(u, v)
method takes a pair of nodes (u, v)
as input and returns the Total Neighbors Index between u
and v
.
[9]:
# Initialize the algorithm
tni = nk.linkprediction.TotalNeighborsIndex(G)
# Calculate the Total Neighbors Index between two nodes
tni.run(14, 32)
[9]:
13.0
This section shows how to use the training algorithms in combination with link prediction algorithms. In this example, we use the Random Link Sampler, the Missing Links Finder and the Katz Index algorithms.
The Katz index assigns a similarity score to a pair of nodes. This score is based on the weighted sum of the number of paths with length \(l\), where \(l\) is smaller than a given limit.
KatzIndex(G=None, maxPathLength=5, dampingValue=0.005) takes as inputs a graph (optional), the maximum length of paths to consider, and the damping value.
RandomLinkSampler(G, numLinks) provides methods to randomly sample a number of edges from a given graph. numLinks
is the number of edges the returned graph should have.
MissingLinksFinder(G) finds the missing edges in the given graph. The findAtDistance(k)
function returns all missing links in the graph. The absent links to find are narrowed down by providing a distance that the nodes of the missing links should have. For example in case of distance 2 only node-pairs that would close a triangle in the given graph get
returned.
[10]:
# Read graph
G = nk.graphio.readGraph("../input/jazz.graph", nk.Format.METIS)
# Sample graph
trainingGraph = nk.linkprediction.RandomLinkSampler.byPercentage(G, 0.7)
# Find missing links
missingLinks = nk.linkprediction.MissingLinksFinder(trainingGraph).findAtDistance(5)
# Run link prediticion
predictions = nk.linkprediction.KatzIndex(G).runOn(missingLinks)
# Print the first 5 predictions
for p in predictions[:5]:
print(p)
((15, 101), 6.934375000000001e-09)
((15, 123), 8.175e-09)
((15, 144), 1.25625e-09)
((15, 172), 1.0700000000000002e-08)
((15, 180), 2.5376875000000005e-07)