Link prediction is concerned with estimating the probability of the existence of edges between nodes in a graph. The `linkprediction`

module in NetworKit provides sampling algorithms as well link prediction algorithms.

This notebook introduces a several link prediction algorithms available in NetworKit. It shows how to calculate link prediction measures, and how to use the sampling algorithms in combination with link prediction algorithms.

```
[1]:
```

```
import networkit as nk
```

The Adamic/Adar Index predicts links in a graph according to the amount of shared links between two nodes. The index sums up the reciprocals of the logarithm of the degree of all common neighbors between two nodes `u`

and `v`

.

The constructor, AdamicAdarIndex(G) expects a graph as input. The `run(u, v)`

method takes a pair of nodes `(u, v)`

as input and returns the Adamic/Adar Index of the given pair of nodes.

```
[2]:
```

```
# Read graph
G = nk.graphio.readGraph("../input/karate.graph", nk.Format.METIS)
# Initialize algorithm
aai = nk.linkprediction.AdamicAdarIndex(G)
# Get Adamic/Adar Index of two nodes
aai.run(14, 32)
```

```
[2]:
```

```
0.35295612386476116
```

The Algebraic Distance Index assigns a distance value to pairs of nodes according to their structural closeness in the graph.

The constructor AlgebraicDistanceIndex(G, numberSystems, numberIterations, omega=0.5, norm= 2) expects as inputs a graph, the number of systems to use for algebraic iteration, and the number of iterations in each system. `omega`

is the over-relaxation parameter and `norm`

is the norm factor of the extended algebraic distance. Maximum norm is realized by
setting `norm`

to 0.

After initialization, call the `preprocess()`

method. Afterwards, call the `run`

method: it takes a pair of nodes `(u, v)`

as input and returns the Algebraic Distance Index of the given pair of nodes.

```
[3]:
```

```
# Initialize the algorithm
adi = nk.linkprediction.AlgebraicDistanceIndex(G, 30, 200)
adi.preprocess()
# Get the algebraic distance index of two nodes
adi.run(1, 32)
```

```
[3]:
```

```
3.803456826652907e-07
```

The Common Neighbors Index calculates the number of common neighbors of a pair of nodes in a graph.

The constructor CommonNeighborsIndex(G), expects a graph as input. The `run(u, v)`

method takes as input a pair of nodes `(u, v)`

and returns the number of common neighbors between `u`

and `v`

.

```
[4]:
```

```
# Initialize the algorithm
cni = nk.linkprediction.CommonNeighborsIndex(G)
# Calculate common neighbors between two nodes
cni.run(14, 15)
```

```
[4]:
```

```
2.0
```

The Neighbors Measure Index returns the number of connections between neighbors of the given nodes `u`

and `v`

.

The constructor NeighborsMeasureIndex(G) expects a graph as input. The `run(u, v)`

takes a pair of nodes `(u, v)`

as input and returns the neighbors measure index between `u`

and `v`

.

```
[5]:
```

```
# Initialize the algorithm
nmi = nk.linkprediction.NeighborsMeasureIndex(G)
# Calculate the neighbors measure index between two nodes
nmi.run(14, 32)
```

```
[5]:
```

```
23.0
```

The Preferential Attachment Index suggests that the more connected a node is, the more likely it is to receive new links.

The constructor PreferentialAttachmentIndex(G) expects a graph as input. The `run(u, v)`

method takes a pair of nodes `(u, v)`

as input and returns the product of the cardinalities of the neighborhoods of nodes `u`

and `v`

.

```
[6]:
```

```
# Initialize the algorithm
pai = nk.linkprediction.PreferentialAttachmentIndex(G)
# Calculate the preferential attachment index between two nodes
pai.run(14, 32)
```

```
[6]:
```

```
24.0
```

The constructor ResourceAllocationIndex(G) expects a graph as input. The `run(u, v)`

method takes a pair of nodes `(u, v)`

as input and returns the Resource Allocation Index between `u`

and `v`

.

```
[7]:
```

```
# Initialize the algorithm
rai = nk.linkprediction.ResourceAllocationIndex(G)
# Calculate the resource allocation index between two nodes
rai.run(14, 32)
```

```
[7]:
```

```
0.058823529411764705
```

The Same Community Index determines whether two nodes `u`

and `v`

are in the same community.

The constructor SameCommunityIndex(G) expects a graph as input. The `run(u, v)`

method takes a pair of nodes `(u, v)`

as input and returns `1`

if `u`

and `v`

are in the same community, `0`

otherwise.

```
[8]:
```

```
# Initialize the algorithm
sni = nk.linkprediction.SameCommunityIndex(G)
# Compute the Same Community Index between two pairs of nodes
print(sni.run(14, 32))
print(sni.run(0, 32))
```

```
1.0
0.0
```

The Total Neighbors Index returns the number of nodes in the neighborhood-union of nodes `u`

and `v`

.

The constructor TotalNeighborsIndex(G) expects a graph as input. The `run(u, v)`

method takes a pair of nodes `(u, v)`

as input and returns the Total Neighbors Index between `u`

and `v`

.

```
[9]:
```

```
# Initialize the algorithm
tni = nk.linkprediction.TotalNeighborsIndex(G)
# Calculate the Total Neighbors Index between two nodes
tni.run(14, 32)
```

```
[9]:
```

```
13.0
```

This section shows how to use the training algorithms in combination with link prediction algorithms. In this example, we use the Random Link Sampler, the Missing Links Finder and the Katz Index algorithms.

The Katz index assigns a similarity score to a pair of nodes. This score is based on the weighted sum of the number of paths with length \(l\), where \(l\) is smaller than a given limit.

KatzIndex(G=None, maxPathLength=5, dampingValue=0.005) takes as inputs a graph (optional), the maximum length of paths to consider, and the damping value.

RandomLinkSampler(G, numLinks) provides methods to randomly sample a number of edges from a given graph. `numLinks`

is the number of edges the returned graph should have.

MissingLinksFinder(G) finds the missing edges in the given graph. The `findAtDistance(k)`

function returns all missing links in the graph. The absent links to find are narrowed down by providing a distance that the nodes of the missing links should have. For example in case of distance 2 only node-pairs that would close a triangle in the given graph get
returned.

```
[10]:
```

```
# Read graph
G = nk.graphio.readGraph("../input/jazz.graph", nk.Format.METIS)
# Sample graph
trainingGraph = nk.linkprediction.RandomLinkSampler.byPercentage(G, 0.7)
# Find missing links
missingLinks = nk.linkprediction.MissingLinksFinder(trainingGraph).findAtDistance(5)
# Run link prediticion
predictions = nk.linkprediction.KatzIndex(G).runOn(missingLinks)
# Print the first 5 predictions
for p in predictions[:5]:
print(p)
```

```
((0, 91), 1.3882187500000002e-07)
((1, 91), 1.884375e-09)
((14, 147), 3.1499999999999997e-09)
((15, 91), 4.403125e-09)
((18, 91), 2.6445000000000007e-07)
```