NetworKit Graph Tutorial

In this notebook we will cover the main functionalities of networkit.Graph, the central class in NetworKit. The first step is to import NetworKit.

[1]:
import networkit as nk

We start by creating the core object, a networkit.Graph. The networkit.Graph constructor expects the number of nodes as an integer, a boolean value stating if the graph is weighted or not followed by another boolean value stating whether the graph is directed or not. The latter two are set to false by default. If the graph is unweighted, all edge weights are set to 1.0.

[2]:
G = nk.Graph(5)
print(G.numberOfNodes(), G.numberOfEdges())
print(G.isWeighted(), G.isDirected())
5 0
False False

G has 5 nodes, but no edges yet. Using the addEdge(node u, node v) method, we can add edges between the nodes.

[3]:
G.addEdge(1, 3)
G.addEdge(2, 4)
G.addEdge(1, 2)
G.addEdge(3, 4)
G.addEdge(2, 3)
G.addEdge(4, 0)
[3]:
True

Node IDs in NetworKit are integer indices that start at 0 through to G.upperNodeIdBound() - 1. Hence, for this graph G.addEdge(1,5) would an illegal operation as node 5 does not exist. The same goes for edge IDs. If we need to add an edge between node 0 and node 5, we first need to add a sixth node to the graph using G.addNode().

[4]:
G.addNode()
print(G.numberOfNodes())
6

Now we can add the new edge.

[5]:
G.addEdge(0,5)
[5]:
True

Using the method overview(G), we can take a look at the main properties of the graph we have created.

[6]:
nk.overview(G)
Network Properties:
nodes, edges                    6, 7
directed?                       False
weighted?                       False
isolated nodes                  0
self-loops                      0
density                         0.466667
clustering coefficient          0.444444
min/max/avg degree              1, 3, 2.333333
degree assortativity            0.353553
number of connected components  1
size of largest component       6 (100.00 %)

Now that we have created a graph we can start to play around with it. Say we want to remove the node with the node ID 2, so the third node. We can easily do so using Graph.removeNode(node u).

[7]:
G.removeNode(2)
[8]:
# 2 has been deleted
print(G.hasNode(2))
False

The node has been remove from the graph, however, the node IDs are not adjusted to the match the new number of nodes. Hence, if we want to restore the node we previously removed from G, we can do so using Graph.restoreNode(node u) using the original node ID.

[9]:
# Restore node with ID 2
G.restoreNode(2)

# Check if it is back in G
print(G.hasNode(2))
True

Note that in default mode NetworKit allows you to add an edge multiple times, most algorithms (and also io-functions) do not support it.

[10]:
G.addNode()
G.addEdge(0, 6)
print(G.numberOfEdges())
# NetworKit does not complain when inserting the same edge a second time
G.addEdge(0, 6)
print(G.numberOfEdges())
5
6

If wanted, this behavior can be changed. This increases the running time of adding an edge by the degree of the involved nodes.

[11]:
# Remove one of the multiple edges
G.removeEdge(0, 6)
print(G.numberOfEdges())
# The multi-edge is not added to the graph.
G.addEdge(0, 6, checkMultiEdge = True)
print(G.numberOfEdges())
5
5

NetworKit provides iterators that enable iterating over all nodes or edges in a simple manner. There are two kinds of iterators: one is based on ranges, the other one accepts callback a function. The easiest to use are the range-based iterators, they can be used in a simple for loop:

[12]:
# Iterate over the nodes of G
for u in G.iterNodes():
    print(u)
    if u > 4:
        print('...')
0
1
2
3
4
5
...
6
...
[13]:
# Iterate over the edges of G
for u, v in G.iterEdges():
    print(u, v)
    if u > 2:
        print('...')
0 4
0 5
0 6
1 3
3 4
...
[14]:
# Iterate over the edges of G1 and include weights
G1 = nk.graphtools.toWeighted(G)
G1.setWeight(0, 4, 2)
G1.setWeight(1, 3, 3)
for u, v, w in G1.iterEdgesWeights():
    print(u, v, w)
    if u > 2:
        print('...')
0 4 2.0
0 5 1.0
0 6 1.0
1 3 3.0
3 4 1.0
...

Callback-based iterators accept a callback function passed to the iterators as input parameter. Those functions can also be lambda functions. More information can be found in the NetworKit documentation here. Let’s start by using the forNodes iterator. It expects a callback function which accepts one parameter, i.e., a node. First, we define such a function.

[15]:
def nodeFunc(u):
    print("Node ", u, " passed to nodeFunc()")

We then pass nodeFunc to the iterator.

[16]:
G.forNodes(nodeFunc)
Node  0  passed to nodeFunc()
Node  1  passed to nodeFunc()
Node  2  passed to nodeFunc()
Node  3  passed to nodeFunc()
Node  4  passed to nodeFunc()
Node  5  passed to nodeFunc()
Node  6  passed to nodeFunc()
[17]:
G.forNodes(lambda u: print("Node ", u, " passed to lambda"))
Node  0  passed to lambda
Node  1  passed to lambda
Node  2  passed to lambda
Node  3  passed to lambda
Node  4  passed to lambda
Node  5  passed to lambda
Node  6  passed to lambda

Similarly, we can iterate over the edges of G:

[18]:
# First define callback function
# that accepts exactly 4 parameters:
def edgeFunc(u, v, weight, edgeId):
    print("Edge from {} to {} has weight {} and id {}".format(u, v, weight, edgeId))

We can now call the forEdges iterator, and pass edgeFunc to it.

[19]:
# Using iterator with callback function.
G.forEdges(edgeFunc)
Edge from 3 to 1 has weight 1.0 and id 18446744073709551615
Edge from 4 to 0 has weight 1.0 and id 18446744073709551615
Edge from 4 to 3 has weight 1.0 and id 18446744073709551615
Edge from 5 to 0 has weight 1.0 and id 18446744073709551615
Edge from 6 to 0 has weight 1.0 and id 18446744073709551615

Although we did not add any indexes to our edges, our edges all have indexes of 0. This is because NetworKit by default indexes all edges with 0. However, sometimes it makes sense to have indexed edges. If you decide to index the edges of your graph after creating it, you can use the Graph.indexEdges(bool force = False) method. The force parameter forces re-indexing of edges if they had already been indexed.

Since we did not index the edges of our graph initially, we can use the default value. Indexing the edges of G can then be done as simply as follows:

[20]:
G.indexEdges()

Sometimes you also need to iterate over specific edges, for example the ones connecting a node u to its neighbors. Using the forNodes iterator and the forEdgesOf iterator we can do so.

[21]:
G.forNodes(lambda u: G.forEdgesOf(u, edgeFunc))
Edge from 0 to 4 has weight 1.0 and id 1
Edge from 0 to 5 has weight 1.0 and id 3
Edge from 0 to 6 has weight 1.0 and id 4
Edge from 1 to 3 has weight 1.0 and id 0
Edge from 3 to 1 has weight 1.0 and id 0
Edge from 3 to 4 has weight 1.0 and id 2
Edge from 4 to 0 has weight 1.0 and id 1
Edge from 4 to 3 has weight 1.0 and id 2
Edge from 5 to 0 has weight 1.0 and id 3
Edge from 6 to 0 has weight 1.0 and id 4

Note, in an undirected graph, like we have here, the forEdgesOf iterator returns all edges of a node. When dealing with a directed graph only the out edges are returned. The rest of the edges can be accessed using the forInEdgesOf iterator.

Graph from Pandas, Scipy and Numpy data

In addition to manually filling the graph datastructure, it is also possible to create a graph based on given input data in COO (coordinate) format by using nk.GraphFromCoo(...). The parameter syntax is related to scipy.sparse.coo_array. We start by manually define row, col and data arrays (whereas A[row[k], col[k]] = data[k]) and use them as a constructed tuple in the form (data, (row, col)).

[22]:
import numpy as np
import scipy as sc
import pandas as pd

row = np.array([0, 1, 2])
col = np.array([1, 2, 0])
data = np.array([1.0, 1.0, 1.0])

GData = nk.GraphFromCoo((data,(row,col)))

For speedup purposes, it is possible to also pass the number of expected nodes as a parameter. Due to the tiny size of our example graph, the difference is very small in this case. However, for the majority of use cases, providing n results in a much faster creation of the graph.

[23]:
GData = nk.GraphFromCoo((data,(row,col)), n = 3)

In addition nk.GraphFromCoo supports the same parameter as nk.Graph (e.g. set the returning graph to be directed/weighted). If data is not given, each edge weight is assumed to be 1.0.

[24]:
GData = nk.GraphFromCoo((data,(row,col)), n = 3, directed = True, weighted = True)

It is also possible to create a graph from a scipy.sparse.coo_matrix.

[25]:
S = sc.sparse.coo_matrix((data, (row, col)), dtype = np.double)
GData = nk.GraphFromCoo(S, 3)

A common usecase involves data handling via pandas. Since the underlying data structure is easily transformed to numpy arrays, we can also create graphs from pandas DataFrames. In our example we define a set of people (Alice, Bob, Carol, Dan, Erin and Frank) and relationships between them. Each row in the DataFrame describes a (directed) relation from People_1 to People_2. For example Alice has a relationship to Carol (not necessarely the other way around, therefore directed).

[26]:
persons = pd.CategoricalDtype(categories=['Alice', 'Bob', 'Carol', 'Dan', 'Erin', 'Frank'], ordered=True)
relations = [('Alice', 'Carol'), ('Bob', 'Dan'), ('Dan', 'Erin'), ('Carol', 'Frank')]

friends_df = pd.DataFrame(relations, columns=['Person_1', 'Person_2']).astype(persons)
print(friends_df)

GData = nk.GraphFromCoo((friends_df['Person_1'].cat.codes.to_numpy(dtype=np.uint, copy = False), friends_df['Person_2'].cat.codes.to_numpy(dtype=np.uint, copy = False)), n = len(persons.categories), directed = True)

nk.overview(GData)
  Person_1 Person_2
0    Alice    Carol
1      Bob      Dan
2      Dan     Erin
3    Carol    Frank
Network Properties:
nodes, edges                    6, 4
directed?                       True
weighted?                       False
isolated nodes                  2
self-loops                      0
density                         0.133333
min/max/avg degree              0, 1, 0.666667
degree assortativity            nan
number of connected components  6
size of largest component       1 (16.67 %)

Node Attributes

It is possible to attach attributes to the nodes of a NetworKit graph with attachNodeAttribute. Attributes can be of type str, float, or int.

[27]:
# Create a new attribute named 'myAtt' of type 'str'
att = G.attachNodeAttribute("myAtt", str)

# Set attribute values
att[0] = "foo" # Attribute of node 0
att[1] = "bar" # Attribute of node 1

# Get attribute value
for u in G.iterNodes():
    try:
        print(f"Attribute of node {u} is {att[u]}")
    except ValueError:
        print(f"Node {u} has no attribute")
        break
Attribute of node 0 is foo
Attribute of node 1 is bar
Node 2 has no attribute

Edge Attributes

In the same way, it is also possible to add edge attributes to a NetworKit graph with attachEdgeAttribute. Attributes can be of type str, float, or int. Note that the edges of the graph have to be indexed.

It is possible to access the attributes both by edge index and by endpoints. Note: Access by edge index can be much slower compared to access by endpoints, therefore best use att[u, v] for access.

[28]:
# Call indexEdges once (all edges inserted afterwards will also get indexed)
G.indexEdges()

# Create a new attribute named 'myAtt' of type 'str'
att = G.attachEdgeAttribute("myAtt", str)

# Set attribute values
# Edges 1-3 and 4-0 were added earlier in the notebook
att[1,3] = "foo" # Attribute of edge 1-3
att[4,0] = "bar" # Attribute of node 4-0

# Get attribute value by edge endpoints (fast)
for u,v in G.iterEdges():
    try:
        print(f"Attribute of edge {u},{v} is {att[u,v]}")
    except ValueError:
        print(f"Edge {u},{v} has no attribute")

# Get attribute value by edge index (slow)
for u,v in G.iterEdges():
    try:
        edgeId = G.edgeId(u, v)
        print(f"Attribute of edge {u},{v} is {att[edgeId]}")
    except ValueError:
        print(f"Edge {u},{v} has no attribute")
Attribute of edge 0,4 is bar
Edge 0,5 has no attribute
Edge 0,6 has no attribute
Attribute of edge 1,3 is foo
Edge 3,4 has no attribute
Attribute of edge 0,4 is bar
Edge 0,5 has no attribute
Edge 0,6 has no attribute
Attribute of edge 1,3 is foo
Edge 3,4 has no attribute

GraphTools

networkit.graphtools implements some useful functions to get information or modify graphs. The following section shows some of the GraphTools functionalities.

toWeighted(G) takes an unweighted graph G as input, and returns a weighted copy of G. All the edge weights are set to a default value of 1.0.

[29]:
weightedG = nk.graphtools.toWeighted(G)
assert(weightedG.numberOfNodes() == G.numberOfNodes())
assert(weightedG.numberOfEdges() == G.numberOfEdges())
assert(weightedG.isWeighted())

toUnweighted(G) does the inverse of the one above: it takes a weighted graph G as input, and returns an unweighted copy of G as output.

randomNode(G) returns a node of G selected uniformly at random.

[30]:
nk.graphtools.randomNode(G)
[30]:
0

randomNeighbor(G, u) returns a random (out) neighbor of node u in the graph G.

[31]:
nk.graphtools.randomNeighbor(G, 0)
[31]:
5

randomEdge(G, uniformDistribution=False) returns a random edge of graph G. If uniformDistribution is set to True, the edge is selected uniformly at random.

[32]:
nk.graphtools.randomEdge(G, True)
[32]:
(4, 0)

Sometimes it makes sense to compact the graph, e.g., after deleting nodes. The method getCompactedGraph(G, nodeIdMap) does just that by designating continuous node ids. nodeIdMap maps each node id of graph G to their new ids.

First, we delete a node from G.

[33]:
G.removeNode(2)

Then, we use getContinuousNodeIds(G) to get a map from the original nodes ids of G, to their new ids.

[34]:
nodeIdMap = nk.graphtools.getContinuousNodeIds(G)

Finally, we get a new graph with compacted node ids.

[35]:
compGraph = nk.graphtools.getCompactedGraph(G, nodeIdMap)
assert(compGraph.numberOfNodes() == G.numberOfNodes())
assert(compGraph.numberOfEdges() == G.numberOfEdges())

maxDegree(G) returns the highest (out) degree of G.

[36]:
nk.graphtools.maxDegree(compGraph)
[36]:
3

maxInDegree(G) returns the highest in-degree of a directed graph G. maxWeightedDegree(G) returns the highest sum of the (out) edge weights of G, while maxWeightedInDegree(G) returns the highest sum of the in-edge weights of a directed graph G.

[ ]:

[ ]: