# Publications

The following is a list of publications on the basis of NetworKit. We ask you to cite the appropriate ones if you found NetworKit useful for your own research. Also, we would appreciate it if you pointed us to your publications in which you used NetworKit and allowed us to reference them on this page.

## Journal Paper on NetworKit as a Software Toolkit

• C. Staudt, A. Sazonovs and H. Meyerhenke: NetworKit: A Tool Suite for Large-scale Complex Network Analysis. Network Science 4(4), pp. 508-530, December 2016. Cambridge University Press. [arXiv]
Abstract. We introduce NetworKit, an open-source software package for analyzing the structure of large complex networks. Appropriate algorithmic solutions are required to handle increasingly common large graph data sets containing up to billions of connections. We describe the methodology applied to develop scalable solutions to network analysis problems, including techniques like parallelization, heuristics for computationally expensive problems, efficient data structures, and modular software architecture. Our goal for the software is to package results of our algorithm engineering efforts and put them into the hands of domain experts. NetworKit is implemented as a hybrid combining the kernels written in C++ with a Python front end, enabling integration into the Python ecosystem of tested tools for data analysis and scientific computing. The package provides a wide range of functionality (including common and novel analytics algorithms and graph generators) and does so via a convenient interface. In an experimental comparison with related software, NetworKit shows the best performance on a range of typical analysis tasks.

## Publications on Algorithms Available in NetworKit

• E. Angriman, A. van der Grinten, M. Predari and H. Meyerhenke: Approximation of the Diagonal of a Laplacian's Pseudoinverse for Complex Network Analysis In Proc. 2020 - 28th Annual European Symposium on Algorithms. (ESA 2020) [arXiv]
Abstract. The ubiquity of massive graph data sets in numerous applications requires fast algorithms for extracting knowledge from these data. We are motivated here by three electrical measures for the analysis of large small-world graphs $$G=(V,E)$$ -- i.e., graphs with diameter in $$O(log|V|)$$, which are abundant in complex network analysis. From a computational point of view, the three measures have in common that their crucial component is the diagonal of the graph Laplacian's pseudoinverse, L'. Computing $$diag(L')$$ exactly by pseudoinversion, however, is as expensive as dense matrix multiplication -- and the standard tools in practice even require cubic time. Moreover, the pseudoinverse requires quadratic space -- hardly feasible for large graphs. Resorting to approximation by, e.g., using the Johnson-Lindenstrauss transform, requires the solution of $$O(log|V|/eps^2)$$ Laplacian linear systems to guarantee a relative error, which is still very expensive for large inputs. In this paper, we present a novel approximation algorithm that requires the solution of only one Laplacian linear system. The remaining parts are purely combinatorial -- mainly sampling uniform spanning trees, which we relate to $$diag(L')$$ via effective resistances. For small-world networks, our algorithm obtains a ±eps-approximation with high probability, in a time that is nearly-linear in $$|E|$$ and quadratic in $$1/eps$$. Another positive aspect of our algorithm is its parallel nature due to independent sampling. We thus provide two parallel implementations of our algorithm: one using OpenMP, one MPI + OpenMP. In our experiments against the state of the art, our algorithm (i) yields more accurate results, (ii) is much faster and more memory-efficient, and (iii) obtains good parallel speedups, in particular in the distributed setting.

• A. van der Grinten, H. Meyerhenke: Scaling Betweenness Approximation to Billions of Edges by MPI-based Adaptive Sampling In Proc. 2020 - 34th International Parallel and Distributed Processing Symposium. (IPDPS 2020) [arXiv]
Abstract. Betweenness centrality is one of the most popular vertex centrality measures in network analysis. Hence, many (sequential and parallel) algorithms to compute or approximate betweenness have been devised. Recent algorithmic advances have made it possible to approximate betweenness very efficiently on shared-memory architectures. Yet, the best shared-memory algorithms can still take hours of running time for large graphs, especially for graphs with a high diameter or when a small relative error is required. In this work, we present an MPI-based generalization of the state-of-the-art shared-memory algorithm for betweenness approximation. This algorithm is based on adaptive sampling; our parallelization strategy can be applied in the same manner to adaptive sampling algorithms for other problems. In experiments on a 16-node cluster, our MPI-based implementation is by a factor of 16.1x faster than the state-of-the-art shared-memory implementation when considering our parallelization focus -- the adaptive sampling phase -- only. For the complete algorithm, we obtain an average (geom. mean) speedup factor of 7.4x over the state of the art. For some previously very challenging inputs, this speedup is much higher. As a result, our algorithm is the first to approximate betweenness centrality on graphs with several billion edges in less than ten minutes with high accuracy.

• E. Angriman, A. van der Grinten, H. Meyerhenke: Local Search for Group Closeness Maximization on Big Graphs In Proc. 2019 IEEE International Conference on Big Data. (BigData 2019) [arXiv]
Abstract. In network analysis and graph mining, closeness centrality is a popular measure to infer the importance of a vertex. Computing closeness efficiently for individual vertices received considerable attention. The NP-hard problem of group closeness maximization, in turn, is more challenging: the objective is to find a vertex group that is central as a whole and state-of-the-art heuristics for it do not scale to very big graphs yet. In this paper, we present new local search heuristics for group closeness maximization. By using randomized approximation techniques and dynamic data structures, our algorithms are often able to perform locally optimal decisions efficiently. The final result is a group with high (but not optimal) closeness centrality. We compare our algorithms to the current state-of-the-art greedy heuristic both on weighted and on unweighted real-world graphs. For graphs with hundreds of millions of edges, our local search algorithms take only around ten minutes, while greedy requires more than ten hours. Overall, our new algorithms are between one and two orders of magnitude faster, depending on the desired group size and solution quality. For example, on weighted graphs and $$k = 10$$, our algorithms yield solutions of 12.4% higher quality, while also being 793.6x faster. For unweighted graphs and $$k = 10$$, we achieve solutions within 99.4% of the state-of-the-art quality while being 127.8x faster.

• E. Angriman, A. van der Grinten, Aleksandar Bojchevski, Daniel Zügner, Stephan Günnemann, H. Meyerhenke: Group Centrality Maximization for Large-scale Graphs In Proc. 22nd SIAM Symposium on Algorithm Engineering & Experiments (ALENEX 2020) [arXiv]
Abstract. The study of vertex centrality measures is a key aspect of network analysis. Naturally, such centrality measures have been generalized to groups of vertices; for popular measures it was shown that the problem of finding the most central group is NP-hard. As a result, approximation algorithms to maximize group centralities were introduced recently. Despite a nearly-linear running time, approximation algorithms for group betweenness and (to a lesser extent) group closeness are rather slow on large networks due to high constant overheads. That is why we introduce GED-Walk centrality, a new submodular group centrality measure inspired by Katz centrality. In contrast to closeness and betweenness, it considers walks of any length rather than shortest paths, with shorter walks having a higher contribution. We define algorithms that (i) efficiently approximate the GED-Walk score of a given group and (ii) efficiently approximate the (proved to be NP-hard) problem of finding a group with highest GED-Walk score. Experiments on several real-world datasets show that scores obtained by GED-Walk improve performance on common graph mining tasks such as collective classification and graph-level classification. An evaluation of empirical running times demonstrates that maximizing GED-Walk (in approximation) is two orders of magnitude faster compared to group betweenness approximation and for group sizes ≤ 100 one to two orders faster than group closeness approximation. For graphs with tens of millions of edges, GED-Walk maximization typically needs less than one minute. Furthermore, our experiments suggest that the maximization algorithms scale linearly with the size of the input graph and the size of the group.

• A.v.d. Grinten, E. Angriman, H. Meyerhenke: Parallel Adaptive Sampling with almost no Synchronization. In Proceedings of the 25th International Conference on Parallel and Distributed Computing (Euro-Par 2019) [arXiv] [DOI: 10.1007/978-3-030-29400-7_31]
Abstract. Approximation via sampling is a widespread technique whenever exact solutions are too expensive. In this paper, we present techniques for an efficient parallelization of adaptive (a. k. a. progressive) sampling algorithms on multi-threaded shared-memory machines. Our basic algorithmic technique requires no synchronization except for atomic load-acquire and store-release operations. It does, however, require $$O(n)$$ memory per thread, where n is the size of the sampling state. We present variants of the algorithm that either reduce this memory consumption to O(1) or ensure that deterministic results are obtained. Using the KADABRA algorithm for betweenness centrality (a popular measure in network analysis) approximation as a case study, we demonstrate the empirical performance of our techniques. In particular, on a 32-core machine, our best algorithm is 2.9x faster than what we could achieve using a straightforward OpenMP-based parallelization and 65.3x faster than the existing implementation of KADABRA.

• A.v.d. Grinten, E. Bergamini, O. Green, D. A. Bader, H. Meyerhenke: Scalable Katz Ranking Computation in Large Static and Dynamic Graphs. In 26th Annual European Symposium on Algorithms (ESA 2018) [arXiv] [DOI: 10.4230/LIPIcs.ESA.2018.42]
Abstract. Network analysis defines a number of centrality measures to identify the most central nodes in a network. Fast computation of those measures is a major challenge in algorithmic network analysis. Aside from closeness and betweenness, Katz centrality is one of the established centrality measures. In this paper, we consider the problem of computing rankings for Katz centrality. In particular, we propose upper and lower bounds on the Katz score of a given node. While previous approaches relied on numerical approximation or heuristics to compute Katz centrality rankings, we construct an algorithm that iteratively improves those upper and lower bounds until a correct Katz ranking is obtained. We extend our algorithm to dynamic graphs while maintaining its correctness guarantees. Experiments demonstrate that our static graph algorithm outperforms both numerical approaches and heuristics with speedups between 1.5 x and 3.5 x, depending on the desired quality guarantees. Our dynamic graph algorithm improves upon the static algorithm for update batches of less than 10000 edges. We provide efficient parallel CPU and GPU implementations of our algorithms that enable near real-time Katz centrality computation for graphs with hundreds of millions of nodes in fractions of seconds.

• C. J. Carstens, M. Hamann, U. Meyer, M. Penschuck, H. Tran, D. Wagner: Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades. In 26th Annual European Symposium on Algorithms (ESA 2018) [arXiv] [DOI: 10.4230/LIPIcs.ESA.2018.11]
Abstract. Graph randomisation is a crucial task in the analysis and synthesis of networks. It is typically implemented as an edge switching process (ESMC) repeatedly swapping the nodes of random edge pairs while maintaining the degrees involved [Mihail and Zegura, 2003]. Curveball is a novel approach that instead considers the whole neighbourhoods of randomly drawn node pairs. Its Markov chain converges to a uniform distribution, and experiments suggest that it requires less steps than the established ESMC [Carstens et al., 2016]. Since trades however are more expensive, we study Curveball's practical runtime by introducing the first efficient Curveball algorithms: the I/O-efficient EM-CB for simple undirected graphs and its internal memory pendant IM-CB. Further, we investigate global trades [Carstens et al., 2016] processing every node in a single super step, and show that undirected global trades converge to a uniform distribution and perform superior in practice. We then discuss EM-GCB and EM-PGCB for global trades and give experimental evidence that EM-PGCB achieves the quality of the state-of-the-art ESMC algorithm EM-ES [M. Hamann et al., 2017] nearly one order of magnitude faster.

• F.B. Mocnik: The Polynomial Volume Law of Complex Networks in the Context of Local and Global Optimization. In Scientific Reports, volume 8, Article number: 11274 (2018). [DOI: 10.1038/s41598-018-29131-0]
Abstract. Many complex networks expose global hub structures: for some nodes, the number of incident edges far exceeds the average, leading to a small average shortest path length. Such ‘small-world properties’ are often guided by a scale-free power-law distribution of the node degrees, and self-organization inside the network has been identified as a reason driving the emergence of this structure. Small-world networks have recently raised lots of interest, because they capture the global topology of the World-Wide Web, metabolic, and social networks. While small-world networks reflect global structures, little attention is paid to the local structure of complex networks. In this article neighbourhoods are demonstrated to share a common local structure in many real complex networks, manifested by a polynomial volume law. This law can, in case of networks that are embedded in space, be explained in terms of the embedding and the properties of Euclidean space. A model of hierarchical spatial networks is introduced to examine the effect of global structures, in particular of hierarchies, on the polynomial volume law. It turns out that the law is robust against the coexistence of such global structures. The local structure of space and global optimization can both be found in transport, brain, and communication networks, which suggests the polynomial volume law, often in combination with hierarchies or other global optimization principles, to be a generic property inherent to many networks.

• P. Bisenius, E. Bergamini, E. Angriman, H. Meyerhenke: Computing Top-k Closeness Centrality in Fully-dynamic Graphs. In Proc. 20th SIAM Workshop on Algorithm Engineering & Experiments (ALENEX 2018) [arXiv] [DOI: 10.1137/1.9781611975055.3]
Abstract. Closeness is a widely-studied centrality measure. Since it requires all pairwise distances, computing closeness for all nodes is infeasible for large real-world networks. However, for many applications, it is only necessary to find the k most central nodes and not all closeness values. Prior work has shown that computing the top-k nodes with highest closeness can be done much faster than computing closeness for all nodes in real-world networks. However, for networks that evolve over time, no dynamic top-k closeness algorithm exists that improves on static recomputation. In this paper, we present several techniques that allow us to efficiently compute the k nodes with highest (harmonic) closeness after an edge insertion or an edge deletion. Our algorithms use information obtained during earlier computations to omit unnecessary work. However, they do not require asymptotically more memory than the static algorithms (i. e., linear in the number of nodes). We propose separate algorithms for complex networks (which exhibit the small-world property) and networks with large diameter such as street networks, and we compare them against static recomputation on a variety of real-world networks. On many instances, our dynamic algorithms are two orders of magnitude faster than recomputation; on some large graphs, we even reach average speedups between $$10^3$$ and $$10^4$$.

• E. Bergamini, T. Gonser, H. Meyerhenke: Scaling up Group Closeness Maximization. In Proc. 20th SIAM Workshop on Algorithm Engineering & Experiments (ALENEX 2018) [arXiv] [DOI: 10.1137/1.9781611975055.18]
Abstract. Closeness is a widely-used centrality measure in social network analysis. For a node it indicates the inverse average shortest-path distance to the other nodes of the network. While the identification of the k nodes with highest closeness received significant attention, many applications are actually interested in finding a group of nodes that is central as a whole. For this problem, only recently a greedy algorithm with approximation ratio $$(1-1/e)$$ has been proposed [Chen et al., ADC 2016]. Since this algorithm's running time is still expensive for large networks, a heuristic without approximation guarantee has also been proposed in the same paper. In the present paper we develop new techniques to speed up the greedy algorithm without losing its theoretical guarantee. Compared to a straightforward implementation, our approach is orders of magnitude faster and, compared to the heuristic proposed by Chen et al., we always find a solution with better quality in a comparable running time in our experiments. Our method Greedy++ allows us to approximate the group with maximum closeness on networks with up to hundreds of millions of edges in minutes or at most a few hours. To have the same theoretical guarantee, the greedy approach by [Chen et al., ADC 2016] would take several days already on networks with hundreds of thousands of edges. In a comparison with the optimum, our experiments show that the solution found by Greedy++ is actually much better than the theoretical guarantee. Over all tested networks, the empirical approximation ratio is never lower than 0.97. Finally, we study for the first time the correlation between the top-k nodes with highest individual closeness and an approximation of the most central group in large networks. Our results show that the overlap between the two is relatively small, which indicates empirically the need to distinguish between the two problems.

• C. L. Staudt, M. Hamann, A. Gutfraind, I. Safro, H. Meyerhenke: Generating realistic scaled complex networks. In Journal of Applied Network Science, Volume 2, 2017. [arXiv] [DOI: 10.1007/s41109-017-0054-z]
Abstract. Research on generative models plays a central role in the emerging field of network science, studying how statistical patterns found in real networks could be generated by formal rules. Output from these generative models is then the basis for designing and evaluating computational methods on networks including verification and simulation studies. During the last two decades, a variety of models has been proposed with an ultimate goal of achieving comprehensive realism for the generated networks. In this study, we (a) introduce a new generator, termed ReCoN; (b) explore how ReCoN and some existing models can be fitted to an original network to produce a structurally similar replica, (c) use ReCoN to produce networks much larger than the original exemplar, and finally (d) discuss open problems and promising research directions. In a comparative experimental study, we find that ReCoN is often superior to many other state-of-the-art network generation methods. We argue that ReCoN is a scalable and effective tool for modeling a given network while preserving important properties at both micro- and macroscopic scales, and for scaling the exemplar data by orders of magnitude in size.

• E. Bergamini, M. Wegner, D. Lukarski, H. Meyerhenke: Estimating Current-Flow Closeness Centrality with a Multigrid Laplacian Solver. In Proc. CSC '16. SIAM, 2016. [arXiv] [DOI: 10.1137/1.9781611974690.ch1]
Abstract. Matrices associated with graphs, such as the Laplacian, lead to numerous interesting graph problems expressed as linear systems. One field where Laplacian linear systems play a role is network analysis, e. g. for certain centrality measures that indicate if a node (or an edge) is important in the network. One such centrality measure is current-flow closeness. To allow network analysis workflows to profit from a fast Laplacian solver, we provide an implementation of the LAMG multigrid solver in the NetworKit package, facilitating the computation of current-flow closeness values or related quantities. Our main contribution consists of two algorithms that accelerate the current-flow computation for one node or a reasonably small node subset significantly. One algorithm is an unbiased estimator using sampling, the other one is based on the Johnson- Lindenstrauss transform. Our inexact algorithms lead to very accurate results in practice. Thanks to them one is now able to compute an estimation of current-flow closeness of one node on networks with tens of millions of nodes and edges within seconds or a few minutes. From a network analytical point of view, our experiments indicate that current-flow closeness can discriminate among different nodes significantly better than traditional shortest-path closeness and is also considerably more resistant to noise – we thus show that two known drawbacks of shortest-path closeness are alleviated by the current-flow variant.

• M. von Looz, M. Özdayi, S. Laue, H. Meyerhenke: Generating massive complex networks with hyperbolic geometry faster in practice. In Proc. HPEC '16. IEEE, 2016. [arXiv]
Abstract. Generative network models play an important role in algorithm development, scaling studies, network analysis, and realistic system benchmarks for graph data sets. The commonly used graph-based benchmark model R-MAT has some drawbacks concerning realism and the scaling behavior of network properties. A complex network model gaining considerable popularity builds random hyperbolic graphs, generated by distributing points within a disk in the hyperbolic plane and then adding edges between points whose hyperbolic distance is below a threshold. We present in this paper a fast generation algorithm for such graphs. Our experiments show that our new generator achieves speedup factors of x3-60 over the best previous implementation. One billion edges can now be generated in under one minute on a shared-memory workstation. Furthermore, we present a dynamic extension to model gradual network change, while preserving at each step the point position probabilities.

• M. von Looz, H. Meyerhenke: Querying Probabilistic Neighborhoods in Spatial Data Sets Efficiently. In Proc. IWOCA 2016, Springer, 2016. [arXiv]
Abstract. The probability that two spatial objects establish some kind of mutual connection often depends on their proximity. To formalize this concept, we define the notion of a probabilistic neighborhood: $$\newcommand{\dist}{\operatorname{dist}}$$ Let $$P$$ be a set of $$n$$ points in $$\mathbb{R}^d$$, $$q \in \mathbb{R}^d$$ a query point, $$\dist$$ a distance metric, and $$f : \mathbb{R}^+ \rightarrow [0,1]$$ a monotonically decreasing function. Then, the probabilistic neighborhood $$N(q, f)$$ of $$q$$ with respect to $$f$$ is a random subset of $$P$$ and each point $$p \in P$$ belongs to $$N(q,f)$$ with probability $$f(\dist(p,q))$$. Possible applications include query sampling and the simulation of probabilistic spreading phenomena, as well as other scenarios where the probability of a connection between two entities decreases with their distance. We present a fast, sublinear-time query algorithm to sample probabilistic neighborhoods from planar point sets. For certain distributions of planar $$P$$, we prove that our algorithm answers a query in $$O((|N(q,f)| + \sqrt{n})\log n)$$ time with high probability. In experiments this yields a speedup over pairwise distance probing of at least one order of magnitude, even for rather small data sets with $$n=10^5$$ and also for other point distributions not covered by the theoretical results.

• E. Bergamini, H. Meyerhenke: Approximating Betweenness Centrality in Fully-dynamic Networks. In Internet Mathematics, Volume 12, Issue 5, 2016. [arXiv] [DOI: 10.1080/15427951.2016.1177802]
Abstract. Betweenness is a well-known centrality measure that ranks the nodes of a network according to their participation in shortest paths. Since an exact computation is prohibitive in large networks, several approximation algorithms have been proposed. Besides that, recent years have seen the publication of dynamic algorithms for efficient recomputation of betweenness in networks that change over time. In this paper we propose the first betweenness centrality approximation algorithms with a provable guarantee on the maximum approximation error for dynamic networks. Several new intermediate algorithmic results contribute to the respective approximation algorithms: (i) new upper bounds on the vertex diameter, (ii) the first fully-dynamic algorithm for updating an approximation of the vertex diameter in undirected graphs, and (iii) an algorithm with lower time complexity for updating single-source shortest paths in unweighted graphs after a batch of edge actions. Using approximation, our algorithms are the first to make in-memory computation of betweenness in dynamic networks with millions of edges feasible. Our experiments show that our algorithms can achieve substantial speedups compared to recomputation, up to several orders of magnitude. Moreover, the approximation accuracy is usually significantly better than the theoretical guarantee in terms of absolute error. More importantly, for reasonably small approximation error thresholds, the rank of nodes is well preserved, in particular for nodes with high betweenness.

• G. Lindner, C. L. Staudt, M. Hamann, H. Meyerhenke, D. Wagner: Structure-Preserving Sparsification Methods for Social Networks. To appear in Social Network Analysis and Mining.
Abstract. Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of edge sparsification methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally or locally by these scores. We show that applying a local filtering technique improves the preservation of all kinds of properties. In addition, we propose a new sparsifi- cation method (Local Degree) which preserves edges leading to local hub nodes. All methods are evaluated on a set of social networks from Facebook, Google+, Twitter and LiveJournal with respect to network properties including diameter, connected components, community structure, multiple node centrality measures and the behavior of epidemic simulations. In order to assess the preservation of the community structure, we also include experiments on synthetically generated networks with ground truth communities. Experiments with our implementations of the sparsification methods (included in the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20% of the original set of edges for sparse graphs with a reasonable density. The experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. While our Local Degree method is best for preserving connectivity and short distances, other newly introduced local variants are best for preserving the community structure.

• E. Bergamini, M. Borassi, P. Crescenzi, A. Marino, H. Meyerhenke: Computing Top-k Closeness Centrality Faster in Unweighted Graphs. In Proc. 18th SIAM Workshop on Algorithm Engineering & Experiments (ALENEX 2016). [arXiv] [DOI: 10.1137/1.9781611974317.6]
Abstract. Centrality indices are widely used analytic measures for the importance of nodes in a network. Closeness centrality is very popular among these measures. For a single node v, it takes the sum of the distances of v to all other nodes into account. The currently best algorithms in practical applications for computing the closeness for all nodes exactly in unweighted graphs are based on breadth-first search (BFS) from every node. Thus, even for sparse graphs, these algorithms require quadratic running time in the worst case, which is prohibitive for large networks.
In many relevant applications, however, it is un- necessary to compute closeness values for all nodes. Instead, one requires only the k nodes with the highest closeness values in descending order. Thus, we present a new algorithm for computing this top-k ranking in unweighted graphs. Following the rationale of previous work, our algorithm significantly re- duces the number of traversed edges. It does so by computing upper bounds on the closeness and stopping the current BFS search when k nodes already have higher closeness than the bounds computed for the other nodes.
In our experiments with real-world and synthetic instances of various types, one of these new bounds is good for small-world graphs with low diameter (such as social networks), while the other one excels for graphs with high diameter (such as road networks). Combining them yields an algorithm that is faster than the state of the art for top-k computations for all test instances, by a wide margin for high-diameter.

• M. von Looz, R. Prutkin and H. Meyerhenke: Fast Generation of Complex Networks with Underlying Hyperbolic Geometry. In Proc. 26th International Symposium on Algorithms and Computation (ISAAC 2015). Code in NetworKit. [arXiv]
Abstract. Complex networks have become increasingly popular for mod- eling various real-world phenomena. Realistic generative network models are important in this context as they avoid privacy concerns of real data and simplify complex network research regarding data sharing, reproducibility, and scalability studies. Random hyperbolic graphs are a well-analyzed family of geometric graphs. Previous work provided empir- ical and theoretical evidence that this generative graph model creates networks with non-vanishing clustering and other realistic features. How- ever, the investigated networks in previous applied work were small, possibly due to the quadratic running time of a previous generator. In this work we provide the first generation algorithm for these networks with subquadratic running time. We prove a time complexity of $$\mathcal{O}((n^{3/2} + m) \log n)$$ with high probability for the generation process. This running time is confirmed by experimental data with our implementation. The acceleration stems primarily from the reduction of pairwise distance computations through a polar quadtree, which we adapt to hyperbolic space for this purpose. In practice we improve the running time of a previous implementation by at least two orders of magnitude this way. Networks with billions of edges can now be generated in a few minutes. Finally, we evaluate the largest networks of this model published so far. Our empirical analysis shows that important features are retained over different graph densities and degree distributions.

• E. Bergamini and H. Meyerhenke: Fully-dynamic Approximation of Betweenness Centrality. In Proc. 23rd European Symposium on Algorithms (ESA 2015). [arXiv] [DOI: 10.1007/978-3-662-48350-3_14]
Abstract. Betweenness is a well-known centrality measure that ranks the nodes of a network according to their participation in shortest paths. Since an exact computation is prohibitive in large networks, several approximation algorithms have been proposed. Besides that, recent years have seen the publication of dynamic algorithms for efficient recomputation of betweenness in evolving networks. In previous work we proposed the first semi-dynamic algorithms that recompute an approximation of betweenness in connected graphs after batches of edge insertions.In this paper we propose the first fully-dynamic approximation algorithms (for weighted and unweighted undirected graphs that need not to be connected) with a provable guarantee on the maximum approximation error. The transfer to fully-dynamic and disconnected graphs implies additional algorithmic problems that could be of independent interest. In particular, we propose a new upper bound on the vertex diameter for weighted undirected graphs. For both weighted and unweighted graphs, we also propose the first fully-dynamic algorithms that keep track of this upper bound. In addition, we extend our former algorithm for semi- dynamic BFS to batches of both edge insertions and deletions.
Using approximation, our algorithms are the first to make in-memory computation of betweenness in fully-dynamic networks with millions of edges feasible. Our experiments show that they can achieve substantial speedups compared to recomputation, up to several orders of magnitude.

• E. Bergamini, H. Meyerhenke and C. Staudt: Approximating Betweenness Centrality in Large Evolving Networks. In Proc. 17th SIAM Workshop on Algorithm Engineering & Experiments (ALENEX 2015). [arXiv] [DOI: 10.1137/1.9781611973754.12]
Abstract. Betweenness centrality ranks the importance of nodes by their participation in all shortest paths of the network. Therefore computing exact betweenness values is impractical in large networks. For static networks, approximation based on randomly sampled paths has been shown to be significantly faster in practice. However, for dynamic networks, no approximation algorithm for betweenness centrality is known that improves on static recomputation. We address this deficit by proposing two incremental approximation algorithms (for weighted and unweighted connected graphs) which provide a provable guarantee on the absolute approximation error. Processing batches of edge insertions, our algorithms yield significant speedups up to a factor of 104 compared to restarting the approximation. This is enabled by investing memory to store and efficiently update shortest paths. As a building block, we also propose an asymptotically faster algorithm for updating the SSSP problem in unweighted graphs. Our experimental study shows that our algorithms are the first to make in-memory computation of a betweenness ranking practical for million-edge semi-dynamic networks. Moreover, our results show that the accuracy is even better than the theoretical guarantees in terms of absolutes errors and the rank of nodes is well preserved, in particular for those with high betweenness.

• C. Staudt, Y. Marrakchi, H. Meyerhenke: Detecting Communities Around Seed Nodes in Complex Networks. In Proc. First International Workshop on High Performance Big Graph Data Management, Analysis, and Mining, co-located with the IEEE BigData 2014 conference.
Abstract. The detection of communities (internally dense subgraphs) is a network analysis task with manifold applications. The special task of selective community detection is concerned with finding high-quality communities locally around seed nodes. Given the lack of conclusive experimental studies, we perform a systematic comparison of different previously published as well as novel methods. In particular we evaluate their performance on large complex networks, such as social networks. Algorithms are compared with respect to accuracy in detecting ground truth communities, community quality measures, size of communities and running time. We implement a generic greedy algorithm which subsumes several previous efforts in the field. Experimental evaluation of multiple objective functions and optimizations shows that the frequently proposed greedy approach is not adequate for large datasets. As a more scalable alternative, we propose selSCAN, our adaptation of a global, density-based community detection algorithm. In a novel combination with algebraic distances on graphs, query times can be strongly reduced through preprocessing. However, selSCAN is very sensitive to the choice of numeric parameters, limiting its practicality. The random-walk-based PageRankNibble emerges from the comparison as the most successful candidate.

• C. Staudt and H. Meyerhenke: Engineering Parallel Algorithms for Community Detection in Massive Networks. Accepted by IEEE Transactions on Parallel and Distributed Systems (TPDS). [arXiv] [DOI: 10.1109/TPDS.2015.2390633] © 2015 IEEE
Please note that NetworKit 3.5 is the last version to include an implementation of the EPP algorithm.

Abstract The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, fast analytics algorithms and software tools are necessary. One common graph analytics kernel is disjoint community detection (or graph clustering). Despite extensive research on heuristic solvers for this task, only few parallel codes exist, although parallelism will be necessary to scale to the data volume of real-world applications. We address the deficit in computing capability by a flexible and extensible community detection framework with shared-memory parallelism. Within this framework we design and implement efficient parallel community detection heuristics: A parallel label propagation scheme; the first large-scale parallelization of the well-known Louvain method, as well as an extension of the method adding refinement; and an ensemble scheme combining the above. In extensive experiments driven by the algorithm engineering paradigm, we identify the most successful parameters and combinations of these algorithms. We also compare our implementations with state-of-the-art competitors. The processing rate of our fastest algorithm often reaches 50M edges/second. We recommend the parallel Louvain method and our variant with refinement as both qualitatively strong and fast. Our methods are suitable for massive data sets with billions of edges.

• C. Staudt and H. Meyerhenke: Engineering High-Performance Community Detection Heuristics for Massive Graphs. In: Proceedings of the 2013 International Conference on Parallel Processing. [updated and extended version on arXiv, bibtex]
Abstract The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, high-performance analytics algorithms and software tools are necessary. One common graph analytics kernel is community detection (or graph clustering). Despite extensive research on heuristic solvers for this task, only few parallel codes exist, although parallelism will be necessary to scale to the data volume of real-world applications. We address the deficit in computing capability by a flexible and extensible community detection framework with shared-memory parallelism. Within this framework we design and implement efficient parallel community detection heuristics: A parallel label propagation scheme; the first large-scale parallelization of the well-known Louvain method, as well as an extension of the method adding refinement; and an ensemble scheme combining the strengths of the above. In extensive experiments driven by the algorithm engineering paradigm, we identify the most successful parameters and combinations of these algorithms. We also compare our implementations with state of the art competitors. The processing rate of our fastest algorithm often exceeds 10M edges/second, making it suitable for massive data streams. We recommend the parallel Louvain method and our variant with refinement as both qualitatively strong and relatively fast. Moreover, our fast ensemble algorithm yields a good tradeoff between quality and speed for community detection in very large networks.

• C. Staudt, M. Hamann, I. Safro, A. Gutfraind and H. Meyerhenke: Generating Scaled Replicas of Real-World Complex Networks. In: Proceedings of the 5th International Workshop on Complex Networks and their Applications [arXiv]
Abstract Research on generative models plays a central role in the emerging field of network science, studying how statistical patterns found in real networks can be generated by formal rules. During the last two decades, a variety of models has been proposed with an ultimate goal of achieving comprehensive realism for the generated networks. In this study, we (a) introduce a new generator, termed ReCoN; (b) explore how models can be fitted to an original network to produce a structurally similar replica, and (c) aim for producing much larger networks than the original exemplar. In a comparative experimental study, we find ReCoN often superior to many other state-of-the-art network generation methods. Our design yields a scalable and effective tool for replicating a given network while preserving important properties at both micro- and macroscopic scales and (optionally) scaling the replica by orders of magnitude in size. We recommend ReCoN as a general practical method for creating realistic test data for the engineering of computational methods on networks, verification, and simulation studies. We provide scalable open-source implementations of most studied methods, including ReCoN.

## Publications Using NetworKit (notable examples)

• J. Kreutel, S. Martus, E. Thomalla and D. Zimmer: Die Germanistik der Germanistik. Published in Internationales Archiv für Sozialgeschichte der deutschen Literatur issue 44, vol. 2, 2019. [De Gruyter]
Abstract The article examines the history and structure of the international bibliographical journal (internationales Referatenorgan) Germanistik from 1960 to 2009. By combining qualitative archival research and quantitative data analysis, it seeks to explore how the journal simultaneously took into account political, economic, and academic expectations; how it stimulated scholarly working practices; and how it contributed to the formation of networks within the heterogeneous field of German Studies.

• R. Glantz, H. Meyerhenke: Many-to-many Correspondences between Partitions: Introducing a Cut-based Approach. To appear in Proc. 18th SIAM Intl. Conf. on Data Mining (SDM 2018). [arXiv]
Abstract Let $$P$$ and $$P'$$ be finite partitions of the set $$V$$. Finding good correspondences between the parts of $$P$$ and those of $$P'$$ is helpful in classification, pattern recognition, and network analysis. Unlike common similarity measures for partitions that yield only a single value, we provide specifics on how $$P$$ and $$P′$$ correspond to each other.
To this end, we first define natural collections of best correspondences under three constraints $$C_{one}$$, $$C_{two}$$, and $$C_{three}$$. In case of $$C_{one}$$, the best correspondences form a minimum cut basis of a certain bipartite graph, whereas the other two lead to minimum cut bases of $$P$$ w.r.t. $$P′$$. We also introduce a constraint, $$C_{four}$$, which tightens $$C_{three}$$; both are useful for finding consensus partitions. We then develop branch-and-bound algorithms for finding minimum $$P_s-P_t$$ cuts of $$P$$ and thus $$\|P\|−1$$ best correspondences under $$C_{two}$$, $$C_{three}$$, and $$C_{four}$$, respectively.
In a case study, we use the correspondences to gain insight into a community detection algorithm. The results suggest, among others, that only very minor losses in the quality of the correspondences occur if the branch-and-bound algorithm is restricted to its greedy core. Thus, even for graphs with more than half a million nodes and hundreds of communities, we can find hundreds of best or almost best correspondences in less than a minute.

• M. Wegner, O. Taubert, A. Schug, H. Meyerhenke: Maxent-stress Optimization of 3D Biomolecular Models. In Proc. 25th European Symposium on Algorithms (ESA 2017). [arXiv]
Abstract Knowing a biomolecule's structure is inherently linked to and a prerequisite for any detailed understanding of its function. Significant effort has gone into developing technologies for structural characterization. These technologies do not directly provide 3D structures; instead they typically yield noisy and erroneous distance information between specific entities such as atoms or residues, which have to be translated into consistent 3D models.
Here we present an approach for this translation process based on maxent-stress optimization. Our new approach extends the original graph drawing method for the new application's specifics by introducing additional constraints and confidence values as well as algorithmic components. Extensive experiments demonstrate that our approach infers structural models (i. e., sensible 3D coordinates for the molecule's atoms) that correspond well to the distance information, can handle noisy and error-prone data, and is considerably faster than established tools. Our results promise to allow domain scientists nearly-interactive structural modeling based on distance constraints.

• M. Lozano, C. García-Martínez, F. J. Rodríguez, H. M. Trujillo: Optimizing network attacks by artificial bee colony. In Information Sciences, Volume 377, pp. 30-50, January 2017.
Abstract Over the past few years, the task of conceiving effective attacks to complex networks has arisen as an optimization problem. Attacks are modelled as the process of removing a number k of vertices, from the graph that represents the network, and the goal is to maximise or minimise the value of a predefined metric over the graph. In this work, we present an optimization problem that concerns the selection of nodes to be removed to minimise the maximum betweenness centrality value of the residual graph. This metric evaluates the participation of the nodes in the communications through the shortest paths of the network.
To address the problem we propose an artificial bee colony algorithm, which is a swarm intelligence approach inspired in the foraging behaviour of honeybees. In this framework, bees produce new candidate solutions for the problem by exploring the vicinity of previous ones, called food sources. The proposed method exploits useful problem knowledge in this neighbourhood exploration by considering the partial destruction and heuristic reconstruction of selected solutions. The performance of the method, with respect to other models from the literature that can be adapted to face this problem, such as sequential centrality-based attacks, module-based attacks, a genetic algorithm, a simulated annealing approach, and a variable neighbourhood search, is empirically shown.

• M. Riondato, E. Upfal: ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages. In Proc. 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2016), August 2016. [arXiv]
Abstract We present ABRA, a suite of algorithms that compute and maintain probabilistically-guaranteed, high-quality, approximations of the betweenness centrality of all nodes (or edges) on both static and fully dynamic graphs. Our algorithms rely on random sampling and their analysis leverages on Rademacher averages and pseudodimension, fundamental concepts from statistical learning theory. To our knowledge, this is the first application of these concepts to the field of graph analysis. The results of our experimental evaluation show that our approach is much faster than exact methods, and vastly outperforms, in both speed and number of samples, current state-of-the-art algorithms with the same quality guarantees.

• M. von Looz, M. Wolter, C. Jacob, H. Meyerhenke: Better partitions of protein graphs for subsystem quantum chemistry. In Proc. 15th Intl. Symp. on Experimental Algorithms (SEA 2016), June 2016.
Abstract Determining the interaction strength between proteins and small molecules is key to analyzing their biological function. Quantum-mechanical calculations such as Density Functional Theory (DFT) give accurate and theoretically well-founded results. With common implementations the running time of DFT calculations increases quadratically with molecule size. Thus, numerous subsystem-based approaches have been developed to accelerate quantum-chemical calculations. These approaches partition the protein into different fragments, which are treated separately. Interactions between different fragments are approximated and introduce inaccuracies in the calculated interaction energies.
To minimize these inaccuracies, we represent the amino acids and their interactions as a weighted graph in order to apply graph partitioning. None of the existing graph partitioning work can be directly used, though, due to the unique constraints in partitioning such protein graphs. We therefore present and evaluate several algorithms, partially building upon established concepts, but adapted to handle the new constraints. For the special case of partitioning a protein along the main chain, we also present an efficient dynamic programming algorithm that yields provably optimal results. In the general scenario our algorithms usually improve the previous approach significantly and take at most a few seconds.

• P. Crescenzi, G. D’Angelo, L. Severini, Y. Velaj: Greedily Improving Our Own Centrality in A Network. In Proc. 14th Intl. Symp. on Experimental Algorithms (SEA 2015). LNCS 9125, pp. 43-55. Springer International Publishing, 2015.
Abstract The closeness and the betweenness centralities are two well-known measures of importance of a vertex within a given complex network. Having high closeness or betweenness centrality can have positive impact on the vertex itself: hence, in this paper we consider the problem of determining how much a vertex can increase its centrality by creating a limited amount of new edges incident to it. We first prove that this problem does not admit a polynomial-time approximation scheme (unless $$P=NP$$), and we then propose a simple greedy approximation algorithm (with an almost tight approximation ratio), whose performance is then tested on synthetic graphs and real-world networks.

• D. Hoske, D. Lukarski, H. Meyerhenke, M. Wegner: Is Nearly-linear the same in Theory and Practice? A Case Study with a Combinatorial Laplacian Solver. In Proc. 14th Intl. Symp. on Experimental Algorithms (SEA 2015). LNCS 9125, pp. 205-218. Springer International Publishing, 2015. [arXiv]
For the paper follow the arXiv link above. If you are interested in the implementation, see ParCo's software page.

## Projects Using NetworKit

Further projects using NetworKit can be found on our projects page and on Google Scholar (here and there).

## Student Theses Using NetworKit

A list of student theses based on NetworKit can be found here.