If you’re a deep learning enthusiast, you’re probably already familiar with some of the basic mathematical primitives that drive the impressive capabilities of what we call deep neural networks. Although we like to think of a basic artificial neural network as some nodes with some weighted connections, it’s more efficient computationally to think of neural networks as matrix multiplication all the way down. We might draw a cartoon of an artificial neural network like the figure below, with information traveling from left to right, from inputs to outputs (ignoring recurrent networks for now).

This type of neural network is a feed-forward multilayer perceptron (MLP). If we want a computer to compute the forward pass for this model, it’s going to use a string of matrix multiplies and some sort of non-linearity (here represented by the Greek letter sigma) in the hidden layer:
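As a rough illustration of that forward pass, here is a minimal NumPy sketch of a single-hidden-layer MLP, roughly y = σ(x W1) W2. The sigmoid non-linearity, the layer sizes, and the names `mlp_forward`, `w_hidden`, and `w_out` are assumptions chosen for the example, and biases are left out to keep it short.

```python
import numpy as np

def sigmoid(z):
    """Logistic non-linearity, standing in for sigma."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w_hidden, w_out):
    """Forward pass: one matrix multiply per layer, sigma on the hidden layer."""
    hidden = sigmoid(x @ w_hidden)   # (batch, hidden_units)
    return hidden @ w_out            # (batch, outputs)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # batch of 4 samples with 3 features each
w_hidden = rng.normal(size=(3, 5))   # input -> hidden weights
w_out = rng.normal(size=(5, 2))      # hidden -> output weights
y = mlp_forward(x, w_hidden, w_out)
print(y.shape)
```

Every sample in the batch goes through the same two matrix multiplies, which is why this formulation maps so well onto GPUs.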

MLPs are well-suited for data that can be naturally shaped as 1D vectors. While neat and all, MLPs use an awful lot of parameters when data samples are large, and this isn’t a very efficient way to treat higher dimensional data like 2D images or 3D volumes. 2D data like images instead naturally lend themselves to the operation of convolution, wherein the same weights are applied in local neighborhoods across the entire image, instead of giving each point-to-point connection between layers its own weight. This type of weight sharing has a number of advantages, including translation equivariance, regularization, and parameter efficiency.

Convolution can be visualized like so:
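In code, the sliding-window picture can be sketched as below. This is a deliberately naive NumPy version for a single channel with no padding or stride. The function name `conv2d_valid` is made up for the example, and like most deep learning libraries it does not flip the kernel (so strictly speaking it computes a cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2)) / 4.0       # 2x2 averaging filter
smoothed = conv2d_valid(image, kernel)
print(smoothed)
```

Note that the kernel has only four weights no matter how large the image is, which is the parameter efficiency mentioned above.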

Of course, we’re not going to sit down with pen and paper and perform these operations by hand. We want an algorithm that can quickly perform convolution across each image channel in a computer-friendly way.

In principle, computers perform convolutions something like the following:

That’s right: convolution can again be implemented as a multiplication of matrices, although this time the multiplication is element-wise. This is thanks to the convolution theorem of the Fourier transform, which states that convolution in the spatial domain corresponds to element-wise multiplication in the Fourier domain. But what happens when our data of interest isn’t particularly well-suited to representation as a 1D vector or a 2D/3D image, and is instead naturally represented as a graph?
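To make the convolution theorem concrete, the sketch below compares a direct circular (wrap-around) convolution against the Fourier route using NumPy’s FFT. The function names are invented for this example, and the wrap-around boundary handling is an assumption that keeps the two results exactly comparable. Real libraries choose among several convolution algorithms, so treat this as an illustration of the principle rather than a description of any particular implementation.

```python
import numpy as np

def circular_conv2d_direct(image, kernel):
    """Direct circular convolution with wrap-around indexing."""
    h, w = image.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            for m in range(kernel.shape[0]):
                for n in range(kernel.shape[1]):
                    out[i, j] += kernel[m, n] * image[(i - m) % h, (j - n) % w]
    return out

def circular_conv2d_fft(image, kernel):
    """Same result via element-wise multiplication in the Fourier domain."""
    image_f = np.fft.fft2(image)
    kernel_f = np.fft.fft2(kernel, s=image.shape)  # zero-pads kernel to image size
    return np.real(np.fft.ifft2(image_f * kernel_f))

rng = np.random.default_rng(0)
image = rng.normal(size=(6, 6))
kernel = rng.normal(size=(3, 3))
direct = circular_conv2d_direct(image, kernel)
via_fft = circular_conv2d_fft(image, kernel)
print(np.allclose(direct, via_fft))
```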

For our purposes, a graph is a collection of nodes connected by edges, as shown in the cartoon. The edges can have their own properties such as weights and/or directionality, and the nodes typically have some sort of states or features, just like the node activations in a feed-forward MLP.

In a graph neural network, each “layer” is just a snapshot of the node states of the graph, and these snapshots are connected by update operations that act on each node and its neighbors, for example small neural networks operating along the edges between nodes.
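One possible reading of that update, sketched in NumPy: each node sums the states of its neighbors, mixes that message with its own state through two weight matrices, and applies a non-linearity. The sum aggregation, the tanh, and the names `message_passing_step`, `w_self`, and `w_neigh` are all assumptions made for the example; actual GNN variants differ in exactly these choices.

```python
import numpy as np

def message_passing_step(states, neighbors, w_self, w_neigh):
    """Compute the next snapshot of node states from the current one."""
    new_states = np.zeros((states.shape[0], w_self.shape[1]))
    for node, nbrs in neighbors.items():
        # Aggregate the neighbors' states into a single message vector.
        if nbrs:
            message = states[nbrs].sum(axis=0)
        else:
            message = np.zeros(states.shape[1])
        new_states[node] = np.tanh(states[node] @ w_self + message @ w_neigh)
    return new_states

# A three-node path graph: 0 - 1 - 2
neighbors = {0: [1], 1: [0, 2], 2: [1]}
states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w_self = np.eye(2)    # shared across all nodes
w_neigh = np.eye(2)   # shared across all edges
next_states = message_passing_step(states, neighbors, w_self, w_neigh)
print(next_states)
```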

If we want to use graph neural networks to achieve impressive results on graph-structured data, like what convolutional neural networks did for deep learning on images, we need an efficient way to implement these models on computers. That almost always means we need a way to convert the conceptual graph neural network framework to something that works on a modern deep learning GPU.

A convenient way to represent the connections in a graph is with something called an adjacency matrix. As the name suggests, an adjacency matrix describes which nodes are next to each other (*i.e.* connected to each other by edges) in the graph.
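For instance, a small undirected graph given as a list of edges can be turned into an adjacency matrix like this. The helper name `adjacency_from_edges` and the four-node ring are just illustrative choices.

```python
import numpy as np

def adjacency_from_edges(num_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from (source, target) pairs."""
    adj = np.zeros((num_nodes, num_nodes))
    for src, dst in edges:
        adj[src, dst] = 1.0
        adj[dst, src] = 1.0   # undirected graph: mirror every edge
    return adj

# Four nodes connected in a ring: 0 - 1 - 2 - 3 - 0
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
adj = adjacency_from_edges(4, edges)
print(adj)
```

For a directed graph you would drop the mirrored assignment, and for a weighted graph you would store the edge weight instead of 1.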

But a graph neural network needs to operate on graphs with arbitrary structure (much like the convolutional kernels of a conv-net can work on images of different height and width), so we can’t expect the input data to have the same adjacency matrix each time or even for each adjacency matrix to have the same dimensions. We can deal with this by combining the adjacency matrices for several samples diagonally into a larger matrix describing all the connections in a batch.
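The diagonal stacking described above might look like the following in NumPy. The function name `batch_adjacency` and the two toy graphs are assumptions for the sake of the example.

```python
import numpy as np

def batch_adjacency(adjacencies):
    """Place each graph's adjacency matrix on the diagonal of one big matrix."""
    total = sum(a.shape[0] for a in adjacencies)
    batched = np.zeros((total, total))
    offset = 0
    for a in adjacencies:
        n = a.shape[0]
        batched[offset:offset + n, offset:offset + n] = a
        offset += n
    return batched

triangle = np.ones((3, 3)) - np.eye(3)     # 3 nodes, fully connected
pair = np.array([[0.0, 1.0], [1.0, 0.0]])  # 2 nodes, one edge
batched = batch_adjacency([triangle, pair])
print(batched)
```

The off-diagonal blocks stay zero, so no information flows between nodes that belong to different graphs in the batch.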

This allows us to deal with multiple graphs with different structures in a single batch, and you’ll notice that this formulation also results in weight sharing between nodes. There are a few more details to this: the adjacency matrix should be normalized so that feature scales don’t completely change, and there are other approaches to convolution on graphs besides the graph convolutional network (GCN) approach we are talking about here, but it’s a good starting point for understanding the GNN forward pass.
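Putting the pieces together, a single GCN-style layer can be sketched as a normalized adjacency matrix times the node features times a shared weight matrix. The specific normalization used here (adding self-loops and scaling symmetrically by node degree) and the ReLU are assumptions based on a common formulation of GCNs, not something spelled out above, and the function names are invented for the example.

```python
import numpy as np

def normalize_adjacency(adj):
    """Add self-loops, then scale symmetrically by the square root of node degree."""
    adj_self = adj + np.eye(adj.shape[0])
    degree = adj_self.sum(axis=1)          # at least 1 thanks to the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    return adj_self * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm, features, weights):
    """One graph convolution: aggregate neighbors, apply shared weights, ReLU."""
    return np.maximum(adj_norm @ features @ weights, 0.0)

# Three-node path graph: 0 - 1 - 2
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))   # 4 input features per node
weights = rng.normal(size=(4, 2))    # the same weights are used for every node
adj_norm = normalize_adjacency(adj)
out = gcn_layer(adj_norm, features, weights)
print(out.shape)
```

Because `weights` does not depend on the number of nodes, the same layer works unchanged on a block-diagonal batch of graphs with different sizes.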

It’s enough to give us an appreciation for the data preparation and mathematical operations needed to implement deep learning on graphs. Luckily, the interest in deep learning for graph-structured data has motivated the development of a number of open source libraries for graph deep learning, leaving more cognitive room for researchers and engineers to concentrate on architectures, experiments, and applications.

In this article we’ll go through 7 up-and-coming open source libraries for graph deep learning, ranked in order of increasing popularity.

Reflecting the dominance of Python for graph deep learning, and for deep learning in general, most of the entries on this list use Python and are built on top of TensorFlow, PyTorch, or JAX. The first entry, GeometricFlux.jl, is instead an open source library for graph neural networks built on the Flux deep learning framework in the Julia programming language.

One may be tempted to write off GeometricFlux.jl, and even the whole idea of using the Julia language for deep learning, due to the relatively small number of practitioners, but Julia is a language with a growing community and it offers a number of technical advantages over Python. Just a few years ago, one would hardly have predicted that DeepMind would start ditching TensorFlow in favor of JAX (see entry number 5 on this list), and likewise in just a few short years we may see the Julia language start to supplant Python as the standard language for machine learning.

The Julia programming language was designed from the start to be both highly productive (like Python) and fast (like compiled languages such as C). Julia uses just-in-time compilation to achieve fast execution speed, while its read-eval-print loop (REPL) makes interactive and iterative programming reasonably productive. You will notice a slight delay when you run code for the first time, especially if you’re used to using Python in a particularly interactive way (like in Jupyter notebooks), but over time the speed-ups for a given workflow can be significant.

Julia is designed as a scientific programming language, and there has been significant development of automatic differentiation packages over the last five years or so. The end result is functionality that can combine research-centered libraries like the DifferentialEquations.jl package with machine learning capabilities, as we see in the neural differential equations package DiffEqFlux.jl. The same goes for GeometricFlux.jl, which is built to be compatible with the JuliaGraphs ecosystem for graph theory research as well as with other parts of Flux.

If you’re using graph deep learning for work, it may be most efficient to stick with a library that’s built on PyTorch or on whichever deep learning framework you already use for other projects. If you’re starting from scratch or doing research, however, GeometricFlux.jl makes a compelling entry point for graph deep learning and differentiable programming with Julia. The library’s friendly MIT License also makes it easy to build and contribute the tools you need, or to tackle some of the open issues from the project’s GitHub repository.

The PyTorch Graph Neural Network library (PTGNN) is a graph deep learning library from Microsoft, still under active development at version ~0.9.x after being made public in May of 2020. PTGNN is designed to feel familiar to users who already build models based on the torch.nn.Module class, and it handles the workflow tasks of setting up dataloaders and turning graphs into PyTorch-ready tensors.

PTGNN is based on an interesting architecture called the AbstractNeuralModel. This class encapsulates the entire process of training a graph neural network, including tensorizing and pre-processing raw data, and includes the TNeuralModule that is the actual neural model sub-classed from PyTorch’s nn.Module class. The neural modules can be used independently of the AbstractNeuralModel object, and in fact can be combined with other types of PyTorch modules/layers if desired.

PTGNN is slightly younger than GeometricFlux.jl and has a less active commit history, but ekes out slightly more GitHub stars and forks. It has the same permissive and open source MIT License, but if you’re looking for a project to contribute to, you’ll need to be fairly self-directed. The “Issues” tab on GitHub provides little to no direction of what needs to be fixed or implemented. PTGNN has a few interesting design elements in its construction that may be of interest to work with or on, but if you’re a graph neural network enthusiast looking for a PyTorch-based graph deep learning library you may be better served by using PyTorch Geometric (number 1 on our list). PyTorch Geometric is more mature, having been in development for about 4 years now, and has an established and growing community of users and developers.

Late last year you may have noticed a blog post from DeepMind with a little less pomp and circumstance than their usual headline-grabbing landmarks. In December 2020 DeepMind described their ongoing efforts in developing and using a capable ecosystem of deep learning research libraries based on the functional differentiable programming library JAX. JAX is the conceptual progeny of Autograd, which started as an academic project for simple but nigh-universal automatic differentiation in Python (especially NumPy).

After Google scooped up several of the research programmers responsible for the original Autograd, they developed a new library and now we have JAX. JAX is an interesting package due in no small part to its emphasis on composable functional programming paradigms. It also pays attention to the more general concept of “differentiable programming” rather than focusing primarily on neural networks like TensorFlow or PyTorch. Although PyTorch and TensorFlow can both be used to build, say, differentiable physics models instead of neural networks, JAX is more readily amenable to flexible differentiable programming for scientific and other programming tasks from the start. The JAX offering is compelling enough, at least, to induce DeepMind to embark on a substantial adoption and development track, despite having previously spent significant time building TensorFlow-based tools like Sonnet.

As part of DeepMind’s efforts to develop a JAX-based ecosystem for deep learning research they’ve developed a graph learning library called Jraph.

*Original image in the public domain from Wikimedia contributor* *Scambelo*

Credit: BecomingHuman By: James Montantes