David Duvenaud was collaborating on a project involving medical data when he ran up against a major shortcoming in AI.

An AI researcher at the University of Toronto, he wanted to build a deep-learning model that would predict a patient’s health over time. But data from medical records is messy: throughout your life, you might see the doctor at different times for different reasons, generating a smattering of readings at arbitrary intervals. A traditional neural network struggles to handle this. Its design requires it to learn from data with discrete observation steps, which makes it a poor tool for modeling continuous processes, especially ones measured irregularly over time.

The challenge led Duvenaud and his collaborators at the university and the Vector Institute to rethink neural networks as we know them. Last week their paper was one of four to receive a “Best Paper” honor at Neural Information Processing Systems, one of the world’s largest AI research gatherings.

Neural networks are the basic machinery that makes deep learning so powerful. A traditional neural network is made up of stacked layers of simple compute nodes that work together to find patterns in data. Discrete layers are what keep it from effectively modeling continuous processes (we’ll get to that).

In response, the research team’s design scraps layers entirely. (Duvenaud is quick to note that they didn’t come up with this idea; they were just the first to implement it in a generalizable way.) To understand how this is possible, let’s first take a look at what layers do.

The most common process for training a neural network, known as supervised learning, involves feeding it a bunch of labeled data. Say you wanted to build a system that recognizes different animals. You would feed a neural network animal images paired with the corresponding animal names. Under the hood, the network begins to solve a crazy math puzzle: it looks at all the image-name pairs and works out a formula that reliably transforms one (the image) into the other (the name). Once that puzzle is solved, it can reuse the formula over and over to correctly categorize any new animal photo, most of the time.
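To make the “find a formula” idea concrete, here’s a minimal sketch of supervised learning in Python. The two-number “features” standing in for images, and the data itself, are entirely made up for illustration; the search for the formula is plain gradient descent on a logistic model.

```python
import numpy as np

# Toy stand-in for image features: two made-up numbers per animal
# (say, "ear floppiness" and "whisker length"); labels: 0 = cat, 1 = dog.
X = np.array([[0.9, 0.2], [0.8, 0.3], [0.1, 0.9], [0.2, 0.8]])
y = np.array([1, 1, 0, 0])

w = np.zeros(2)  # the "formula" being searched for
b = 0.0

for _ in range(1000):                      # examine the labeled pairs repeatedly
    p = 1 / (1 + np.exp(-(X @ w + b)))     # predicted probability of "dog"
    grad_w = X.T @ (p - y) / len(y)        # how wrong the formula is, per weight
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w                      # nudge the formula toward fewer mistakes
    b -= 0.5 * grad_b

# Reuse the learned formula on a new animal it has never seen.
new_animal = np.array([0.85, 0.25])
print("dog" if 1 / (1 + np.exp(-(new_animal @ w + b))) > 0.5 else "cat")
```

A real network does the same thing with millions of weights instead of two, but the loop is the same: compare predictions to labels, adjust, repeat.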

But finding a single formula to describe the entire image-to-name transformation would be overly general and result in an inaccurate model. It would be like trying to use one rule to tell cats and dogs apart. You could say dogs have floppy ears. But some dogs don’t, and some cats do, so you’d end up with a lot of false negatives and false positives.

This is where the layers of a neural network come in. They break the transformation into steps and let the network find a series of formulas, each describing one stage of the process. So the first layer might take in all the pixels and use a formula to pick out which ones are most relevant for cats versus dogs. A second layer might use another to build larger patterns from groups of pixels and determine whether the image has whiskers or ears. Each subsequent layer identifies increasingly complex features of the animal, until the final layer decides “dog” on the basis of the accumulated calculations. This step-by-step breakdown of the process allows a neural network to build more sophisticated models, which in turn should lead to more accurate predictions.
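Here’s a rough sketch of what those stacked layers look like in code, with random weights standing in for whatever training would actually learn. The layer sizes, and the “meaning” attached to each layer, are illustrative assumptions, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights stand in for what training would actually learn.
W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)   # layer 1: raw "pixels" -> simple patterns
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)    # layer 2: patterns -> parts (whiskers, ears)
W3, b3 = rng.normal(size=(8, 2)),  np.zeros(2)    # final layer: parts -> "cat" vs "dog" scores

def relu(x):
    return np.maximum(0, x)

x = rng.normal(size=4)        # a tiny fake "image" of 4 pixels
h1 = relu(x @ W1 + b1)        # step 1 of the transformation
h2 = relu(h1 @ W2 + b2)       # step 2 builds on step 1's output
scores = h2 @ W3 + b3         # the accumulated computation decides the label
print(["cat", "dog"][int(np.argmax(scores))])
```

Note that the data moves through a fixed, discrete number of stages; that discreteness is exactly what the rest of the article is about.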

The layered approach has served the AI field well, but it has a downside. If you want to model anything that transforms continuously over time, you also have to break it into discrete steps. In practice, returning to the health example, that would be like grouping your medical records into finite periods such as years or months. You can see how inexact that would be. If you went to the doctor on January 11 and again on November 16, the data from both visits would be lumped together under the same year.
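The loss of information is easy to see in a few lines of stdlib Python, using the dates from the example above:

```python
from datetime import date

visits = [date(2018, 1, 11), date(2018, 11, 16)]

# Discretizing by year: both visits collapse into the same bucket,
# erasing the ten months between them.
bins = {}
for v in visits:
    bins.setdefault(v.year, []).append(v)

print(bins)  # both visits land under 2018
```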

So the best way to model reality as closely as possible is to add more layers and increase the granularity. (Why not break your records down into days, or even hours? You could have gone to the doctor twice in one day!) Taken to the extreme, the best neural network for this job would have an infinite number of layers modeling infinitesimal steps of change. The question is whether this idea is even practical.

If this is starting to sound familiar, it’s because we’ve arrived at exactly the kind of problem calculus was invented to solve. Calculus gives you all these nice equations for computing a series of changes across infinitesimal steps; in other words, it saves you from the nightmare of modeling continuous change in discrete units. This is the magic of the paper by Duvenaud and his collaborators: it replaces the layers with calculus equations.
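As a loose illustration (not the authors’ code), here’s what “layers replaced by an equation” might look like: a hidden state evolved by a single learned dynamics function, integrated over continuous “depth.” The paper treats the integration as a black-box adaptive ODE solve; the crude fixed-step Euler loop below is a stand-in for that.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 3)) * 0.1  # parameters of the dynamics function

def f(h, t):
    """Learned dynamics: how the hidden state changes at 'depth' t.
    One function replaces a whole stack of discrete layers."""
    return np.tanh(h @ W)

# Instead of h = layer3(layer2(layer1(x))), integrate dh/dt = f(h, t)
# from t = 0 to t = 1.
h = np.array([1.0, 0.0, -1.0])  # input state
t, dt = 0.0, 0.01
while t < 1.0:
    h = h + dt * f(h, t)        # each tiny step acts like an infinitesimally thin layer
    t += dt

print(h)  # output state after "continuous depth" 1.0
```

Shrinking `dt` is exactly the “more, thinner layers” idea taken toward its limit; a proper solver picks the step sizes adaptively.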

The result is not even really a network anymore; there are no more nodes and connections, just a continuous slab of computation. Nevertheless, true to convention, the researchers named this design an “ODE network” – ODE for “ordinary differential equations”. (They still have to work on their branding.)

If your brain hurts (trust me, mine does too), here’s a nice analogy Duvenaud uses to tie it all together. Consider a continuous musical instrument like a violin, where you can slide your hand along the string to play any frequency; now think of a discrete instrument like a piano, where you have a distinct number of keys to play a limited number of frequencies. A traditional neural network is like a piano: try as you might, you won’t be able to play a slide. You can only approximate a slide by playing a scale. Even if you retuned your piano so the note frequencies were very close together, you’d still be approximating the slide with a scale. Switching to an ODE network is like trading your piano for a violin. It isn’t always the right tool, but it is better suited to certain tasks.

In addition to being able to model continuous change, an ODE network also changes certain aspects of training. With a traditional neural network, you have to specify the number of layers at the start of training, then wait until training is done to find out how accurate the model is. The new method lets you specify your desired accuracy first, and it will find the most efficient way to train itself within that margin of error. On the flip side, you know from the start how long a traditional neural network will take to train; with an ODE network, not so much. These are the trade-offs, Duvenaud explains, that researchers will have to weigh when deciding which technique to use in the future.
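That precision-for-predictability trade-off shows up in any off-the-shelf adaptive ODE solver, such as SciPy’s `solve_ivp` (unrelated to the paper’s code, used here only to illustrate the idea): you set the tolerance up front, and the solver decides on its own how much work that costs.

```python
from scipy.integrate import solve_ivp

# A simple ODE, dy/dt = -y, standing in for an ODE net's dynamics.
def f(t, y):
    return -y

# Ask the adaptive solver for two different precisions and compare the cost.
loose = solve_ivp(f, (0, 10), [1.0], rtol=1e-3, atol=1e-6)
tight = solve_ivp(f, (0, 10), [1.0], rtol=1e-10, atol=1e-12)

# nfev counts evaluations of f: the tighter tolerance costs more of them,
# and you don't know how many until the solve finishes.
print(loose.nfev, tight.nfev)
```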

For now, the paper offers a proof of concept for the design, “but it’s not ready for prime time yet,” says Duvenaud. Like any early technique proposed in the field, it still needs to be fleshed out, tested, and refined before it goes into production. But the method has the potential to flip the field on its head, much as Ian Goodfellow did when he published his paper on GANs.

“Many of the key advances in machine learning have occurred in the field of neural networks,” says Richard Zemel, research director at the Vector Institute, who was not involved in the paper. “The paper is likely to stimulate a whole range of follow-up work, particularly in time-series models, which are fundamental in AI applications such as healthcare.”

Remember: when ODE nets blow up, you heard about them here first.

*Corrections: An earlier version of the article incorrectly captioned the image at the top as an ordinary differential equation. It depicts the trajectories of a neural ordinary differential equation. The article has also been updated to refer to the new design as an “ODE network” rather than an “ODE solver,” to avoid confusion with existing ODE solvers in other fields.*


*This article originally appeared in our AI newsletter, The Algorithm. To have it delivered directly to your inbox, subscribe here for free.*