One piece of optical hardware performs massively parallel AI calculations

Image: The output of two optical frequency combs, showing the light appearing at evenly spaced wavelengths.

AI and machine-learning techniques have become a major focus of everything from cloud computing services to cell phone manufacturers. Unfortunately, our existing processors are a bad match for the sort of algorithms that many of these techniques are based on, in part because they require frequent round trips between the processor and memory. To deal with this bottleneck, researchers have figured out how to perform calculations in memory and designed chips where each processing unit has a bit of memory attached.

Now, two different teams of researchers have figured out ways of performing calculations with light in a way that both merges memory and calculations and allows for massive parallelism. Despite the differences in implementation, the hardware designed by these teams has a common feature: it allows the same piece of hardware to simultaneously perform different calculations using different frequencies of light. While the devices are not yet at the level of performance of some dedicated processors, the approach can scale easily and can be implemented using on-chip hardware, raising the prospect of using it as a dedicated co-processor.

A fine-toothed comb

The new work relies on hardware called a frequency comb, a technology that won some of its creators the 2005 Nobel Prize in Physics. A lot of interesting physics is behind how the combs work, but what we care about is the outcome of that physics. While there are several ways to produce a frequency comb, they all produce the same thing: a beam of light that is composed of evenly spaced frequencies. So, a frequency comb in visible wavelengths might be composed of light with a wavelength of 500 nanometers, 510nm, 520nm, and so on.

The metaphor is a comb—a large array of evenly spaced teeth—but the reality underlying the metaphor is that each of the teeth is a distinct wavelength of light. Or, since frequency and wavelength are related, you can view them as evenly spaced frequencies.
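As a rough illustration of that relationship, the sketch below builds a small comb and converts each tooth to a wavelength. The starting frequency, the spacing, and the function names are assumptions chosen for illustration and are not taken from either paper. Strictly speaking, the teeth are evenly spaced in frequency, so their wavelengths are only approximately evenly spaced, which the last two lists make visible.

```python
C = 299_792_458.0  # speed of light in vacuum, in meters per second


def comb_frequencies(f_start_hz, spacing_hz, teeth):
    """Return `teeth` evenly spaced frequencies, starting at f_start_hz."""
    return [f_start_hz + n * spacing_hz for n in range(teeth)]


def to_wavelength_nm(freq_hz):
    """Convert a frequency in Hz to a vacuum wavelength in nanometers."""
    return C / freq_hz * 1e9


# Hypothetical comb: first tooth at 193 THz, teeth 50 GHz apart (illustrative values).
freqs = comb_frequencies(193.0e12, 50e9, 5)
wavelengths = [to_wavelength_nm(f) for f in freqs]

# Gaps between neighboring teeth, in Hz and in nanometers.
freq_gaps = [b - a for a, b in zip(freqs, freqs[1:])]
wave_gaps = [a - b for a, b in zip(wavelengths, wavelengths[1:])]

print(freq_gaps)  # identical gaps in frequency
print(wave_gaps)  # gaps in wavelength shrink slightly from tooth to tooth
```

Over a narrow range the wavelength gaps are nearly constant, which is why describing the teeth as evenly spaced wavelengths is a reasonable simplification.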

We’ve known for a while that it’s possible to perform calculations using light. What the frequency comb does is provide a convenient means for those calculations to be massively parallel. If the frequency comb described above were sent through hardware that manipulated visible wavelengths, every single one of its component frequencies would perform the same calculation, essentially making its operation massively parallel. On its own, that’s not especially useful, unless you really want the results of a calculation confirmed.

But some types of frequency combs allow each wavelength to be adjusted independently, increasing or lowering the intensity of each of the teeth. This allows different calculations to be performed with each tooth in the comb, while retaining the massively parallel aspects.

One of the types of calculations that maps relatively easily onto light is matrix multiplication, which is heavily used by some AI applications. Specifically, it can be used to perform convolution, a mathematical operation that’s part of the deep neural networks that have excelled at image recognition. Both of these papers do end up using their optical hardware to do image work, but they use very different approaches to get there. We’ll start with the one that’s easier to understand.
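To see why matrix multiplication is enough to perform a convolution, here is a minimal sketch in plain Python. The tiny image, the 2-by-2 kernel, and the helper names are made up for illustration. Each small patch of the image is flattened into one row of a matrix, and multiplying that matrix by the flattened kernel yields the filtered image. As in most neural-network code, the kernel is applied without flipping.

```python
def image_patches(image, k):
    """Flatten every k-by-k patch of a 2D list into one row of a matrix."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            patches.append([image[r + i][c + j] for i in range(k) for j in range(k)])
    return patches


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical 4x4 grayscale image and 2x2 kernel (illustrative values).
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [[1, 0], [0, -1]]
flat_kernel = [w for row in kernel for w in row]

# One matrix-vector product computes every output pixel of the convolution.
feature_map = matvec(image_patches(image, 2), flat_kernel)
print(feature_map)  # [0, -1, -1, -1, 1, 3, 2, 0, -2], a 3x3 result read row by row
```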

Into the matrix

How do you actually perform image operations using light? The first step is to digitize the image. From there, details about each pixel can be encoded in the intensity of light at specific wavelengths of the frequency comb. These wavelengths are then sent into a square grid of phase-change materials. The phase-change material performs operations on the light because it absorbs different amounts of light based on how ordered or disordered the material is. Depending on the path the light takes through the grid, different amounts will be absorbed, with the final intensity being the readout of the operation, which is essentially a vector multiply-accumulate.

Critically, the grid can perform that operation on different wavelengths at the same time. So, different teeth of the comb can all be fed into the hardware at once. And, since the phase-change material is reconfigurable, the whole thing can be reconfigured to perform different operations at will. As is standard for these sorts of demonstrations, the researchers set the hardware to work on a set of handwritten digits that’s become a standard in the field. It was able to achieve an accuracy of over 95 percent, which is considered successful.
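The idea can be sketched in a few lines of Python, under some loud assumptions: the grid is modeled as a table of transmission fractions between 0 and 1, each tooth carries its own input vector as a list of intensities, and all names and numbers are invented for illustration. The real hardware handles every tooth simultaneously, whereas this software model simply loops over them.

```python
def through_grid(transmission, inputs):
    """One pass through the grid: each output is a sum of attenuated inputs."""
    return [sum(t * x for t, x in zip(row, inputs)) for row in transmission]


# Hypothetical 2x3 grid. Each entry is the fraction of light a cell lets through,
# standing in for how ordered or disordered the phase-change material is.
transmission = [
    [0.5, 0.25, 1.0],
    [1.0, 0.5, 0.0],
]

# Each comb tooth carries a different input vector, encoded as light intensities.
teeth = {
    "tooth_a": [1.0, 0.0, 0.5],
    "tooth_b": [0.0, 1.0, 1.0],
    "tooth_c": [0.5, 0.5, 0.5],
}

# The same grid is applied to every tooth; only the data differs.
outputs = {name: through_grid(transmission, vec) for name, vec in teeth.items()}
print(outputs)
# {'tooth_a': [1.0, 1.0], 'tooth_b': [1.25, 0.5], 'tooth_c': [0.875, 0.75]}
```

Reconfiguring the phase-change material corresponds to changing the values in `transmission`, which changes the operation applied to every tooth at once.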

For the demonstration, the different pieces of hardware (the laser, the hardware that converts its light into a frequency comb, the phase-change memory, and the photon detectors) were all on different chips. But there’s nothing stopping them from being integrated onto a single chip, allowing them to be used as an optical co-processor for AI tasks.

Two significant limits on its operation are the number of teeth in the frequency comb and the size of the grid of phase-change material, and there are ways of increasing both of these. Somewhat more challenging to handle are the limits posed by how quickly the optical hardware can operate. Since there may be ways of handling that, the hardware’s current operation should be considered a lower bound. But even as a lower bound, it’s pretty impressive, capable of operating at 3.8 trillion operations per second.

While specialized AI co-processors already exist, they operate on the same principles as regular processors, meaning heat and power become an issue for them as well. A key potential advantage of the optical approach is that its heat and power demands are almost entirely concentrated in the laser light source. As long as those can be handled, the device can be operated at full tilt without additional worries on that front.

A matter of time

It should be relatively easy to envision a bunch of photons ping-ponging around a grid of semi-reflective materials. The other paper on the topic relies on a very different and less appreciated behavior of light: when light passes through a material that has a lot of internal refraction, the speed at which it travels depends on its wavelength.

To take advantage of this, the researchers encode information as vectors into a few of the teeth of the frequency comb. That light is then sent down an optical cable that produces different delays at different wavelengths. The detector at the other side of the cable, however, only accepts input during a specific window of time; if the delay shifts some portions of the input vectors out of that time window, then they aren’t counted toward the output. In essence, this performs a mathematical operation called a convolution, linking the frequency at which the data was encoded to the weight it’s given by the detector based on the timing of its arrival.
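A simplified way to see how delays plus a gated detector add up to a convolution is the toy model below. It is a sketch under stated assumptions and not the paper's actual setup: each tooth is assumed to carry a copy of the same signal scaled by its own weight, tooth number k is assumed to arrive exactly k time slots late, and the detector simply sums whatever lands in each slot. The function names and all values are hypothetical.

```python
def delay_and_sum(signal, weights):
    """Tooth k carries weights[k] * signal and arrives k time slots late.
    The detector adds up the intensity that lands in each slot."""
    slots = [0.0] * (len(signal) + len(weights) - 1)
    for k, w in enumerate(weights):
        for t, x in enumerate(signal):
            slots[t + k] += w * x
    return slots


def gated(slots, n_weights):
    """Keep only the slots where every tooth overlaps (the detector's time window)."""
    return slots[n_weights - 1 : len(slots) - (n_weights - 1)]


# Hypothetical input signal and per-tooth weights (illustrative values).
signal = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, 1.0, 0.25]

full = delay_and_sum(signal, weights)
windowed = gated(full, len(weights))

print(full)      # [0.5, 2.0, 3.75, 5.5, 4.75, 1.0], the full discrete convolution
print(windowed)  # [3.75, 5.5], only the slots where all three teeth contribute
```

In this model the weights play the role of the convolution kernel, and the time window discards the edge slots where some teeth have not yet arrived or have already passed.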

And again, convolutions are critical to the deep neural networks used for image classification. Also again, many convolutions can be carried out in parallel, since optical cabling can carry a broad range of wavelengths. As a result, the setup used in the paper was able to hit 11.3 trillion operations per second, although its performance on the digit recognition test was somewhat worse, at 88 percent accuracy.

While the individual operations are fast, the setup absolutely needs the delay to operate properly, meaning that it will never be able to provide the sort of instantaneous response that some applications require. But its operations can be done using off-the-shelf optical telecom equipment. If all the bands used in modern telecom hardware were put to use for operations, over 400 operations could be performed in parallel. And that’s before things like the polarization of the light are used to carry information, which could push performance up to the quadrillion-operations-per-second range.

Nature, 2021. DOI: 10.1038/s41586-020-03070-1, 10.1038/s41586-020-03063-0.