Artificial intelligence (AI) and machine learning (ML) depend on performance. The software stack is just as important as the hardware, because good software can tease the last bit of performance out of physically and technically limited processors.
In recent years, the demand for specialized processors for AI and ML has increased significantly. This is understandable given the potential of these application areas and the high computational requirements of their workloads.
However, the rapid pace of development raises the question of where this path leads. What will the best AI chips of the future look like? The most significant factor in the effectiveness and success of an AI chip is its software stack.
Good software not only simplifies the use of AI processors for developers but can also exploit the full potential of the underlying hardware. Machine intelligence demands more from software to be efficient, because in artificial intelligence and machine learning the calculations are fundamentally different.
What makes machine intelligence calculations different?
AI algorithms have numerous characteristics that set them apart from other workloads. I want to highlight a few in particular:
Modern artificial intelligence and machine learning are all about dealing with uncertain information. The variables in a model represent something uncertain – a probability distribution. This has a significant impact on the type of work the chip has to do. To cover the wide range of values that can arise, one needs both fine-grained precision and a wide dynamic range. From the software's point of view, this requires various floating-point number formats and algorithms that process them probabilistically.
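As an illustration of why dynamic range matters, the sketch below (using NumPy) shows a small probability underflowing to zero in a narrow 16-bit format, and how working in log space keeps it representable:

```python
import numpy as np

# Small probabilities are common in probabilistic models; a narrow
# floating-point format cannot represent them directly.
p = 1e-8
print(np.float16(p))   # underflows to 0.0: below float16's dynamic range
print(np.float32(p))   # representable in float32

# Working in log space keeps even tiny probabilities inside the
# dynamic range of a low-precision format.
log_p = np.float16(np.log(p))
print(np.exp(np.float32(log_p)))  # recovers a value close to 1e-8
```

This trade-off between precision, range, and storage cost is exactly what an AI software stack has to manage on the chip's behalf.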
The data we deal with is probabilistic and comes from a very high-dimensional space, with content such as images, sentences, videos, or even just abstract knowledge concepts. These are not the dense, regular data vectors seen in graphics processing. The larger the number of dimensions in the data, the more irregular and sparse the data access becomes. It follows that many of the techniques commonly used in hardware and software, such as buffering, caching, and vectorization, are of little use here.
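A minimal sketch of this irregularity: if high-dimensional vectors are stored sparsely as index-to-value maps, a dot product becomes a series of scattered lookups rather than a streaming scan, so caches and vector units gain little (the representation here is illustrative, not any particular library's):

```python
# Sparse, high-dimensional vectors stored as index -> value maps.
def sparse_dot(a: dict, b: dict) -> float:
    # Iterate over the smaller vector and probe the other for
    # matching indices: irregular, data-dependent access.
    if len(b) < len(a):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)

# Two vectors in a million-dimensional space, each with only a
# handful of non-zero entries.
x = {3: 1.0, 100_000: 2.0, 999_999: 0.5}
y = {100_000: 4.0, 42: 7.0}
print(sparse_dot(x, y))  # 8.0: only index 100_000 overlaps
```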
Also, machine intelligence computing involves big data (large amounts of data for training) and big compute (a large number of operations per processed data element), which illustrates the scale of the processing effort.
What does this mean for the Software?
Because computing with machine intelligence is so different, the software in AI and ML has to work harder than in many other areas. The AI software stack needs to combine developer productivity, ease of use, and flexibility with the required efficiency on a large scale.
To solve the problem, the AI software needs to communicate with the hardware at a lower level. This avoids deferring decisions to runtime and improves efficiency. The probabilistic and high-dimensional data structures in AI algorithms make it difficult to predict what will happen at runtime. Therefore, the software must provide more information about the algorithm's design and the structure of the machine learning model being executed.
With machine intelligence, the software must control the number representation and the precise memory movement specific to certain AI algorithms in order to optimize efficiency. The hardware must also be receptive to these optimizations.
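A common instance of software-controlled number representation is mixed precision: store tensors in a compact format to halve memory traffic, but choose a wider format for the arithmetic. The NumPy sketch below only illustrates the dtype control; real stacks apply the same idea to on-chip accumulators:

```python
import numpy as np

rng = np.random.default_rng(0)
# Compact float16 storage halves the memory footprint and traffic.
w = rng.standard_normal(4096).astype(np.float16)
x = rng.standard_normal(4096).astype(np.float16)

# The software decides the compute precision explicitly:
narrow = np.dot(w, x)                                    # float16 result
wide = np.dot(w.astype(np.float32), x.astype(np.float32))  # float32 result

print(narrow.dtype, wide.dtype)
```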
The benefits of software/hardware co-design
In the future, more hardware/software co-design will be required, in which software algorithms and AI hardware are developed at the same time. This enables a greater degree of collaboration between hardware and software and helps developers organize memory spaces or thread scheduling effectively.
At Graphcore, we have been developing our Poplar software stack and the IPU processor together since the company's beginning. To maximize processor efficiency, we have equipped Poplar with more advanced software control than other systems offer.
An example of this is how we manage memory. Our M2000 IPU machine has off-chip DDR memory. However, there is no cache and no hardware unit that automatically controls the movement or buffering of data between the external streaming memory and the on-chip processor memory at runtime.
This is all controlled in software, based on the compute graph. Memory management is just one part of the software stack where we optimize for the hardware based on advanced analysis. This is the key to our approach.
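The idea can be sketched as follows. This is a hypothetical toy planner, not Poplar's actual API: because the compute graph is known ahead of time, the software can decide at compile time when each tensor must be brought from streaming memory into processor memory, instead of relying on a runtime cache:

```python
# A static compute graph: each step lists the tensors it needs.
graph = [
    ("matmul_1", ["w1"]),
    ("matmul_2", ["w2"]),
    ("matmul_3", ["w1"]),  # w1 is reused: no second transfer needed
]

def plan_transfers(graph, resident=()):
    """Decide, ahead of execution, which tensors to fetch before each step."""
    resident = set(resident)
    plan = []
    for step, tensors in graph:
        for t in tensors:
            if t not in resident:
                plan.append(f"fetch {t} before {step}")
                resident.add(t)  # keep it in processor memory
    return plan

for action in plan_transfers(graph):
    print(action)
```

Because the whole schedule is derived from the graph, a reused tensor like `w1` is fetched once and kept resident, a decision a hardware cache would have to rediscover at runtime.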
When software and hardware work together seamlessly from the start, it is much easier to improve performance and efficiency. With increased software control, we can provide information about how the hardware can process different machine intelligence models. Perhaps we can even learn to develop new AI models that are inherently more powerful and use advanced techniques like sparsity.
In the future, the best AI chips will be the ones with the best software: we believe Graphcore will provide these chips. We’re both a software and a hardware company, and it is clear to us that this is the way forward.