Edinburgh University researchers develop software to make AI 10 times faster
Researchers at the University of Edinburgh have developed software that could allow artificial intelligence (AI) systems to operate 10 times faster.
The researchers developed a software system called WaferLLM, designed specifically to improve the performance of wafer-scale chips. Wafer-scale chips are the world's largest computer chips, each roughly the size of a dinner plate.
"Wafer-scale computing has shown remarkable potential, but software has been the key barrier to putting it to work,” said Dr Luo Mai, lead researcher and reader at the University of Edinburgh's School of Informatics.
The software lets trained large language models (LLMs) draw conclusions from fresh data – a process called inference – far more efficiently.
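In broad strokes, inference means generating output one token at a time, with the model consulting everything produced so far at each step. The toy sketch below illustrates only that loop; the model stand-in and function names are invented for illustration and bear no relation to WaferLLM's actual code.

```python
def toy_model(tokens):
    # Hypothetical stand-in for a trained LLM's next-token prediction.
    # A real model would run billions of matrix operations per step.
    return (sum(tokens) + 1) % 50  # dummy deterministic prediction

def generate(prompt_tokens, n_new_tokens):
    # Autoregressive inference: each new token is predicted from
    # everything generated so far, then appended to the sequence.
    tokens = list(prompt_tokens)
    for _ in range(n_new_tokens):
        tokens.append(toy_model(tokens))
    return tokens

print(generate([3, 7, 11], n_new_tokens=5))
```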
“With WaferLLM, we show that the right software design can unlock that potential, delivering real gains in speed and energy efficiency for large language models,” said Dr Mai. “This is a step toward a new generation of AI infrastructure – one that can support real-time intelligence in science, healthcare, education, and everyday life.”
The researchers evaluated the software at EPCC, the UK's National Supercomputing Centre, based at the University of Edinburgh. EPCC operates Europe's largest cluster of advanced Wafer Scale Engine processors and is also the future home of the UK's next supercomputer, funded by the UK Government to the tune of £750m.
“Dr Mai’s work is truly ground-breaking and shows how the cost of inference can be massively reduced,” said Professor Mark Parsons, the director of EPCC.
Wafer-scale chips differ from typical AI chips not only in size but also in how they operate. They are designed to carry out many computations simultaneously on a single chip, aided by massive on-chip memory.
With all the computation taking place on the same piece of silicon, data can move between different parts of the chip much faster than if it had to travel between separate groups of chips and memory via a network.
Because of this, a wafer-scale chip can integrate hundreds of thousands of computation cores all working in parallel, making it more efficient at completing the mathematical operations that power neural networks – the backbone of LLMs like ChatGPT.
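The mathematical operations in question are dominated by matrix multiplications, which divide naturally into independent blocks that many cores can compute at once. The sketch below is a loose illustration of that principle only, using ordinary Python threads to stand in for groups of on-chip cores; it is not how WaferLLM itself is implemented.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

A = np.random.rand(1024, 1024)
B = np.random.rand(1024, 1024)

def row_block(i, block=256):
    # Each worker computes one horizontal slice of the result,
    # standing in for one group of parallel on-chip cores.
    return A[i:i + block] @ B

# Compute the four slices in parallel, then stitch them together.
with ThreadPoolExecutor() as pool:
    blocks = list(pool.map(row_block, range(0, 1024, 256)))

C = np.vstack(blocks)
assert np.allclose(C, A @ B)  # same result as the single-shot product
```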
The accelerated performance could have a major impact on industries that need LLMs to generate fresh insights in real time – in under a millisecond – such as chatbots, finance, healthcare, and scientific discovery, the researchers say.