Is AI infrastructure truly scalable for real-time conversational functionality? Well, maybe not with GPUs…
In a long-awaited ipXperience interview, Eamon chats with Mark Heaps from Groq, a company that is certainly shaking up the AI world with its new LPU architecture. So what is an LPU exactly?
As Mark explains, in a world of GPUs (Graphics Processing Units) running large-scale AI workloads in big datacentre-style installations, Groq’s LPUs (Language Processing Units) specialise in processing data in sequence, i.e. every operation is carried out in a defined order, as opposed to in parallel like GPUs.
Sequential operation matters for language-processing AI applications like LLMs (Large Language Models) because order is essential for context – the 100th word cannot be generated until the 99th exists if an AI-generated story or conversation is to make sense.
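To make that dependency concrete, here is a minimal, purely illustrative Python sketch of this kind of token-by-token generation. The generate_next_token stub is a hypothetical placeholder for a real model’s forward pass, not Groq’s software; the point is simply that each token can only be produced once the one before it exists.

```python
# Toy illustration of why LLM text generation is inherently sequential.
# generate_next_token() is a hypothetical stand-in for a real model's
# forward pass; each new token depends on the tokens generated before it.

def generate_next_token(context: list[str]) -> str:
    # A real LLM would run a forward pass conditioned on the full context.
    toy_model = {"Once": "upon", "upon": "a", "a": "time", "time": "<end>"}
    return toy_model.get(context[-1], "<end>")

def generate_text(prompt: str, max_tokens: int = 10) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        next_token = generate_next_token(tokens)  # word N needs word N-1
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(generate_text("Once"))  # -> "Once upon a time"
```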
Of course, such LLMs do not require an LPU per se in order to run, but as more chips are deployed to work together on larger and larger AI workloads, the LPU has a distinct advantage over GPUs in terms of scalability.
As more and more GPUs are deployed to work on the same workload, the overall speed of response for large language models tapers off. In contrast, the LPU’s performance continues to scale linearly as more chips are added to the workload.
At a certain number of devices – depending on the workload – an LPU-supplemented datacentre overtakes the performance of a solely GPU-based datacentre in terms of response time – i.e. latency – for real-time LLM-based conversations.
As you can see, Groq’s devices operate best in the very largest-scale generative AI applications.
Mark highlights that an online customer assistant on a website, for example, could use a Groq-LPU-based system to hold real-time, human-like conversations with customers without the telltale lag that gives away that you are talking to a bot.
While the LPU’s sequential processing might suggest that it does not lend itself to running multiple conversations at the same time, Groq proves otherwise: sign into its online platform and you can converse with a Groq datacentre alongside thousands of other users.
This is just one way to test the technology; for many engineers looking to add low-latency LLM functionality to their designs, a connection to GroqCloud is all their device will need.
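For a sense of how little is needed on the device side, here is a rough Python sketch of a chat request to GroqCloud’s OpenAI-compatible API. The endpoint URL and model name below are assumptions for illustration only, so check Groq’s documentation for the current values before building anything around them.

```python
# Rough sketch of a chat request to GroqCloud's OpenAI-compatible REST API.
# The endpoint URL and model name are assumed for illustration; consult
# Groq's documentation for the current values.
import os
import requests

API_KEY = os.environ["GROQ_API_KEY"]  # issued when you sign up for GroqCloud
URL = "https://api.groq.com/openai/v1/chat/completions"  # assumed endpoint

payload = {
    "model": "llama3-8b-8192",  # example model name, subject to change
    "messages": [
        {"role": "system", "content": "You are a helpful customer assistant."},
        {"role": "user", "content": "Where can I track my order?"},
    ],
}

response = requests.post(
    URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because the inference itself runs in Groq’s datacentre, the device only needs enough connectivity and compute to make an HTTPS request and parse the reply.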
Alternatively, you can build your own datacentre with Groq server cards based on its LPU technology, which may be a requirement for more data-sensitive applications such as banking.
Learn more about Groq’s LPU by following the link to the board page below, where you can fill out the application form to get started working with Groq for a next-level AI project.
Keep designing!
Love the recent introduction of AI into anything and everything? Check out ipXchange’s other AI-related content, including our thought piece that attempts to explain what really defines AI now that it’s a buzzword:
Hailo’s M.2 card does gen-AI at the edge at 3.5 W
Introducing Astra – Synaptics redefines edge AIoT
Ambiq Apollo510 MCU takes on next-level endpoint AI
How Ambient Scientific discovered economical AI