How to run GPU-level edge AI on a Cortex-M MCU

By Harry Forster

Published

11 July 2024

When many people think AI, they think expensive GPUs and high power consumption. But at the edge – i.e. without a cloud connection – that’s not really the best way of doing things, so NVIDIA has made a few new tools to shrink edge AI models to run on general-purpose MCUs…

In another exciting piece from Hardware Pioneers Max 2024, ipXchange chats with Amir from Edge Impulse for a real-world demonstration of how this master of edge AI can shrink GPU-level workloads to run on general-purpose microcontrollers.

Amir first shows us two hardware setups running the same image-classification AI task. One is an Advantech production-ready camera unit built around an NVIDIA Jetson Orin GPU. The other is a small Cortex-M33/55/85-class MCU board running with or without an Ethos micro NPU (Neural Processing Unit) for hardware acceleration.

Advantech AI camera versus Renesas Cortex-M85 MCU

The latter presents a far more power-efficient, lower-cost solution for edge-AI applications, but how do these setups differ in their performance?

Before we get onto that, Amir confirms that while the AI image classification is done on the device, the training of these models is still done in the cloud with NVIDIA GPUs, using the Edge Impulse platform.

NVIDIA’s open-source TAO toolkit provides the magic behind shrinking these large AI models before deploying them to hardware like general-purpose microcontrollers. Edge Impulse has now enabled NVIDIA TAO within its core AI model training platform.
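The interview does not go into how the shrinking works, so the following is only a rough illustration of one widely used idea, weight quantization, and not a description of TAO's or Edge Impulse's actual pipeline. It assumes a symmetric int8 scheme with a single scale factor; the helper names and the random "weights" are made up for the example.

```python
import random
import struct


def quantize_int8(weights):
    """Map float weights to int8 values plus one float scale (symmetric scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]


random.seed(0)
# Hypothetical stand-in for one layer's weights; not taken from a real model.
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]
q, scale = quantize_int8(weights)

float_bytes = len(struct.pack(f"{len(weights)}f", *weights))  # 4 bytes per weight
int8_bytes = len(struct.pack(f"{len(q)}b", *q))               # 1 byte per weight
max_error = max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))

print(f"float32: {float_bytes} bytes, int8: {int8_bytes} bytes")
print(f"largest rounding error: {max_error:.4f} (scale {scale:.4f})")
```

Storing each weight in one byte instead of four cuts the memory footprint to a quarter, at the cost of a rounding error of at most half a quantization step per weight. Trade-offs of this kind are what make it plausible to fit a vision model into the memory of a microcontroller.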

Amir then takes us through an example real-world image classification model that has been shrunk using TAO. Here, a GPU-assisted camera setup has been trained to recognise pallets lifted by a forklift in an industrial setting. That setup is a $500, 20-W solution, and Edge Impulse has managed to run the same task on Renesas’ Cortex-M85 microcontroller, which costs far less and consumes very little power.

We can see that while the edge AI model does not run quite as smoothly on the MCU as on the GPU-based solution, it runs well enough to identify the pallets once they are closer to the camera. If you add more data to this model, you can improve its performance further while keeping it small and energy efficient.

NVIDIA Omniverse is a great platform for providing synthetic data to boost the performance of these edge AI workloads. By synthetic data, we mean data that does not have to be captured from the real world in the way that photos and videos are. Instead, animated virtual environments can be used to train the edge AI models in a more streamlined and self-contained way, saving development time and resources.

In short, Edge Impulse enables you to train AI models even when you don’t have access to real-world data and deploy them at the edge, thanks to NVIDIA Omniverse.

If you’ve got a commercial application and want to learn more about what Edge Impulse can offer, follow the link to the board page below, where you can apply for consultation and see what other edge-AI innovations this exciting company can offer.

Keep designing!

Love edge AI? Check out these interviews about some key chipsets that ipXchange is excited about:

How to slash power consumption for AI vision

50 TOPS at 5 W?! SiMa.ai’s MLSoC is an AI beast!

Hailo’s M.2 card does gen-AI at the edge at 3.5 W

Introducing Astra – Synaptics redefines edge AIoT

Always-on sensor-fusion AI with Syntiant’s NDP250

Apollo4 vs. Apollo5: Ambiq’s MCUs in (AI) action!