Do you have (or want) lots of connected devices in the field and need to easily keep track of their performance? What about shipping OTA updates with the confidence that they are working well? You need to check out Memfault and its observability platform!
In the first of a series of interviews and webinars with Memfault, ipXchange chats with firmware engineer Gillian Minnehan to learn about Memfault’s embedded device observability platform and what it offers to engineers building connected products with regular Over-The-Air (OTA) updates.
Best of all, Gillian takes us through a tour of this platform and shows us just how easy it is to monitor performance, stability, connectivity, battery life, the source of errors and reboots, and so much more for each device and software version.
You can try this platform for yourself using Memfault’s Sandbox demo, which includes options for a debug demo, an OTA update demo, or free rein to explore the platform.
What is Memfault?
As Gillian explains, Memfault helps embedded teams to find and observe the faults and performance issues faced by devices out in the field. Once the Memfault SDK is integrated into a product's firmware, the device reports its performance back to the Memfault cloud platform in real time, giving fine-grained insight into how it is running.
By understanding a device’s performance in detail, firmware engineers can then adjust problematic code and fix issues via OTA firmware updates. Once the update is shipped, new issues can be identified and fixed with further updates.
If a device is unable to connect to the cloud for any reason, the Memfault SDK stores the performance report in on-device non-volatile memory. This report is then uploaded once a connection is reestablished, and it contains the details of any crashes and what code was running at the time. This ensures a continuous performance record even when your devices aren’t behaving as they should.
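The store-and-forward behaviour described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the Memfault SDK: `ReportBuffer`, its bounded queue, and the `send` callback are hypothetical names, and the in-memory queue merely stands in for on-device non-volatile storage.

```python
from collections import deque


class ReportBuffer:
    """Hypothetical store-and-forward queue (stand-in for non-volatile storage)."""

    def __init__(self, capacity):
        # Assumption: storage is bounded, so the oldest report is dropped when full.
        self.pending = deque(maxlen=capacity)

    def record(self, report, send):
        """Queue the report, then try to upload everything that is waiting."""
        self.pending.append(report)
        return self.flush(send)

    def flush(self, send):
        """Upload queued reports oldest-first; stop at the first failed send."""
        sent = 0
        while self.pending:
            if not send(self.pending[0]):
                break  # still offline: keep the report for the next attempt
            self.pending.popleft()
            sent += 1
        return sent
```

A report is only removed from the queue after `send` returns true, so a dropped connection in the middle of an upload does not lose data.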
Where can you use Memfault?
Memfault’s solution can be used to monitor a large number of devices, whether these are nodes in an IoT installation, consumer goods like smartwatches and Bluetooth earbuds, or medical devices. Memfault offers solutions for MCU, Linux, and Android systems; for the purpose of this conversation, we will be focussing on the MCU version of the platform. It supports most connectivity schemes, so long as the data can ultimately reach the cloud. These include Wi-Fi, BLE, LTE, and even things like Iridium.
The only technical requirement is a transport whose maximum transmission unit (MTU) is at least 9 bytes. Gillian highlights the ESP32 ecosystem as a popular choice for those wanting to test Memfault’s platform with hardware.
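To see why such a small transport size can still carry a full report, here is a minimal sketch of splitting a payload into small frames and reassembling it. The framing (a 1-byte sequence number followed by data) and the function names are assumptions made for this example; this is not Memfault's wire format.

```python
def split_into_chunks(payload: bytes, mtu: int) -> list[bytes]:
    """Split payload into frames of at most mtu bytes.

    Hypothetical framing: 1-byte sequence number, then data bytes.
    """
    header_size = 1
    if mtu <= header_size:
        raise ValueError("mtu must leave room for at least one data byte")
    data_size = mtu - header_size
    count = (len(payload) + data_size - 1) // data_size  # ceiling division
    if count > 256:
        raise ValueError("payload needs more than 256 frames")
    return [
        bytes([seq]) + payload[seq * data_size:(seq + 1) * data_size]
        for seq in range(count)
    ]


def reassemble(chunks: list[bytes]) -> bytes:
    """Rebuild the payload by ordering frames by their sequence number."""
    ordered = sorted(chunks, key=lambda frame: frame[0])
    return b"".join(frame[1:] for frame in ordered)
```

With a 9-byte limit, each frame in this sketch carries 8 bytes of data, so a 20-byte payload travels as three frames.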
Why remote observability is key
In resource-constrained embedded applications – small battery, small memory, low bandwidth – it is very important to ensure optimum performance for the best user experience. When it comes to medical devices, for example, you also need to ensure that reliability is never compromised.
But once these devices are out in the world, it’s not as simple as bringing them back to the lab for debugging. And you don’t want your device to be left unused or get bad reviews due to poor performance.
Memfault enables engineers to do that debugging from afar. As we see in the demo, this debugging environment has a familiar layout that reflects what an engineer would expect if the device were on their desk. Gillian also highlights two key features that provide engineers with regular, detailed insights: crash reports, and the ‘heartbeat’ of the device, which shows the metrics of the core and subsystems at regular intervals.
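As a rough illustration of the heartbeat idea, the sketch below accumulates metrics and emits one snapshot per interval. The class, its method names, the metric names, and the one-hour interval used in the usage note are all hypothetical and chosen for this example; they are not the Memfault SDK's API.

```python
class HeartbeatCollector:
    """Hypothetical collector that emits one metrics snapshot per interval."""

    def __init__(self, interval_s, device_id, software_version):
        self.interval_s = interval_s
        self.device_id = device_id
        self.software_version = software_version
        self.window_start = 0
        self.counters = {}  # event counts, reset after every heartbeat
        self.gauges = {}    # last-known values, kept between heartbeats

    def add(self, name, amount=1):
        """Count an event, such as a BLE disconnect."""
        self.counters[name] = self.counters.get(name, 0) + amount

    def set(self, name, value):
        """Record the latest value of a reading, such as battery level."""
        self.gauges[name] = value

    def tick(self, now_s):
        """Return a heartbeat dict once the interval has elapsed, else None."""
        if now_s - self.window_start < self.interval_s:
            return None
        heartbeat = {
            "device_id": self.device_id,
            "software_version": self.software_version,
            "metrics": {**self.counters, **self.gauges},
        }
        self.window_start = now_s
        self.counters = {}
        return heartbeat
```

For example, with `interval_s=3600`, two calls to `add("ble_disconnects")` and one `set("battery_pct", 87)` produce a single heartbeat at the one-hour mark whose metrics are `{"ble_disconnects": 2, "battery_pct": 87}`.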
Why choose Memfault for observability
While some may be tempted to develop a similar platform to gain remote insights about their devices, Memfault has already done the hard work for you. Memfault brings an ultra-robust and streamlined solution that would usually take a large team to develop and manage. It also removes the need for additional customer reports and having the device on your desk to gain clear insights.
If you already have devices in the field, you can rebuild your firmware with the Memfault SDK integrated and deploy this in your next OTA update. Memfault works with a wide variety of existing cloud platforms to ensure you get the insights you need and an easy way to access them.
Demoing the platform
At around 17 minutes into the conversation, Gillian starts a tour of Memfault’s platform for MCU-based builds. With so many insights that could be discussed, this tour focusses on what a firmware engineer might want to look at when remotely debugging a device. The data shown in this demo is most useful within the MCU version of Memfault’s platform, but the overall feel and workflow are similar for the Linux and Android versions.
The device in question for Gillian’s demo is a Bluetooth-connected fitness tracker that is suffering from connectivity, stability, and battery life issues after an update. As shown by the graphs, this update has been rolled out to around 20% of a 700-device fleet. A staggered rollout enables engineers to observe the performance of a new software version on a subset of devices before mass rollout. In this case, the new software version proved unstable, so the rollout was halted so that a new, more stable build could be shipped to the remaining fleet.
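One common way to pick a staggered-rollout cohort is to hash each device ID into a bucket from 0 to 99 and compare it with the target percentage. The sketch below shows that general technique only; it is not a description of how Memfault selects devices, and the function name, device IDs, and release string are made up for the example.

```python
import hashlib


def in_rollout(device_id: str, release: str, percent: int) -> bool:
    """Deterministically decide whether a device belongs to a rollout cohort."""
    # Including the release name gives each release a different cohort.
    digest = hashlib.sha256(f"{release}:{device_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
    return bucket < percent


# Hypothetical 700-device fleet, with roughly 20% targeted by the new release.
fleet = [f"tracker-{i:04d}" for i in range(700)]
cohort = [d for d in fleet if in_rollout(d, "2.0.0", 20)]
```

Because the bucket depends only on the device ID and release name, raising the percentage later adds devices to the cohort without removing any that already received the update.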
Gillian shows us the main insights that can be gained from Memfault’s observability platform, both at the fleet scale and at the scale of the individual device. These insights include:
- The software version
- The number of errors throughout the fleet
- The nature of these errors
- The number of reboots and reasons for reboots
- Stability of the BLE connection to a host device
- Stability of the device itself – i.e. percentage of operating time without errors
- Battery life of the device, with percentile spread across the fleet
- Saved views for sending to members of your team
- Individual-device debugger view, with individual error details/context*, crash information, and the state of the threads running during a crash
*context in this case means what the user was doing during the crash, such as going for a run.
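Two of the fleet-level figures listed above, stability as a percentage of error-free operating time and the percentile spread of battery life, can be computed along these lines. This is a sketch under assumptions: the function names, the per-interval input format, and the nearest-rank percentile method are choices made for this example, and the battery figures are invented sample values rather than data from the demo.

```python
def stable_time_percent(intervals):
    """Percentage of operating time without errors.

    intervals: list of (duration_hours, had_error) tuples, one per interval.
    """
    total = sum(duration for duration, _ in intervals)
    if total == 0:
        return 0.0
    clean = sum(duration for duration, had_error in intervals if not had_error)
    return 100.0 * clean / total


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


# Invented sample: battery life in hours for ten devices.
battery_hours = [30, 42, 48, 51, 55, 60, 61, 66, 70, 90]
median_life = percentile(battery_hours, 50)
p90_life = percentile(battery_hours, 90)
```

Looking at the low percentiles alongside the median is what reveals the devices with unusually poor battery life, which an average alone would hide.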
As you can see, engineers can also comment within Memfault’s platform when developing an update. Once an update is shipped, you can go through the process again, find the faults, and continue to improve on the performance and features of your device.
Evaluating Memfault
As stated at the beginning of this piece, the easiest way to get started with this technology is to check out Memfault’s Sandbox. This demo environment is just like the one shown to us by Gillian and allows you to really get stuck in before building the SDK into your products.
To reiterate, while Memfault is best integrated into your firmware at the beginning of the design process, it is entirely possible to rebuild your existing firmware with the Memfault SDK and deploy it as an OTA update.
We hope you’ve enjoyed this ipXperience / ipX Tutorial hybrid, and we look forward to bringing you more Memfault content soon.
Keep designing!