FPGA-Based Neural Networks for Bosnian Speech-to-Text

Last week, Eamon sat down with Isam Vrce for ipXchange’s first (published) ProjeX interview. Originally, we had planned to partially use this content to promote how ipXchange helped Chili.CHIPS*ba to discover the FPGA solutions that were used in this project, but not being part of that process, Isam could not comment on such matters.

Instead, ipXchange dived deep into how Isam developed FPGA-based neural networks for the purpose of Bosnian speech-to-text, the application aiming to help people who may require the assistance of voice commands in their daily life.

Below, you can find a detailed article on this project in Isam’s own words. We thoroughly hope you enjoy this interview, and we can’t wait to share more.

Keep designing!

An introduction to yourself

I’m Isam Vrce, freshly graduated with a Master’s in Electrical Engineering from the University of Sarajevo. I’m all about enthusiasm, blending hobbies with my academic journey to spice up the learning experience. I’m driven by curiosity, constantly on the lookout to learn and forge new friendships.

My academic path took an exciting turn when I met the founder of Chili.CHIPS*ba through my professor/mentor. This led to the formation of the ETFPGA club at our faculty, a collaborative group of students with a love for electronics. We engage in activities such as crafting our own RISC-V processor, tinkering with image and sound processing, or designing our own PCBs.

Among our projects, TetrisSaraj stands out: a fun, FPGA-based twist on the classic Tetris game, showcasing our skills in a playful way. And there’s TISG (Trustable Image Sensor Gateway), our venture into open-source innovation, aiming to revolutionize FPGA-based camera systems with broad sensor compatibility via the MIPI-CSI2 standard.

Working with the ETFPGA club has not only been about technical skills; it’s been a journey of practical learning, teamwork, and applying our classroom knowledge to real-world scenarios.

Why did the project happen?

Picture this: You’re in an elevator, hands full of bags, struggling to press the correct button. Or imagine the everyday hurdles faced by individuals with disabilities simply trying to reach any button. That’s where my idea clicked – Why not simplify our life with voice commands?!

Not a biggie, one would say – that’s already done. Except that we are talking about the Bosnian language, spoken by fewer than 5 million people and definitely not on par with English, Chinese or German, where multiple ready-made solutions are already in place.

This idea extends to a vision that reaches far beyond elevators, looking to change how we interact with our homes and make technology more accessible, all in my native voice. I have therefore set out to create a machine for interpreting simple Bosnian spoken words, without relying on the Internet. Moreover, for accessibility, it had to be in an IoT format, as a handheld and battery-powered device. It was also envisioned as a stepping stone for my colleagues and prospective innovators, to further enhance this IP block, as well as build other products with it.

The Components/Technology Used

Given that the primary design goal was accessibility in terms of both material cost and power budget, I selected a modest FPGA that, despite its mere 9K LUTs and relatively small BRAM capacity, didn’t compromise on performance. The HW kit also included an external audio CODEC and LCD screen, which served to visualize and print the machine-recognized spoken words.

Written in SystemVerilog RTL, the project did something special outside of its main focus on DSP. To answer the challenge of how to rapidly adapt the audio CODEC functionality, I came up with a custom “programming language” for translation of human-readable instructions into machine code, stored on the FPGA. This approach allowed me to adjust parameters of the audio card with ease.
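To give a feel for what such a translation step can look like, here is a minimal sketch in Python. It is not the project's actual language: the mnemonics (`WRITE`, `WAIT`, `END`), the opcode values and the 16-bit word layout are all assumptions invented for illustration.

```python
# Hypothetical mini-assembler for codec configuration.
# Every mnemonic, opcode and field width here is an assumption
# made for illustration, not the format used in the project.

OPCODES = {"WRITE": 0x1, "WAIT": 0x2, "END": 0xF}


def assemble(source: str) -> list[int]:
    """Translate human-readable lines into 16-bit machine words.

    Assumed word layout:
      WRITE reg value -> [opcode:4][reg:4][value:8]
      WAIT cycles     -> [opcode:4][cycles:12]
      END             -> [opcode:4][zeros:12]
    """
    words = []
    for line_no, raw in enumerate(source.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        mnemonic, *args = line.split()
        mnemonic = mnemonic.upper()
        if mnemonic not in OPCODES:
            raise ValueError(f"line {line_no}: unknown instruction {mnemonic!r}")
        opcode = OPCODES[mnemonic]
        if mnemonic == "WRITE":
            reg, value = (int(a, 0) for a in args)  # accepts 0x.. or decimal
            if not (0 <= reg < 16 and 0 <= value < 256):
                raise ValueError(f"line {line_no}: operand out of range")
            word = (opcode << 12) | (reg << 8) | value
        elif mnemonic == "WAIT":
            (cycles,) = (int(a, 0) for a in args)
            if not 0 <= cycles < 4096:
                raise ValueError(f"line {line_no}: operand out of range")
            word = (opcode << 12) | cycles
        else:  # END takes no operands
            word = opcode << 12
        words.append(word)
    return words


program = """
WRITE 0x2 0x1F   # e.g. set a volume register (register map is made up)
WAIT 100         # let the codec settle
END
"""
print([hex(w) for w in assemble(program)])  # ['0x121f', '0x2064', '0xf000']
```

The resulting words could then be loaded into an on-chip memory and stepped through by a small sequencer, so changing a codec setting means editing a text file rather than rewriting RTL.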

The Problems you Faced

As expected, a major challenge was the minor role of the Bosnian language in world affairs, leading to a scarcity of datasets for keyword recognition. This demanded a proactive approach to gather the necessary data, ensuring the system could classify words accurately.

Furthermore, feature extraction on the FPGA, specifically the calculation of Mel Frequency Cepstral Coefficients (MFCCs), posed its own set of challenges. Achieving the right balance between precision and resource efficiency (recall, I had only 9K LUTs to work with) demanded careful consideration.
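For readers unfamiliar with MFCCs, the sketch below shows the textbook steps for one audio frame in floating-point Python with NumPy: window, power spectrum, mel filterbank, logarithm, and DCT. The parameter values (16 kHz sample rate, 512-point FFT, 26 filters, 13 coefficients) are common defaults assumed here, not figures from the project, and the FPGA version would need a fixed-point equivalent of each step.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x


def mfcc_frame(frame, sample_rate=16000, n_fft=512, n_filters=26, n_coeffs=13):
    """MFCCs for a single frame; all default parameters are assumptions."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2 / n_fft
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energies = np.log(energies + 1e-10)  # small offset avoids log(0)
    return dct_ii(log_energies, n_coeffs)


# Usage: one 25 ms frame of a 440 Hz tone sampled at 16 kHz.
t = np.arange(400) / 16000.0
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t))
print(coeffs.shape)  # (13,)
```

The FFT, the logarithm and the DCT are the steps where a small FPGA has to trade precision against LUT and BRAM usage, which is the balance described above.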

Real-time processing constraints were a third challenge: audio signals had to be processed without observable latency, while maintaining system responsiveness and throughput, all in a resource- and power-constrained setting.

The Solutions to Them

The approach to overcome these hurdles was marked by iterative refinement and collaboration. To tackle the task of data collection, I crafted a MATLAB script that I shared with my colleagues. They conducted “interviews” with their family, effectively broadening our dataset without the need for me to record hundreds of individuals directly.

To ensure the accuracy of the results, I used a home-grown process, starting from algorithm development in Python and MATLAB using floating-point arithmetic, then reducing it to the fixed-point arithmetic that can be directly replicated in FPGA gates. This method enabled me to compare the output from the fixed-point version with the results obtained using floating-point math.
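The idea of checking a fixed-point model against a floating-point reference can be sketched as follows. This is an illustration only: the Q1.15 number format, the dot-product workload and the helper names are assumptions, not details taken from the project.

```python
import numpy as np

FRAC_BITS = 15  # assumed Q1.15 format: 1 sign bit, 15 fractional bits


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize floats in [-1, 1) to signed integers, saturating at the limits."""
    scaled = np.round(np.asarray(x) * (1 << frac_bits))
    lo, hi = -(1 << frac_bits), (1 << frac_bits) - 1
    return np.clip(scaled, lo, hi).astype(np.int64)


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float for comparison."""
    return q / (1 << frac_bits)


def fixed_dot(a_q, b_q, frac_bits=FRAC_BITS):
    """Integer multiply-accumulate, as a hardware MAC unit would perform it."""
    acc = int(np.sum(a_q * b_q))  # wide accumulator with 2*frac_bits fraction
    return acc >> frac_bits       # arithmetic shift back to frac_bits


# Floating-point reference versus fixed-point model on the same inputs.
rng = np.random.default_rng(0)
weights = rng.uniform(-0.5, 0.5, 16)
samples = rng.uniform(-0.5, 0.5, 16)

reference = float(np.dot(weights, samples))
fixed = to_float(fixed_dot(to_fixed(weights), to_fixed(samples)))
error = abs(reference - fixed)
print(f"float={reference:.6f} fixed={fixed:.6f} abs error={error:.2e}")
```

Because the fixed-point model uses only integer multiplies, adds and shifts, the same inputs and expected outputs can later serve as test vectors for the RTL.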

Once satisfied with the results, I translated the validated algorithm into SystemVerilog, adapting it for implementation on the FPGA. This ensured that, when deployed on the FPGA, the algorithm would produce accurate results.

I previously overlooked mentioning a challenge related to PCB design. The project consists of three separate hardware components which, as you can imagine, involve a significant amount of wiring. It would be beneficial to design a custom-made PCB that could neatly integrate all these components into one cohesive unit. Such a PCB would greatly simplify development on this platform. Naturally, creating this custom PCB is on our agenda to tackle next.

Advice to Design Engineers Working on a Similar Project

For design engineers venturing into uncharted territories, as I did when I jumped from Embedded & C to FPGA & SV, I’d like to emphasize the importance of tackling challenges with patience and persistence. It’s crucial to collaborate with your colleagues. Don’t shy away from trying new solutions and methodologies. I advise an iterative, step-by-step, methodical approach to problem-solving, where each step is a learning opportunity.

And remember, a river is a mere spring at the start of its journey. And it is not guaranteed that it will become an ocean at the end 🙂