Audio scene classification using machine learning on STM32 2022-10-18

Audio scene classification (ASC) can make objects smarter and allow them to be aware of user environments. This can add new levels of functionality and user experience in wearables, safety, environmental monitoring, healthcare, and many other applications. The challenge is to simplify software development and hardware design, especially for portable or wearable devices with processing, memory, and power constraints. Our solution addresses cost and design considerations by leveraging Artificial Intelligence on cost-effective, ultra-low-power STM32 microcontrollers.

The system captures ambient sound with two MP34DT05 digital MEMS omnidirectional microphones for accurate acoustic sensing, and the STM32L475VG microcontroller processes the audio efficiently. This ultra-low-power MCU features signal processing peripherals and a floating-point unit (FPU) for rapid AI software execution. The FP-AI-SENSING1 software configures the solution for ASC using neural network libraries generated by the X-CUBE-AI extension for STM32CubeMX.

The STM32WB55VGY provides ultra-low-power wireless connectivity compliant with the Bluetooth® Low Energy SIG specification 5.2. The algorithm outputs can be transmitted via Bluetooth to a smartphone running a suitable app, such as the ST BLE Sensor app (ver. 4.1.0 or higher) for Android and iOS devices. This app can display the resulting acoustic scene classifications and inferences, and can activate data logging on the ASC system for AI retraining purposes.

The result is a cost-effective and low-power solution for audio scene classification based on AI neural network technology. Through the smartphone app, users can see which environment (indoor, outdoor, or in vehicle) has been recognized from environmental audio data.

Key Product Benefits

The STM32L475VG microcontroller, with its Arm® Cortex®-M4 core, has the necessary peripherals to manage incoming digital audio signals, and the processing power and memory to ensure rapid and accurate audio scene recognition with minimal power consumption.

The MP34DT05 MEMS digital microphone features high omnidirectional sensitivity and a high acoustic overload point for distortion-free audio acquisition, even in noisy environments.

This highly compact and ultra-low-power wireless module is compliant with Bluetooth® Low Energy SIG specification v5.2 and is supplied with a royalty-free protocol stack.



All Features

  • Ready-to-use firmware featuring an artificial neural network (ANN) implementation for real-time audio scene classification
  • The edge processing approach ensures lower power consumption and latency than centralized cloud solutions, and provides greater privacy in audio (and image) based applications
  • Ultra-low power implementation based on the use of a real-time operating system (RTOS)
  • Compatible with ST BLE Sensor application for Android/iOS, to display recognized audio scenes and to manage data logging
  • Easy portability across different MCU families, thanks to STM32Cube
  • Compliant with the Bluetooth® Low Energy (BLE) SIG specification v5.2


Kit Description


This evaluation setup is based on the B-L475E-IOT01A Discovery kit with STM32L4 microcontroller running FP-AI-SENSING1 function pack software. The FP-AI-SENSING1 function pack for STM32Cube performs edge-based audio scene classification (ASC) based on outputs generated by neural networks (NN). The AI model is generated and optimized using the X-CUBE-AI extension for the STM32CubeMX tool. The ST BLE Sensor smartphone application (Android or iOS) completes the setup to manage the data collection and to display the recognized audio scene on a cell phone.

The ASC configuration captures audio using the on-board MP34DT05 digital MEMS microphone. The Artificial Neural Network (ANN) does not require external memory, as it occupies only 18 KB of RAM and 31 KB of Flash. Audio samples are accumulated in a buffer and passed to the ASC preprocessing phase. The preprocessing phase extracts audio features into a spectrogram by applying a Fast Fourier transform (FFT) and a filter bank, followed by log scaling. The result is fed into the ASC convolutional neural network, which classifies the scene as indoor, outdoor, or in-vehicle at a rate of one output label per second.
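The preprocessing chain described above (buffered samples, FFT, filter bank, log scaling) can be sketched in a few lines of Python. This is an illustrative sketch only, not the FP-AI-SENSING1 implementation: the sample rate, frame length, number of bands, the Hann window, and the mel-spaced triangular filter bank are all assumptions chosen for the example, and a naive DFT stands in for an optimized FFT.

```python
import cmath
import math

# All three constants are assumptions for this sketch, not firmware values.
SAMPLE_RATE = 16000   # Hz
FRAME_LEN = 64        # samples per frame, kept small so the naive DFT stays fast
N_MELS = 8            # number of filter bank bands


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Apply a Hann window and return |DFT|^2 for bins 0..N/2."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filter_bank(n_mels, n_fft, sample_rate):
    """Build triangular filters spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1))
                for i in range(n_mels + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        weights = []
        for b in range(n_bins):
            f = b * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def log_mel_frame(frame, bank, eps=1e-10):
    """Return one spectrogram column: log of the energy in each band."""
    power = power_spectrum(frame)
    return [math.log(sum(w * p for w, p in zip(weights, power)) + eps)
            for weights in bank]


# Usage: one frame of a 1 kHz test tone yields one column of N_MELS features.
bank = mel_filter_bank(N_MELS, FRAME_LEN, SAMPLE_RATE)
tone = [math.sin(2.0 * math.pi * 1000.0 * i / SAMPLE_RATE)
        for i in range(FRAME_LEN)]
features = log_mel_frame(tone, bank)
```

Stacking such columns over successive frames gives the spectrogram that would be fed to the network. On the microcontroller, the same steps would typically rely on an optimized FFT routine and precomputed filter weights rather than the direct formulas used here.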

Please choose the correct board for your development needs.


Board              Frequency Band
B-L475E-IOT01A     Sub-GHz (915 MHz) RF module
B-L475E-IOT01A2    Sub-GHz (868 MHz) RF module


All Evaluation Features

    • Ready-to-use audio scene classification based on neural networks
    • Able to recognize the following environments:
        • Indoor
        • Outdoor
        • In vehicle
    • The package comes with a utility for data logging and annotations on an SD card
    • Compliant with the Bluetooth® Low Energy (BLE) specification
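
As a rough illustration of how a network output can be turned into one of the three recognized environments, the sketch below applies a softmax to three raw scores and picks the most probable class. The label order, the example scores, and the function names are hypothetical; the actual output layout of the model is not specified here.

```python
import math

# Hypothetical label order; the real model's class indices may differ.
LABELS = ("Indoor", "Outdoor", "In vehicle")


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Usage with made-up scores for one inference.
label, confidence = classify([0.2, 2.5, -1.0])
```

A decision like this would run once per inference, with the selected label then reported to the smartphone app.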



Microcontrollers & Microprocessors

Part number    Description
STM32L475VG    Ultra-low-power with FPU Arm Cortex-M4 MCU 80 MHz with 1 Mbyte of Flash memory, USB OTG, DFSDM
