Bela-IREE: An Approach to Embedded Machine Learning for Real-Time Music Interaction

Published on Aug 29, 2023

Abstract

Real-time artificial intelligence and machine learning processes are increasingly prevalent in creative technology practices, including in digital musical instrument (DMI) design and music interaction more broadly. However, achieving suitable performance in an embedded systems context, which is ideal for portable and self-contained instruments and interfaces, remains a challenge, with many different frameworks and approaches being explored simultaneously. In this work we explored the potential of combining the Intermediate Representation Execution Environment (IREE), part of the OpenXLA ecosystem, with the Bela embedded maker platform. We present a workflow and numerous tools for investigating the combination of Bela and IREE, including a virtualised container environment, an IREE runtime for Bela and an embedded model zoo. We report on the challenges of profiling embedded machine learning models, present initial benchmarking results, and conclude with future work towards a usable pipeline.

Introduction

This project took place over the summer of 2022 as part of Google Summer of Code1 with the BeagleBoard Foundation2, Bela3 and the Intelligent Instruments Lab4. The project's objective was to improve the tooling available for those looking to use machine learning models in their Bela projects. Bela [1] is a maker platform built on top of the BeagleBone Black (BBB) with a focus on real-time audio and sensor processing for use in interactive art projects, including digital musical instrument (DMI) design. The availability of machine learning tools for Bela would allow for new design practices incorporating machine learning models [2]. The original goal of this project arose from the limitations of Bela's (in machine learning terms) low-powered processor and the real-time constraints of interactive projects. Development on this platform calls for performance analysis tools that allow quick evaluation of different models on Bela; this project has built tools for this purpose, including benchmarking and profiling utilities.

This project aims to provide multiple benefits. Firstly, it aims to improve the development ecosystem surrounding Bela by providing a new tool to measure the performance of different models. This will help those researching ML for use in embedded DMI design to speed up their iteration cycle by providing measurements directly from the target hardware. In tandem, the project aims to give researchers and developers the ability to dig deeper into the details of their implementation and examine potential bottlenecks on a CPU-cycle by CPU-cycle basis. This would greatly improve the understanding of what types of model architectures are possible on this platform, help maximise use of the available computational resources on the BBB, and motivate future optimisation work. Finally, this project aims to improve access to embedded ML on the BBB/Bela and potential future platforms like the BBAI645. This will benefit instrument designers, artists and makers by providing them with example projects and documented tools, enabling new explorations of the applications of embedded machine learning.

While embedded systems developers are used to working with low-level systems languages like C++, machine learning developers are used to high-level scripting languages like Python and Julia, with exemplary domain-specific packages and documentation. Flexible modelling languages like PyTorch6 and JAX7, which let researchers iterate quickly at multiple levels of abstraction, have been critical to recent advances in machine learning. Being able to define and train models in a framework like PyTorch and move them quickly onto Bela, with visibility into their performance characteristics, would vastly accelerate efforts to use deep learning in an embedded musical context. In addition, programming embedded systems for real-time musical interaction is extremely difficult, despite the existence of ground-breaking projects like Bela. Embedded machine learning has the potential to make embedded DMI programming more accessible by taking advantage of text-to-code [3], interactive machine learning [2] and explainable AI (XAI) [4] techniques. This project and others like it represent the first major steps towards unlocking this potential.

Background

Embedded Machine Learning for Music Interaction

The usage of machine learning in instrument design has grown in recent years (see for example [5] [6]), yet there have not been many comprehensive implementations in resource-constrained embedded contexts like Bela [7]. This can be attributed to the fact that machine learning can be very computationally expensive, with many typical applications requiring GPUs, TPUs or other custom hardware accelerators. However, with the growing industry interest in edge computing, there have been increasing numbers of projects looking to optimise the whole machine learning pipeline for embedded devices, such as TinyML [8], TensorFlow Lite [9] and many others8. A recent comparison of deep learning inference engines for embedded real-time audio classification [10] demonstrated the viability of several approaches, but surfaced a tradeoff between general frameworks with higher computational overhead on the one hand and more efficient but less flexible specialised frameworks on the other. This project aims to leverage tools such as these to give Bela users the ability to deploy ML models to their devices, aiming for a sweet spot somewhere in the middle.

The main constraint in doing so is the real-time nature of audio projects on platforms like Bela, which is a key factor when developing instruments or interactive sensor systems [11]. This imposes a latency requirement on any models being run, grounded in music perception and cognition experiments: for real-time interactions to feel responsive, latencies of at most 5-10 ms are typically necessary [12]. This strict latency requirement implies the need for performance analysis tools that can evaluate and measure ML models, providing feedback to the user on the runtime costs incurred by their models. Thus, this project's focus is the development of performance analysis tools for running machine learning models on the Bela, in the form of a benchmarking tool and a profiling tool. The benchmarking tool provides latency, memory and accuracy measurements, and is meant to be used when comparing different ML runtime components, model architectures and/or compilers. The profiler is intended for pinpointing bottlenecks during model development, allowing developers to discover slow operators and view CPU utilisation.
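To make this budget concrete, the arithmetic below relates audio block size to the callback period within which an inference call must return (a back-of-envelope sketch; the sample rate and block sizes are illustrative):

# Relates block size to the audio callback period: inference must finish
# well within this period, leaving headroom for the rest of the DSP.
SAMPLE_RATE = 44100  # Hz, a typical audio rate

for block in (16, 128, 1024):
    period_ms = 1000 * block / SAMPLE_RATE
    print(f"block={block:5d} samples -> callback period {period_ms:6.2f} ms")

# block=   16 samples -> callback period   0.36 ms
# block=  128 samples -> callback period   2.90 ms
# block= 1024 samples -> callback period  23.22 ms

At 1024-sample blocks the callback period (~23 ms) already exceeds the 5-10 ms interaction budget, so audio-rate models must either run at small block sizes or be relegated to control rate.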

Bela Platform for Embedded Interactive Arts Projects

Bela is an open source platform built upon the BBB, consisting of an audio cape and a custom real-time Linux image using the Xenomai framework [1]. Bela provides a low-latency computing environment ideal for use in audio applications. The design philosophy of Bela aims for both high-performance and high usability [13], and supports programming in C++, SuperCollider, Pure Data, CSound, RNBO, Faust and Pyo9. Bela has been used extensively in research about perception of latency [12], e-textiles [14], and more10. There already exists a large community surrounding Bela11 [15], as it is an increasingly popular platform for use in educational settings as well as DMI design and maker communities. As the Bela platform has been adopted by a wide range of users, from artists to engineers, this project aims to provide tooling that caters to this broad user base.

Real-time performance monitoring on Bela can be approached in different ways12. The main consideration is that time-critical tasks must always run in primary (Xenomai) mode, never in secondary (kernel) mode, and must avoid switching back and forth between the two. These primary and secondary modes have different constraints in terms of profiling. Overall CPU performance on Bela is usually computed by taking primary mode data from /proc/xenomai/sched/stat and combining it with secondary mode data from /proc/$PID/stat. Embedded DMIs that use ML must have robust strategies for navigating this primary versus secondary boundary in terms of where and when computation happens; otherwise latency will be degraded.
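As a minimal illustration of this kind of monitoring, the sketch below polls the MSW (mode switch) counter for a named task in /proc/xenomai/sched/stat; the column layout and the task name are assumptions that may vary between Xenomai versions and projects:

import time

def mode_switches(task_name):
    # Parse /proc/xenomai/sched/stat, whose header row names columns
    # such as PID, MSW (mode switches), CSW and NAME (layout assumed).
    with open("/proc/xenomai/sched/stat") as f:
        header = f.readline().split()
        for line in f:
            row = dict(zip(header, line.split()))
            if row.get("NAME") == task_name:
                return int(row["MSW"])
    return None

prev = mode_switches("bela-audio")  # hypothetical task name
while True:
    time.sleep(1.0)
    cur = mode_switches("bela-audio")
    if prev is not None and cur is not None and cur > prev:
        print(f"task left primary mode: MSW {prev} -> {cur}")
    prev = cur

A steadily increasing MSW count indicates the task is dropping into secondary mode, which is exactly the behaviour that degrades latency.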

Intermediate Representation Execution Environment (IREE)

After investigating various other possible embedded ML pipelines, this project focused on supporting the Intermediate Representation Execution Environment (IREE) [16] on Bela. We chose IREE partly because other early experiments in this area had not yet done so13, but also for intrinsic reasons. IREE is part of the OpenXLA14 project, an industry-backed, open source ML compiler ecosystem. IREE leverages the Multi-Level Intermediate Representation (MLIR)15 [17] compiler infrastructure, which is itself part of LLVM16. MLIR is organised around dialects, which are collections of operations; the passes within MLIR compilers translate between dialects. This infrastructure allows different compilers to reuse optimisations and compiler passes on, and in between, new dialects without starting from scratch.

IREE uses MLIR to build a compiler that spans from model import down to workload scheduling, running on a lightweight Hardware Abstraction Layer (HAL). The compiler offers multiple code generation backends, including portable IREE virtual machine bytecode, LLVM IR and C source code. The main advantages of IREE are its portability across hardware platforms (including bare-metal targets), parallelisation on platforms where it is available, and the multiple frontends for importing models (although some are at a very early stage, e.g. Torch-MLIR17). Beyond Bela, the IREE runtime could be a lightweight way of running machine learning models in other kinds of audio (or other multimedia) projects (e.g. Pure Data, VST, CLAP, etc.) while still being able to take advantage of larger multiprocessor systems.

Bela-IREE Workflow and Tools

Bela-IREE Container and Workflow Example

bela-iree-container18 is a Docker image containing a toolchain for cross-compiling IREE projects for Bela, along with utilities to compile, benchmark and profile programs. It can be set up and launched from within Visual Studio Code and requires a Bela to be connected. It is based on the foundational work of Rodrigo Diaz19. The project also includes the runtime and model zoo described below as git submodules, enabling an end-to-end development environment for using IREE in Bela projects.

As an example, once the container is set up, these are the steps needed to benchmark a single matrix multiply (single_mm) using IREE on Bela and print out some profiling information. These steps assume you are in the bela-iree-container directory:

# enter the model zoo and export the models to the supported formats
cd /workspaces/bela-iree-container/models/embedded-model-zoo/
conda activate zoo
python -m zoo
# compile the exported TOSA model to IREE VM bytecode for the BBB
cd tosa
compile -i single_mm.tosa -t bbb -d tosa -f vm-bytecode -h llvm-cpu -o single_mm.vmfb
# benchmark and profile on the Bela over USB (192.168.7.2)
benchmark -f single_mm.vmfb -t bbb -r 10 -e forward -i 1x1024xf32=4 -d 192.168.7.2
profile -f single_mm.vmfb -m stat -e forward -i 1x1024xf32=4 -l cache-misses,cache-references -d 192.168.7.2

Running the zoo module exports the models in the zoo to various formats, including in this case the Tensor Operator Set Architecture (TOSA) MLIR dialect20 via Torch-MLIR. The exported model is then compiled into an IREE VM program for direct execution by the IREE runtime. These programs can either be serialised VM FlatBuffers (.vmfb) or emitted C code, for cases where loading bytecode at runtime is not desirable (see the runtime section below). The IREE program can finally be benchmarked using iree-benchmark-module, and profiled using tools like Tracy and perf (see the benchmarking section below).
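For reference, a similar compile-and-run round trip can be performed on the host using IREE's Python bindings before deploying to the device. This is a sketch assuming the iree-compiler and iree-runtime Python packages as available at the time of writing, and it targets the host CPU rather than the BBB:

import numpy as np
import iree.compiler as ireec
import iree.runtime as ireert

# Compile the exported TOSA module to VM bytecode for the host CPU.
vmfb = ireec.compile_file(
    "single_mm.tosa",
    input_type="tosa",
    target_backends=["llvm-cpu"],
)

# Load the flatbuffer with the local CPU driver and call the entry point.
module = ireert.load_vm_flatbuffer(vmfb, driver="local-task")
x = np.full((1, 1024), 4.0, dtype=np.float32)  # mirrors -i 1x1024xf32=4
print(module.forward(x))

This is useful for checking numerical correctness and operator support before cross-compiling for the BBB.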

Bela-IREE Runtime

bela-iree-runtime21 contains a Bela C++ project22 with the IREE runtime set up to allow a model to be loaded into a Bela project. This template project preserves the typical setup() and render() program structure familiar to creative coders and users of Arduino, Processing/p5.js, Bela and similar. The runtime has two branches with different project structures. The main branch requires the IREE compiler to export a VMFB file to the Bela, which is then loaded at runtime. Alternatively, there is the EmitC option on the emit-c branch, which requires the IREE compiler to output C source code in a module.c file to be compiled into the binary ahead of time. The runtime also has the option to enable Xenomai diagnostics during execution to inspect how the IREE thread is behaving. It is functional but still at an early stage; we plan on improving it further so it is easier to use.
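For the EmitC path, a direct iree-compile invocation might look roughly like the following; this is an assumption rather than the container's documented wrapper, flag names vary between IREE versions, and it presumes a build with the EmitC target enabled:

iree-compile --iree-input-type=tosa --iree-hal-target-backends=llvm-cpu --output-format=vm-c -o module.c single_mm.tosa

The generated module.c is then compiled into the Bela binary ahead of time, avoiding loading bytecode at runtime.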

Embedded Machine Learning Model Zoo

A goal of this project is to make it easy for developers to experiment rapidly with embedded ML on the Bela, in the same way they would using a framework like PyTorch on a laptop. To illustrate such a workflow, we created embedded-model-zoo23, a Python package which automates the export of PyTorch models to the various formats (TFLite, TorchScript, TOSA, ONNX) ingested by the IREE compiler. It also contains the ‘zoo’ itself, a diverse collection of simple PyTorch modules which represent common building blocks for deep learning. These include:

basic_mlp: A typical multilayer perceptron (MLP) model.

siren_mlp: SIREN-style [18] MLP with sinusoidal activations and reshaping ops.

simple_conv_1d: Convolutional network [19] for one-dimensional inputs, including average-pooling ops.

resnet_1d: Convolutional network with residual connections [20], dropout [21] and softmax operations.

simple_rnn: Elman [22] recurrent neural network (RNN).

log_spectrum: Power spectrum computed using the discrete-time Fourier transform and element-wise math ops.

transformer_block: Wrapper around a standard PyTorch transformer [23] encoder layer.

variational_encoder: A tiny VAE [24] encoder including pseudorandom number generation.

We call this a ‘zoo’ by analogy to collections of deep learning models like the ONNX model zoo24, but note that ours is a ‘zoo’ of model architectures, not trained models. It serves three intended purposes. First, to provide a variety of model graphs for testing the runtime, profiling and benchmarking components of this project. Second, to compare support and efficiency of specific operations between runtimes and export paths. Third, to provide an easy starting point for developers of more useful models. Using embedded-model-zoo, a developer can continuously monitor performance and support of their evolving designs, as an alternative to restricting themselves to a known-to-work architecture or developing a new architecture blindly and needing to redesign it for embedding later.
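As an indication of what a zoo entry looks like in practice, the sketch below defines a minimal MLP and lowers it to the TOSA dialect via Torch-MLIR. It assumes the torch_mlir.compile entry point available at the time of writing, and the module is illustrative rather than taken verbatim from the zoo:

import torch
import torch_mlir

class BasicMLP(torch.nn.Module):
    # A small MLP standing in for the zoo's basic_mlp building block.
    def __init__(self, dim=1024, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)

model = BasicMLP().eval()
example_input = torch.zeros(1, 1024)

# Lower the traced module to the TOSA MLIR dialect for the IREE compiler.
tosa_module = torch_mlir.compile(model, example_input, output_type="tosa")
with open("basic_mlp.tosa", "w") as f:
    f.write(str(tosa_module))

The resulting .tosa file is what the container's compile step ingests, as in the workflow example above.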

Results

Initial Benchmarking

Here we report benchmarks obtained from running the model zoo models on the BBB and the BBAI64 (CPU only). The Cortex-A8 on the Bela is clearly slower at inference than the AArch64 Cortex-A72 on the BBAI64. There were also more errors encountered running models on the BBB's 32-bit ARM platform when using the LLVM-CPU code generation backend. The interpreted VMVX runtime is still quite new and is expected to improve as the IREE developers add new microkernels to VMVX25, which will hopefully translate into performance gains on Bela. All models in this table other than the MDRNN [25] were processing blocks of 1024 samples.

| Model | IREE input type | Bela-IREE benchmark | BBAI64 (CPU only) IREE benchmark, LLVM-CPU |
| --- | --- | --- | --- |
| basic_mlp_1024 | TOSA from TFLite | 222 ms (VMVX) | 24.0 ms |
| resnet_1d_1024 | TOSA from TFLite | segmentation fault | NA |
| simple_conv_1d_1024 | TOSA from TFLite | 2549 ms (LLVM-CPU) | 137 ms |
| simple_rnn_1024 | NA | NA (unable to export to TOSA) | NA |
| single_mm_1024 | TOSA from Torch-MLIR | 19.7 ms (LLVM-CPU) | 7.72 ms |
| siren_mlp_1024 | TOSA from TFLite | 778 ms (LLVM-CPU) | 50.4 ms |
| transformer_block_1024 | TOSA from TFLite | segmentation fault | 142 ms |
| variational_encoder_1024 | NA | NA (unable to export to TOSA) | NA |
| mdrnn (64 hidden units) | MHLO from JAX26 | 37.6 ms (VMVX) | 0.176 ms |

It is difficult to compare these results with others such as those referenced earlier [10], since different hardware and models were used; indeed, this is a problem for the field in general, which a common model zoo such as ours could help to address. In general, however, we can say that Bela-IREE shows promise for achieving real-time performance, though for now only at control rate or at lower sampling rates for DSP. The clear difference between the BBB and BBAI64 also indicates that CPU capability is currently one of the most prominent bottlenecks for projects seeking an end-to-end workflow that begins with high-level ML frameworks. CPU capabilities notwithstanding, more work is needed to estimate when a threshold may be crossed in the MLIR ecosystem that takes approaches like these from barely to fully viable.

Performance Profiling

Another benefit of using IREE is the built-in instrumentation using the Tracy profiler27. This profiler can be enabled throughout the IREE runtime with a compiler flag, allowing fine-grained profiling data to be gathered from running IREE programs. The data can be sent over TCP to a capture tool which allows for visualisation of traces, memory allocations, etc. Unfortunately, the instrumented binaries are currently somewhat unstable on the Bela, making profile recording unreliable, although some profiles could be captured. More work debugging the cause of this instability would be worthwhile, as Tracy could be a very useful tool and is already partially functional.

Figure: Tracy example from Bela, captured while it was momentarily able to run.

As an alternative, the Linux perf profiler28 was used to record profiles and performance monitor events on the Bela while models were running. The profiling utility in the Docker container allows profiles to be recorded as the model runs; these profiles and events can then be viewed in various formats, for example using TraceCompass29. Additional work could be done in the IREE runtime on Bela to provide instrumentation similar to the Tracy profiler; this could possibly be done with the LTTng30 tool to insert tracepoints.
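For reference, a typical perf session of this kind might look as follows; the binary name is hypothetical, and the event list mirrors the profile invocation shown earlier:

perf stat -e cache-misses,cache-references -p $(pidof bela-iree-runtime) -- sleep 10
perf record -g -p $(pidof bela-iree-runtime) -- sleep 10
perf report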

Future Work

Our work provides a solid starting point for future experimentation with IREE on Bela and other embedded devices (Raspberry Pi, Teensy, Elk, Jetson Nano, etc.). We plan on continuing this work and will be especially focused on improving the IREE runtime on Bela so that it is stable and easier to use. In the short term we plan to get a full demo of a control-rate MDRNN running in a Bela project using IREE. Beyond these practical improvements, we would also like to automate some of the process of building the IREE runtime components, profiling/instrumentation utilities and MLIR components, so that rapid upstream developments can be easily tracked.

We believe adding to the model zoo can only benefit other projects in this space and invite developer contributions. We also encourage proposals for standardised “benchmark tasks” for embedded DMIs, similar to challenges in other fields such as music information retrieval (MIR). In terms of profiling, we seek to address the issues uncovered with the Tracy profiler, and eventually to be able to provide developers with interactive flame graph visualisations31, which could perhaps be eventually integrated into the Bela IDE [26] as an extension.

Another interesting area to investigate in the future is the GPU on the Bela and similar devices. PowerVR, the GPU's manufacturer, finally released drivers in 2020 for the GPU on the Bela32. It may not prove useful for audio synthesis, as it would still require large block sizes; however, it may be useful for offloading some control-rate processing, or indeed image processing, and any DMI design problem that can be recast as an image processing one.

In terms of end-user applications and usability, this project aims in future to develop example projects in Python and for Bela. These projects will serve as a learning tool for people looking to explore embedded ML on Bela. They will cover the installation, configuration and use of the relevant tools, provide example code for building, training and running ML models on Bela, and document the tools developed during this project. Based on the preliminary results recorded during this project, more specific example projects could be developed to target potential use cases such as gesture recognition and mapping, neural audio synthesis, or dimensionality reduction of incoming sensor data. These more targeted example projects would provide concrete, applicable examples for the Bela community, inspiring new ideas and further development. Developing example projects will also allow for refinement of the performance analysis tools, as real-world use may inform the development of new features.

Ethics Statement

This research was carried out under the guidance of the Intelligent Instruments Lab project, and as such was subject to the ethical standards of the Iceland University of the Arts and the European Research Council funding body. No human subjects activities or studies were carried out as part of this research. The project followed all of the Google Summer of Code guidelines and processes for both Contributors and Mentors, including weekly meetings and reports.

The authors acknowledge that AI/ML technologies, even when applied to musical instruments, have the capacity to cause social and cultural harm, and anticipate that future work in this space will need to take this further into consideration, publishing model cards and other additional reporting concerning ethics as needed.

Acknowledgements

Ezra Pierce carried out the development work of Bela-IREE as part of a Google Summer of Code 2022 project in partnership with the BeagleBoard Foundation and Bela.io. Victor Shepardson was a secondary mentor to Ezra Pierce, and created the embedded model zoo. Jack Armitage was the primary mentor to Ezra Pierce, and provided expertise to Ezra Pierce on the Bela platform, and edited the paper. Thor Magnusson is the PI of the research project.

Thanks to our partners for their support throughout this project: Jason Kridner and Cathy Wicks at BeagleBoard, Andrew McPherson and Giulio Moro at Bela, and Google Summer of Code.

The Intelligent Instruments project (INTENT) is funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 101001848).
