
Closing the Loop: Enabling User Feedback and Testing in Symbolic Music Generation through a Python Framework and Ableton Live Integration

Published on Aug 29, 2023

Rui Guo, Department of Music, University of Sussex, Brighton, UK

[email protected]

Abstract

Symbolic music generation systems have seen rapid advancements, encompassing various techniques such as harmonization, text-to-music, and music infilling. However, most of these systems lack a corresponding user interface, limiting accessibility and hindering potential applications for composers and AI tool users. This paper proposes a Python framework and an Ableton Live plugin designed to bridge the gap between deep learning researchers and the users interested in these tools. The Python framework provides APIs for seamless communication with the Ableton Live plugin, and researchers can call those APIs to expose their model's specific functions. The Ableton Live plugin enables user interaction through its interface, allowing parameter configuration and control over the generation process. By facilitating testing and user interaction, this framework aims to enhance collaboration and accelerate advancements in the field of symbolic music generation.

Introduction

With the rapid advancement of AI technology, deep generative models have found applications in various domains, including text, image, audio, and music generation. Symbolic music, a specific type of music representation, captures music as discrete symbols instead of continuous audio signals. Symbolic music representations are commonly used in music notation, analysis, and computational music research, as they enable precise description and manipulation of musical information.

Compared to acoustic music, symbolic music can be represented using event tokens or piano rolls, allowing for the adaptation and application of existing generative methods from NLP and image domains. Numerous research studies have focused on generating symbolic music for various purposes. Some representative tasks include:

  1. Generation from scratch: This involves generating music, often with a predefined genre. Examples include the Google Magenta Studio’s “generate 4 bar” [1] and LakhNES, which aims to generate multi-track game music [2].

  2. Harmonization of existing melodies: The Google Bach Doodle can generate a harmonic counterpart for a given melody in the style of Bach. Another work [3] generates chord progressions based on the input melody and a surprise contour.

  3. Music infilling: This task involves generating missing parts of music given context information. The missing part could be tracks, bars, or any section of the music [4, 5, 6].

Other tasks in symbolic music generation include lyrics-to-music [7], text-to-music [8], and generation with a specific theme [9]. Most of these tasks fall under the category of conditional music generation, where the generation process follows a specific condition: for example, generating a missing melody line that fits well with given bass and harmony, or generating a harmony line based on an input melody. In some systems, the condition can be specific music attributes, such as key, tempo, track texture, bar tonal tension [5], or chord progression [10]. In such cases, the system requires additional input beyond just the music to condition the generation process.

Figure 1: The architecture of the proposed framework

In symbolic music generation systems, there is rapid exploration of input representations and the use of conditions. However, there is currently no platform similar to Hugging Face1 that hosts music generation models. This is primarily due to the different representations of symbolic music across models, as well as the need for visualization and synthesis of symbolic music before it can be consumed. While most current methods provide webpages with generated music, they do not offer a user-friendly way to try the models with user input. Some models provide an interface for user exploration, but the interface design is often specific to each model, creating burdens for both researchers and end users. Researchers spend considerable time developing interfaces, and these interfaces may not be user-friendly or may lack essential functions such as note editing, particularly in models hosted on the Google Colab platform. Consequently, research in this field remains inaccessible to most end users, limiting its potential impact. Moreover, without an easy-to-use interface, testing generative models is primarily done through direct coding rather than interactive means. As a result, the exploration of models by the researchers themselves can be limited, preventing the discovery of different ways to manipulate the model.

Most of the research on music generation using deep learning provides GitHub links to code or notebooks. Some works offer Colab notebooks [11], primarily aimed at replicating the research; however, these notebooks often lack flexibility in editing the generated notes. Several works have developed web interfaces for their AI systems, covering tasks such as chorale inpainting [12, 4] and melody harmonization [13]. The Google Magenta Studio [1] stands out as one of the earliest interfaces integrated into digital audio workstations (DAWs). As an Ableton Live plugin, it facilitates music generation but is limited to monophonic music and lacks extensive user control over the generation process. Another Ableton Live plugin, based on a variational autoencoder, focuses on drum rhythm generation [14]. Additionally, a piano inpainting application for piano performances has been realised as an Ableton Live plugin [15]. These works are specific to their respective applications and are not generic.

Method

To bridge the gap between symbolic music generation models and users, an interactive interface with music input, easy note editing, and parameter settings is necessary. Rather than building an entirely new music interface, leveraging existing digital audio workstations (DAWs) such as Ableton Live or Logic Pro, which already have a large user base that includes generative model researchers, provides a mature platform with a rich set of functionalities. In this research, Ableton Live is chosen for its Max for Live (M4L) plugin system, which offers a JavaScript patch for sending HTTP requests to and receiving responses from the server side. The JavaScript patch also enables the creation of dynamic interfaces, allowing model-specific parameters to be added to control the generation process. Ableton Live's session view, which enables comparison of different versions of a generation, and the arrangement view, which provides a user-friendly environment for note editing, are both valuable and serve different purposes; therefore, the M4L plugin should support both modes. Figure 1 shows the architecture of the proposed framework, which includes the Python API and the Ableton Live plugin.
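As a concrete illustration of this client-server architecture, the sketch below shows a minimal Python endpoint that the JavaScript patch could post selected notes and parameters to. The paper does not prescribe a web framework or endpoint names; Flask, the /infill route, and the payload fields used here are assumptions made purely for illustration.

```python
# Minimal sketch of the server side of the architecture in Figure 1.
# Flask, the /infill route, and the payload fields are illustrative
# assumptions; the framework's actual API may differ.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/infill", methods=["POST"])
def infill():
    payload = request.get_json()
    notes = payload.get("notes", [])    # notes selected in Ableton Live
    params = payload.get("params", {})  # parameter values set in the interface

    # A real system would call the generative model here; this stub
    # simply echoes the selected notes back to the plugin.
    generated = notes

    # The JavaScript patch relays this response back into the
    # arrangement or session view.
    return jsonify({"notes": generated, "section": payload.get("section")})

if __name__ == "__main__":
    # The host and port correspond to the "IP address" the user enters
    # in the M4L plugin to establish the connection.
    app.run(host="0.0.0.0", port=5000)
```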

1 https://www.huggingface.co

The plugin should also provide various ways to select music sections. In the arrangement view, the user-selected region of notes can be sent back to the model, allowing for fine-grained selection of pitch, duration, velocity, and other information. Additionally, the plugin should offer options to select larger blocks of music, such as bars or tracks. In addition to traditional methods of selecting regions or setting parameters, an open text input is essential to accommodate information that is not easily expressed using other methods, expanding the possibilities for music creation.
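For illustration, a payload carrying such a selection might look like the following. The field names and structure are hypothetical and only mirror the kinds of information described above (note pitch, duration, and velocity, bar- or track-level blocks, and the open text input).

```python
# Hypothetical shape of the data the plugin could send for a user-selected
# region; the field names are illustrative, not a published schema.
selection_payload = {
    "mode": "arrangement",                        # or "session"
    "selection": {"start_bar": 5, "end_bar": 8,   # coarse bar/track-level block
                  "tracks": [0, 2]},
    "notes": [
        # One entry per selected note, mirroring Ableton Live note properties:
        # MIDI pitch, start and duration in beats, and velocity.
        {"pitch": 60, "start": 16.0, "duration": 0.5, "velocity": 100},
        {"pitch": 64, "start": 16.5, "duration": 0.5, "velocity": 96},
    ],
    "free_text": "make the infilled melody calmer",  # open text input
}
```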

On the model side, since most generative models are written in Python, a Python framework with a set of APIs to connect with the model is preferred. The API should include functions to send the generated results and parameters to the M4L plugin and subsequently to the interface. The API's primary function is to convert MIDI files into note objects in Ableton Live and send them to the JavaScript code, which relays those notes to the interface. Alongside notes, the API should support several parameters, including section or other markers that indicate specific regions when only part of the music is being modified. This functionality is useful for updating only the newly generated section rather than replacing the entire composition. Additionally, the API should accommodate model-specific parameters that align with the parameters set up in the interface. The aim is to minimise restrictions on the types of parameters available and maximise the flexibility of their usage.
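The sketch below illustrates what such a conversion and response-building step could look like. The library choice (pretty_midi) and all function and field names are assumptions made for this example; the released API may be organised differently.

```python
# Sketch of converting a generated MIDI file into note data for the M4L
# plugin. pretty_midi and the function/field names are assumptions.
import json
import pretty_midi

def midi_to_notes(midi_path):
    """Convert a generated MIDI file into per-track note dictionaries."""
    pm = pretty_midi.PrettyMIDI(midi_path)
    notes = []
    for track_index, instrument in enumerate(pm.instruments):
        for note in instrument.notes:
            notes.append({
                "track": track_index,
                "pitch": note.pitch,
                "start": note.start,               # seconds; plugin maps to beats
                "duration": note.end - note.start,
                "velocity": note.velocity,
            })
    return notes

def build_response(midi_path, section=None, extra_params=None):
    """Bundle notes, the modified-section marker, and model-specific
    parameters into the JSON the JavaScript patch relays to the interface."""
    return json.dumps({
        "notes": midi_to_notes(midi_path),
        "section": section,            # e.g. {"start_bar": 4, "end_bar": 8}
        "params": extra_params or {},
    })
```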

Example

The author has developed an M4L plugin to connect a music infilling model with the Ableton Live interface. The interface is illustrated in Figure 2, where the upper part shows the arrangement view of the input MIDI, and the lower part displays the M4L plugin. The plugin provides several parameter selections, including the section to infill, track and bar-level texture control settings, and the bar tonal tension setting. These control parameters are implemented using the “umenu” and “multislider” patches in M4L.
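One way a model could describe these controls to the framework is through a declarative parameter specification from which the dynamic interface is built. The schema below is purely illustrative: the widget names echo the "umenu" and "multislider" patches mentioned above, but the exact format is an assumption rather than the plugin's actual configuration.

```python
# Hypothetical parameter specification for the infilling example; the keys,
# option labels, and value ranges are illustrative assumptions.
infill_parameters = [
    {"name": "infill_section", "widget": "umenu",
     "options": ["selected bars", "selected track", "whole clip"]},
    {"name": "track_texture", "widget": "multislider",
     "per": "track", "range": [0.0, 1.0]},
    {"name": "bar_texture", "widget": "multislider",
     "per": "bar", "range": [0.0, 1.0]},
    {"name": "bar_tonal_tension", "widget": "multislider",
     "per": "bar", "range": [0.0, 1.0]},
]
```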

To utilise this interface, the user first enters the IP address and establishes a connection with the model server. Then, an input MIDI file is selected and imported into the arrangement view. The "show info" button displays the number of bars and some information about the MIDI file, which is calculated locally rather than by the generative model. The "control calculation" and "infill" buttons trigger different functions on the server side, which need to be set up using the API, as sketched below. The control parameters in this example can be easily adapted to suit the parameters of different models, allowing the interface to be adjusted to the specific parameter requirements of each model. This specific example is a baseline for the framework, and it will be expanded to cover more generic cases.
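In this workflow, each button in the plugin ultimately triggers a function on the server that has been registered through the API. The mapping below is a hypothetical sketch of that step; the handler names and stub bodies are placeholders, not the framework's actual interface.

```python
# Hypothetical mapping from plugin buttons to model functions; names and
# stub bodies are placeholders for a researcher's real implementation.
def control_calculation(notes, params):
    """Compute control values (e.g. per-bar texture and tonal tension)."""
    n_bars = params.get("n_bars", 0)
    return {"bar_tonal_tension": [0.0] * n_bars}  # stub values

def infill(notes, params):
    """Run the infilling model on the selected section (stub: echo input)."""
    return {"notes": notes, "section": params.get("infill_section")}

handlers = {
    "control calculation": control_calculation,  # button label -> function
    "infill": infill,
}

def dispatch(button, notes, params):
    # Called by the server when the corresponding button is pressed.
    return handlers[button](notes, params)
```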

A demonstration of this interface for music infilling is shown below:

Conclusion

The rapid development of AI technology has resulted in the emergence of versatile generative models in the field of symbolic music generation. However, these models are not fully utilised without an interactive interface showcasing their advantages. Moreover, the absence of user feedback from music professionals or amateur composers can limit the discovery of model weaknesses and hinder improvements for future research. To bridge the gap between AI research and user interaction, this work proposes a Python framework and an Ableton Live plugin to integrate models into the Ableton Live interface.

The primary objective is to facilitate seamless integration and adaptability of the interface to various usage scenarios. Users can customise the interface using model-specific parameters and map their model’s functions to the interface through the provided API. The interface also offers flexible note selection methods, accommodating both arrangement view and session view usage. Ultimately, the goal is to create a platform similar to Hugging Face, where researchers can openly share their models along with their adapted M4L plugins, allowing users to easily utilise these models. This platform aims to facilitate model development, evaluation, and the formation of an open community for sharing and leveraging research in the field of symbolic music generation.

Figure 2: The Ableton Live interface with the M4L plugin for an infilling application.

References

Proceedings, pages 341–356. Springer, 2022.

  1. Jeff Ens and Philippe Pasquier. MMM: Exploring conditional multi-track music generation with the Transformer. arXiv preprint arXiv:2008.06048, 2020.

  2. Zhe Zhang, Yi Yu, and Atsuhiro Takasu. Controllable lyrics-to-melody generation. arXiv preprint arXiv:2306.02613, 2023.

  3. Peiling Lu, Xin Xu, Chenfei Kang, Botao Yu, Chengyi Xing, Xu Tan, and Jiang Bian. MuseCoco: Generating symbolic music from text. arXiv preprint arXiv:2306.00110, 2023.

  4. Yi-Jen Shih, Shih-Lun Wu, Frank Zalkow, Meinard Müller, and Yi-Hsuan Yang. Theme Transformer: Symbolic music generation with theme-conditioned transformer. IEEE Transactions on Multimedia, 2022.

  5. Kyoyun Choi, Jonggwon Park, Wan Heo, Sungwook Jeon, and Jonghun Park. Chord conditioned melody generation with transformer based decoders. IEEE Access, 9:42071–42080, 2021.

  6. Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, and Yi-Hsuan Yang. MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), Apr. 2018.

  7. Théis Bazin and Gaëtan Hadjeres. NONOTO: A model-agnostic web interface for interactive music composition by inpainting. arXiv preprint arXiv:1907.10380, 2019.

  8. Andrew Shaw. Musicautobot. https://github.com/bearpelican/musicautobot, 2020.

  9. Nao Tokui. Towards democratizing music production with AI: Design of variational autoencoder-based rhythm generator as a DAW plugin. arXiv preprint arXiv:2004.01525, 2020.

  10. Gaëtan Hadjeres and Léopold Crestel. The piano inpainting application. arXiv preprint arXiv:2107.05944, 2021.
