
Sandy Island: a new form of parameters’ space management

Published on Aug 29, 2023

Abstract

Sandy Island is an adaptive, site-specific audio and video installation that the author presented in 2022. The work has been made possible by a new formulation of the synthesis space exploration task, based on an agnostic representation of the parameters of aggregate instruments as points in a three-dimensional virtual space. I focus on the underlying technology of the work, in particular on the flexibility of the system and its ability to morph and self-generate new instrument configurations with desired characteristics. I stress the advantages of this approach by presenting its practical application and suggesting new ones. The work is part of the author’s data-driven research project, inspired by working in public spaces and creating self-evolving multimedia environments that adapt their behaviour to the surrounding external conditions.

Context

The system I am going to present was developed to manage the huge sets of parameters needed to control the generation of the audio and visual content of Sandy Island, an adaptive installation1 which ran continuously for sixty hours at the Hochschule für Musik und Theater Hamburg’s Forum (HfMT), from December 1st to 3rd 2022.2 The work refers to an artistic practice that I have defined as aural space augmentation, which consists in introducing modal analysis techniques3 into the creative process:

The goal is not to create a faithful representation of the physical environment [that hosts the work] from a sonic point of view, but to augment the space of sound […] perception by superimposing on the physical characteristics of the environment, a simulation of it that preserves the same complexity of the observed context (Anatrini, 2022).

The subject of the work is a non-existent island that remained on charts until November 2012, when the Australian research vessel Southern Surveyor passed through the area and undiscovered the island.4 On a conceptual level, the theatre, which is the place of fiction par excellence, has been chosen to stage the story of the island: a paradoxical encounter with a place that is also the projection of an absence, where the visitor is the protagonist of this mise-en-scène. The audience becomes part of this artificial ecosystem, which is inevitably influenced by the human presence, thus building a path based on the shared memory of a collective hallucination, the only tangible trace of something that never existed.

The setup of the work consists of a wave field synthesis system of 288 diffusion points,5 mounted on a rectangular truss of 11x6 metres at about two metres from the ground, and 240 LED panels of 50x50 cm (192 pixels) each. The panels are mounted below the speakers, covering the space between them and the ground in order to create an immersive environment. Two laptops and two desktop computers were used to control the system and a floor projection.

Fig.1 The installation under construction.

Fig.2 Sandy Island on the first day.

The control signals for the audio and video synthesis are sent via OSC and MIDI messages to an Ableton Live set and to a TouchDesigner patch respectively. The latter sends out a 2K video matrix to a controller linked to batches of LED panels. The audio signal, consisting of twenty-four channels, is sent via an ethernet cable to another desktop computer running a spat-based6 Max patch in charge of the sound diffusion. The audio is managed over a Dante network thanks to two Dante cards installed on one of the desktop computers. Finally, the signals of two microphones hung from the ceiling are analysed via audio descriptors to constantly shape the evolution of the work.

Fig.3 Setup and routing scheme of Sandy Island.

The audio and visual material at the core of the work is an emanation of the place that hosts it:7 for example, the harmonic palette of the work is directly derived from the modal analysis. The video is based on two distinct point cloud reconstructions of the theatre. The first was created through photogrammetry, the other from scratch; the latter has also been used to build the three-dimensional model on which the modal properties of the venue were estimated. From an artistic point of view, since the island never existed it cannot be represented directly; its mysterious presence is therefore evoked by water- and animal-like sounds and by the colour palette of the video.

Although Sandy Island is site-specific, as long as the immersive character of the work is preserved it may be arranged on a smaller scale on any theatrical stage, by properly adapting its content and following the steps illustrated in Fig. 4.

Fig.4 A flowchart illustrating all the steps for staging the work in a different environment. TroubleCube, the preset manager tool I am going to present in the next chapters, has been used in a two-step process: first offline, in order to collect the values of the initial presets from the device TroubleCube is applied to; then in real time, to navigate the three-dimensional virtual space and send the presets’ values to the aggregate instruments.

Motivation

The content of the work is based on a continuous morphing between several states the instruments responsible for the audio and video synthesis can assume. Let’s focus on the audio first.

In order to generate the audio, a collection of plug-ins has been implemented by the author as banks of resonant band-pass filters.8 The resonant filter was chosen over table-based sine oscillators for its capability to better approximate a sinusoidal waveform, which is the starting point of the sound design process. Each mode of vibration set inside the filter bank is composed of a frequency, an amplitude and a decay rate or T609 value. These values are derived from the modal analysis run on batches of impulse responses (IRs) measured in the HfMT’s Forum and refined on the three-dimensional model of the theatre.
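As an illustration of this parameterisation (and not the author’s actual plug-in code, which is generated as Faust, as described below), the following Python sketch shows how a mode’s T60 value can be mapped to the per-sample decay of a standard two-pole resonant band-pass filter; the frequencies, amplitudes and T60 figures are made up.

```python
import numpy as np

def mode_to_resonator(freq, amp, t60, sr=48000):
    """Map one vibration mode (frequency, amplitude, T60) to the
    coefficients of a two-pole resonant band-pass filter.
    The pole radius r is chosen so that the impulse response decays
    by 60 dB after t60 seconds: r = 10 ** (-3 / (t60 * sr))."""
    r = 10.0 ** (-3.0 / (t60 * sr))      # per-sample decay factor
    omega = 2.0 * np.pi * freq / sr      # normalised angular frequency
    # Difference equation: y[n] = g*x[n] - a1*y[n-1] - a2*y[n-2]
    a1 = -2.0 * r * np.cos(omega)
    a2 = r * r
    g = amp * (1.0 - r)                  # rough amplitude normalisation
    return g, a1, a2

# Hypothetical modal data: (frequency in Hz, linear amplitude, T60 in seconds)
modes = [(86.3, 1.0, 4.2), (173.9, 0.6, 3.1), (262.4, 0.35, 2.7)]
bank = [mode_to_resonator(*m) for m in modes]
```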

Fig.5 The 3D model of the HfMT’s Forum, created from scratch and used for the modal analysis, seen from above.

A small library of Python scripts10 is responsible for converting those data, first into Faust code11 that implements a filter bank with specific constrained values, multichannel output and OSC support. This is then exported as a .jucer file and compiled as an AU plug-in via Xcode.
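A minimal sketch of what such a conversion step might look like, assuming the generated Faust code relies on pm.modeFilter from Faust’s standard physmodels library; the file name, the modal values and the overall structure are illustrative and not taken from the author’s actual scripts.

```python
# Hypothetical modal data: (frequency in Hz, linear amplitude, T60 in seconds)
modes = [(86.3, 1.0, 4.2), (173.9, 0.6, 3.1), (262.4, 0.35, 2.7)]

lines = ['import("stdfaust.lib");', ""]
# One pm.modeFilter(freq, t60, gain) per mode, all fed the same input and summed.
filters = ", ".join(
    f"pm.modeFilter({f:.2f}, {t60:.3f}, {a:.3f})" for f, a, t60 in modes
)
lines.append(f"process = _ <: {filters} :> _;")

with open("modal_bank.dsp", "w") as fh:
    fh.write("\n".join(lines))
```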

Even before defining the chains of effects that this collection of plug-ins became part of, the first issue that arose was how to set the modal frequency values without abrupt changes, so as to smoothly morph from one set of values to the next. The acoustic measurements all differ only slightly from each other, so I could not simply scroll between them: the differences would have been barely audible. Instead, I needed an n-dimensional virtual space in which to embed all the measurements, and thus the synthesis parameters, that I could explore and navigate. What soon became evident was the need for a tool implementing a new form of parameter management that could be used at different compositional levels:

  • To explore the values of the analysis, also supporting interpolation-like features, so as to create a virtual space that automatically delivers values coherent with the acoustics of the venue. This can be defined as the study of pre-compositional material in a constrained domain;

  • To evaluate entire audio chains, made of sound generators and effects combined together, all at once, in order to find those settings (which we will refer to as presets) that exhibit the desired sonic characteristics. This defines the sound palette of the work;

  • To define trajectories, in the sense of their speed and shape, that are used to link the presets together. By doing so we implicitly define the interpolation values of our instruments and thus compose the timeline of the work.

These needs, inherently related to the management of huge sets of data to drive the synthesis processes, were the starting point for defining an approach that is agnostic of the engine it is applied to, can be used at a low as well as at a higher compositional level, and whose ultimate goal is the ability to effectively control entire sets of parameters without having to operate one parameter at a time manually. All this took the form of TroubleCube, an open-source preset manager tool12 that I am going to present in the next chapter.

Description

TroubleCube simultaneously addresses parameter-based preset exploration, macro-control learning and preset morphing. Some of these characteristics can already be found in other existing tools that implement forms of synthesis space exploration (Yee-King et al., 2018; Shier et al., 2020), to name a few. Nevertheless, one ingredient common to these approaches is that they are based on the analysis of data coming from audio descriptors, usually, but not limited to, MFCCs, which are then used to infer the synthesis parameters that approximate a target sound. Although this class of models is able to infer the parameters with a good approximation (Esling et al., 2020), the need for a target sound makes these formulations unsuitable for the scenario I previously introduced. Therefore, instead of focusing on the perceptual space of the sounds to be approximated, I focused directly on the parameter space.13 In other words, the problem of controlling a synthesis engine is reduced to a mapping task in an n-dimensional space, carried out in a two-step process. First, through unsupervised dimensionality reduction techniques, the parameter values are represented as points in an n-dimensional virtual space. Then, by running a regression on this collection of parameters, we build a map that allows smooth morphing between all the vectors and thus between any points inside the map.

Dimensionality reduction routine

We decided to reduce the dimensionality of the normalised vectors representing the synthesis parameters to three, so as to be able to represent each vector as a point in a three-dimensional virtual space. Indeed, from a user experience perspective, such a space can be intuitively navigated. In doing so we opted for a heuristic approach, deliberately discarding the non-linear relationships between the parameters and the resulting sound in favour of a linear dimensionality reduction technique. Hence, in the system developed to control Sandy Island, the focus was to achieve agnostic macro-control learning with morphing capabilities, not to unveil the true relation between the auditory and the parameter space. The main issue I had to face was the choice of the “right” dimensionality reduction technique. After experimenting with linear reduction methods, e.g. independent component analysis (ICA) and singular value decomposition (SVD), and with a set of non-linear methods (manifold learning), we eventually chose principal component analysis (PCA), driven by empirical considerations:

  • A good approximation of the vectors retrieved from the reduced space when compared with the original values;

  • The degree of regularity of the reduced space: a regular space is easier to deal with in terms of user interaction and trajectory management;

  • The quality of the parameters from an artistic point of view.

This approach could be defined as hybrid because the very first step in using the system was to manually create a few presets, either audio or video. Typically we started by creating eight presets14 from scratch with contrasting characters; in the case of audio, these result in eight vectors of 120 values each. These are normalised and reduced, so that each vector (input) is associated with three values corresponding to its reduction, representing a system of xyz coordinates (output).
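A minimal sketch of this reduction step, assuming scikit-learn and randomly generated stand-in presets (the author’s pipeline is built around Max and Python scripts; the variable names here are hypothetical):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
presets = rng.uniform(size=(8, 120))    # eight stand-in presets, 120 parameters each

normalised = MinMaxScaler().fit_transform(presets)  # scale every parameter to [0, 1]
pca = PCA(n_components=3)
coords = pca.fit_transform(normalised)  # shape (8, 3): one xyz point per preset
```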

Regression routine

The result is stored as a JSON file composed of input-output value pairs which represent the features-targets data. This data can then be used to feed a multi-layer perceptron (MLP) network and fit a regression model. For this task the RapidMax library15 was chosen as the fastest solution for making an MLP neural network available inside Max; this makes it easy to run such a network inside Live as a Max4Live plug-in. RapidMax is the RapidLib C++ machine learning library16 wrapped into a Max object; it supports feature classification (kNN), regression (MLP) and series classification using dynamic time warping. In particular, the rapid.regression Max object implements a configurable and lightweight MLP neural network that can be trained using backpropagation, random weight initialisation and the sigmoid activation function. The MLP network was trained for 500 epochs using an input layer, two hidden layers and an output layer, with three nodes in each layer, a learning rate of 0.3 and a momentum of 0.2.
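The sketch below approximates that setup with scikit-learn’s MLPRegressor standing in for RapidMax, using the hyperparameters quoted above. I read the trained map as going from the three coordinates back to the full parameter vectors, which matches the navigation behaviour described next; all data are stand-ins.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
presets = rng.uniform(size=(8, 120))                # stand-in presets
params = MinMaxScaler().fit_transform(presets)
coords = PCA(n_components=3).fit_transform(params)  # features: xyz per preset

# Two hidden layers of three sigmoid nodes, trained by backpropagation (SGD)
# with learning rate 0.3, momentum 0.2 and 500 epochs, mirroring the RapidMax setup.
mlp = MLPRegressor(hidden_layer_sizes=(3, 3), activation="logistic",
                   solver="sgd", learning_rate_init=0.3, momentum=0.2,
                   max_iter=500)
mlp.fit(coords, params)                             # targets: full parameter vectors

# Any point inside the virtual space now yields an interpolated parameter set.
new_params = mlp.predict([[0.1, -0.4, 0.25]])[0]
```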

Fig.6 The GUI that allows the user to explore and navigate between all the presets. The small white torus near the middle represents the current reading point of the space. Here the trajectories are not yet defined, since the points are not connected.

Once the training is complete, the virtual space can be navigated through its GUI, either using a mouse or a different device. For any point selected inside the virtual space, a new batch of interpolated parameters is sent via OSC messages to all the audio chains created inside Live. It is then possible to navigate the virtual space to find new combinations of parameters which deliver the desired sonic or visual outcome. Each new preset that is found is rendered as a named, colour-coded point inside the space.
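A minimal sketch of that dispatch step, assuming the python-osc package and a hypothetical OSC address and port (the actual addresses used by TroubleCube and the Live set are not documented here):

```python
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)  # the port of the Live set is hypothetical

def send_interpolated(params):
    """Send a freshly interpolated parameter batch to the audio chains."""
    # "/troublecube/params" is an illustrative address, not the real one.
    client.send_message("/troublecube/params", [float(v) for v in params])

send_interpolated([0.42] * 120)  # stand-in parameter vector
```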

Towards an adaptive graphical environment for composition

It is the author’s opinion that this approach offers the opportunity to rethink the activity of composing in the electronic music domain by providing a higher-level GUI. In fact, all the points inside the space can now be connected together through trajectories. The use of trajectories for the management of sound diffusion parameters is certainly not new in the electronic and electroacoustic music domains. The novelty here is that the trajectories become a means of organising the synthesis parameters tout court, either audio or video, while the sound diffusion parameters are completely independent and mainly linked to the orientation of the point clouds rendered on the video wall. This kind of approach has been used at different, mutually independent levels: to set the values of the plug-ins reproducing the modal frequencies of the environment, to set the parameter values of a series of effects chains distributed on several tracks of a Live set, and to set the node values of a TouchDesigner patch responsible for generating the video.

Looking closer at how the trajectories work, we could imagine the route in the virtual space between p1 and p2, for example, as a straight line to be followed. For Sandy Island I started by predefining the routes we wanted to take inside the virtual space of TroubleCube’s GUI to connect all the points, and thus the presets, 82 in total. This implies connecting all the points using straight lines, e.g. from p33 to p10 to p47 and so on. The decision to connect a specific preset to another is dictated only by the sonic and visual character of the interpolation that results from travelling along that specific route; indeed, if from p33 we went to p34 instead of p10, for example, we would get a totally different sonic and visual outcome.
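As an illustration of such a predefined route, the sketch below walks a straight segment between two preset points in the virtual space; the coordinates are placeholders, and `mlp` stands in for the trained mapping described above.

```python
import numpy as np

p33 = np.array([0.8, -0.2, 0.1])   # stand-in coordinates of two presets
p10 = np.array([-0.5, 0.6, 0.3])

def straight_route(a, b, steps=200):
    """Yield evenly spaced points on the straight line from a to b."""
    for t in np.linspace(0.0, 1.0, steps):
        yield (1.0 - t) * a + t * b

# Each point would be fed to the regression (and then out via OSC) in turn:
# for xyz in straight_route(p33, p10):
#     params = mlp.predict([xyz])[0]
```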

We then used several audio descriptors to analyse the sound generated by the people visiting the work and captured through two microphones hung from the ceiling. Fine tuning the descriptors is crucial because they have to be as insensitive as possible to the sound produced by the system itself and more sensitive to the kind of sounds the audience might produce. For this task the microphone signals are analysed by means of pipo.ircamdescriptor (Schnell et al., 2017) in terms of their spectral centroid, perceptual tristimulus, harmonic energy, noisiness, zero crossing rate and total energy components. The resulting data are standardised (Smith & Garnett, 2012), filtered, reduced in dimensionality and finally mapped to the trajectories in the three-dimensional virtual space of TroubleCube. These signals are then used to distort the straight shapes of the predefined trajectories in a cumulative fashion. First, each signal coming from the descriptors is dynamically standardised, since its range is unknown; these values, according to the sign of the standardisation, are then added to or subtracted from the coordinate values that represent the predefined trajectory at any given point. If the route between p1 and p2 has already been altered, when that route is covered again the distortion created by the descriptor signals starts from the altered route and not from the straight line defined at the very beginning. At the end of the process each trajectory has become unrecognisable, allowing the exploration, each time it is passed through, of new zones of the parameter space and consequently new, changing synthesis flavours which maintain a certain degree of similarity with the original straight routes.
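The sketch below illustrates the cumulative distortion idea under some simplifying assumptions: the trajectory is stored as a list of waypoints, the descriptor stream is standardised with a running mean and standard deviation, and every waypoint is displaced by the same signed amount. Names and the size of the displacement are illustrative, not taken from the actual Max implementation.

```python
import numpy as np

class RunningStandardiser:
    """Standardise a stream of unknown range using running statistics."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def __call__(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        std = np.sqrt(self.m2 / self.n) if self.n > 1 else 1.0
        return (x - self.mean) / (std or 1.0)

def distort(route, descriptor_value, standardiser, amount=0.01):
    """Cumulatively displace every waypoint of a stored route.

    The standardised descriptor value is added to (or, when negative,
    subtracted from) the coordinates, so the next pass over this route
    starts from the already altered shape, not from the straight line."""
    z = standardiser(descriptor_value)
    route += amount * z            # in-place update: the alteration persists
    return route

# Stand-in straight route of 200 waypoints between two presets, distorted over time.
route = np.linspace([0.8, -0.2, 0.1], [-0.5, 0.6, 0.3], 200)
std = RunningStandardiser()
for value in np.random.default_rng(1).normal(size=500):   # fake descriptor stream
    distort(route, value, std)
```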

The adaptive character of the work suggests a relation between the qualities of the presets and the acoustic components of the surrounding environment that is not transparent but opaque. The artistic goal is to create a synergy rather than an interaction, where the influence on the audio and visual outcome cannot be traced back to specific actions performed by the audience. While it must be clear to the visitor that the work relies on some interactive elements, the outcome corresponding to a gesture generated by the visitor must not be predictable. This is a deliberate choice that represents a metaphor linked to the botanical world: when a seed is planted in a garden, we can observe it grow. What the seed will become is defined by its own nature, but at the same time the way in which the surrounding environmental characteristics influence the transformation of the seed into a plant and its growth is neither unambiguous nor fully quantifiable. This setting is also dictated by practical reasons: otherwise, there would be no room for any kind of external interaction, and the distortion of the trajectories would depend only on the sound generated by the system itself, creating an uncontrolled feedback loop. Finally, it is the author's intention to move beyond a transparent type of interaction in order not to feed a context based on an action-reaction game that would soon reveal its mechanism.

Other applications

The GUI of the system and the main controller of the installation have been developed as a Max patch. The tool for parameter space exploration currently exists in two flavours: one as a Max patch embedded inside the main control patch of the installation, the other integrated inside Live as a Max4Live plug-in. Both versions allow a direct exploration of presets based on their parametric similarity. Given these characteristics, I can imagine the prototype as a valid method to overcome the classic scenario in which a digital instrument is programmed one parameter at a time, with all that this implies (Kreković, 2019). Such an approach would have the benefit of smoothing a steep learning curve, allowing the user to focus on the expressive capacities of an instrument by decreasing the time needed for its technical understanding. Furthermore, thanks to its three-dimensional interface, it could be naturally integrated into virtual and augmented reality scenarios.

Conclusions and future work

Sandy Island by its own nature resembles the architecture of a complex system, containing in a nutshell some of its characteristics, such as non-linearity and synergetic components. One of the goals of future work in the context of this research project is to take on more characteristics proper to complex systems, such as emergence, overcoming the hybrid approach I previously sketched, which still uses elements imposed onto the system from outside. In this context, complex systems are not intended as experimental artistic devices able to exhibit a relation with the surrounding environment.17 Complex systems are thought of by the author as something that can be enjoyed by the audience as an element of a broader public space, to which a visitor is willing to expose him or herself to get something back that goes beyond the “standard” concert or exhibition experience. During the staging of Sandy Island many people decided to spend a few hours inside the work, contemplating it, talking together, reading a book or even trying to establish a dialogue with the system by playing an instrument.

Concerning the tool I introduced in the previous chapters, TroubleCube has proven effective in controlling both audio and video parameters in a faster and more intuitive way, allowing the expressive capabilities of aggregate instruments in complex multimedia scenarios to be quickly assessed and reducing the time necessary to properly shape entire chains of effects according to a desired palette of outcomes. Nevertheless, the tool for synthesis space exploration still needs further research. At this stage it is not able to generalise, and the definition of the virtual interpolation space is still heavily dependent on the values of the starting presets. Another critical aspect is the perceptual meaning of the virtual space axes. Research already planned in this direction will aim at balancing the trade-off between the perceptual significance of the axes and the accuracy of the values sampled from the space. On top of that, the management of trajectories is still tricky and has to be further optimised.

Bibliography

Anatrini, Alessandro (2022). “The awareness of the tools in the neural media praxis”, in: Thomas Görne, Georg Hajdu, Benjamin Helmer, Jacob Richter (eds.), KISS - Kinetics in Sound and Space, 22-45.

Esling, Philippe; Masuda, Naotake; Bardet, Adrien; Despres, Romeo; Chemla Romeu Santos, Axel (2020). “Flow Synthesizer: Universal audio synthesizer control with normalizing flows”, Applied Sciences 10(1):302-317.

Kreković, Gordan (2019). “Insights in habits and attitudes regarding programming sound synthesisers: a quantitative study”, in: Isabel Barbancho, Lorenzo J. Tardón, Alberto Peinado, Ana M. Barbancho (eds.), Proceedings of the 16th Sound and Music Computing Conference (SMC2019), 316-323.

Pardo, Bryan; Cartwright, Mark; Seetharaman, Prem; Kim, Bongjun (2019). “Learning to build natural audio production interfaces”, Arts 8(3):110-130.

Schnell, Norbert; Schwarz, Diemo; Larralde, Joseph; Borghesi, Riccardo (2017). “Pipo, a plugin interface for afferent data stream processing modules”, in: Zhiyao Duan (ed.), Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR2017), 361-367.

Shier, Jordie; Tzanetakis, George; McNally, Kirk (2020). “Spiegelib: An automatic synthesizer programming library”, Audio Engineering Society Convention 148: Paper 10377.

Smith, Benjamin D.; Garnett, Guy E. (2012). “Unsupervised Play: Machine Learning Toolkit for Max”, in: Georg Essl, R. Brent Gillespie, Michael Gurevich, Sile O'Modhrain (eds.), Proceedings of the 12th New Interfaces for Musical Expression Conference (NIME2012), 40-43.

Yee-King, Matthew John; Fedden, Leon; D’Inverno, Mark (2018). “Automatic programming of VST sound synthesizers using deep networks and other techniques”, IEEE Transactions on Emerging Topics in Computational Intelligence 2(2):150-159.

Zavalishin, Vadim (2020). “The art of VA filter design”. https://www.native-instruments.com/fileadmin/ni_media/downloads/pdf/VAFilterDesign_2.1.2.pdf (accessed March 15, 2023).
