Anna-Kaisa Kaila, André Holzapfel, Bob L. T. Sturm
KTH Royal Institute of Technology
Stockholm, Sweden
How valuable really is the application of artificial intelligence (AI) to music? Other than publications and entertaining media content, what good does it bring? Does the world need efficient algorithms for generating “music” of questionable quality that nobody will ever listen to? Are there any tangible benefits to building systems that (almost) no one will use?
This brief paper examines some of these questions by considering a specific line of research applying AI to music. In answer to the questions posed above, we discuss how considerations of purpose and value should always lie at the heart of our research endeavors. Furthermore, we argue that programming computers to generate music can be a way to engage with deeper questions about methodologies of working with the datafication of cultural expressions.
This provocation has been mainly written by and from the perspective of the first and second authors. At Alt-AIMC, we plan to present this provocation as an agonistic dialogue that gives significantly more voice to the third author, whose views may differ from what is expressed here.
Folk-rnn (Sturm et al. 2016) applies recurrent neural networks (RNNs) to model symbolic sequences of heterophonic music, typically Western traditional music. The first iterations involved training models on a dataset of over 23,000 transcriptions of Irish traditional dance music (ITM) downloaded from the crowd-sourced repository https://thesession.org. Folk-rnn began in 2015 as a humorous exercise that played with the capabilities of a deep neural network in a novel domain.
The choice by Sturm et al. (2016) to use transcriptions of ITM as training material arose from personal affection for, and passing familiarity with, this style of music, but it was also advantageous for several reasons. A sizable number of such transcriptions are available for training data-hungry models. The structure of the dance music is consistent, and the length of the transcription sequences should be readily manageable for contemporary machine learning models. Furthermore, issues surrounding copyright are less problematic than for popular music, since much of ITM lies in the public domain, or at least in a commercial sphere that is less litigious. ITM is also a living practice with many practitioners around the world to engage with, and from whom expertise and feedback can be drawn. Finally, the values inherent to ITM are such that frictions with technology such as AI are much more pronounced than they are with popular music, which leads to interesting avenues of investigation.
In short, Sturm et al. (2016) embarked on creating folk-rnn not because they identified a need for an unlimited supply of new music transcriptions imitating those of ITM, but rather because there was an opportunity to do so, and curiosity arose to see whether a state-of-the-art language model could succeed in modeling transcriptions from this domain. Considering this, folk-rnn originated primarily as an exercise in machine learning and a parlor trick, rather than as an instrument of culturally meaningful creative expression.
The primary focus of applying machine learning to transcriptions of ITM via folk-rnn was therefore not on the value it brings to music practitioners. The contributors to thesession.org were never queried on this specific use of their hand-entered data. Moreover, folk-rnn does not provide anything the users of thesession.org have asked for, and it does not readily serve the goals of the users of thesession.org in preserving, providing, and furthering ITM (see user comments cited in Sturm and Ben-Tal 2021, 447–449). Folk-rnn is, in essence, a textbook example of a “solution in search of a problem”. If a culturally meaningful purpose for folk-rnn exists, it comes as an afterthought.
While the usefulness and cultural value of folk-rnn can be problematized, a more fundamental question remains: can folk-rnn cause harm? We argue that it can. First, as the model draws from a pool of shared cultural expressions, even its superficially innocent exploration and non-commercial repurposing of ITM engages in a form of data colonialism (Couldry and Mejias 2019). Data colonialism refers to “an emerging order for appropriating and extracting social resources for profit through data” (Couldry and Mejias 2019, xix). This negative effect, whether caused by the pursuit of financial profit or other types of cultural capital, is aggravated by the legacy of social and political colonialism that Ireland and Irish immigrants have been subjected to throughout history. Second, unwarranted use of ITM can create an impression that a living music tradition can be reduced to a simple set of data points. Such discourse can trivialize the richness of ITM and the efforts of communities engaged in preserving and promoting it. Furthermore, the repetitive stream of simulacra (Baudrillard 1994) of ITM transcriptions from folk-rnn might blend with the “canon”, thus endangering a core value of ITM practice: authenticity (Huang and Sturm 2021). In short, even though its origins may be argued to be innocent, folk-rnn represents a harmful misuse of ITM.
Through this misuse, folk-rnn seems to incur a debt to the cultural commons that manifests in two ways. Firstly, there is frustration expressed in a few of the comments on thesession.org towards folk-rnn. The other form of manifestation is the opposite: a deafening silence. In fact, the lack of wider material interest in and engagement with the community website accompanying the folkrnn.org application (themachinefolksession.org) may be the strongest indication that the service has not found a tangible, meaningful purpose in the community to which it partially owes its existence. One could even speculate that the superficially passive act of non-use can be in itself a sign of active resistance: an effort to silence folk-rnn to death.
How can a research team seek to rectify and pay back such a cultural debt to the musical community? At a minimum, we should expect privileged Western academia to acknowledge and thank the data source, and to label the output works appropriately as AI-generated. Unfortunately, citations pay little more than lip service to ITM, which (unlike the researchers themselves) does not benefit from them materially, financially or career-wise. Therefore, simply announcing the data source is not enough. In fact, doing so can cause more harm, as it attracts further, possibly naïve, use of the original data source.
Could one argue that generative AI models do not in fact detract from the public domain but add to it? By providing new copyright-free works in massive quantities, are the cultural commons not expanded and enriched rather than impoverished? Since there is currently very little research on the emerging scale, aesthetics and reception of AI-music and the services generating it, no definite conclusions can yet be drawn for or against this argument. The rejection or disinterest expressed by thesession.org users, however, seems to indicate that not all musical communities equate the mere availability of more works with added value.
Rather, the opposite may be the case: drowning the cultural space in unrequited content put forth in monstrous quantities may be carcinogenic (Attali 1985, Mersch 2020). We already see the first signs of this in the visual domain, where the massive proliferation of AI-generated images that immediately resulted from the so-called democratization of digital image creation is driving users away from previously popular digital art platforms, such as ArtStation and DeviantArt (Edwards 2022). There is no reason to think that similar developments could not take place in the music domain, perhaps already in the near future.
Another interesting idea put forth by Huang et al. (2023) is to provide financial support for the artistic community in return for the privilege of accessing their cultural heritage materials. This could, for instance, take the form of sharing some of the costs of web hosting of online music repositories such as thesession.org, or providing musicians from the community with paid performance opportunities at research events. While such efforts can be laudable as a form of wider collaboration and engagement in community work, this is likely not a sustainable strategy for using research funding, and the long-term benefits of such singular efforts of feudalistic charity for the larger ITM community are negligible.
The efforts outlined above can be steps in the right direction, but there are two fundamentally constructive ways in which AI research in the music domain can strive to be more broadly valuable.
First, close attention should be paid to the musical and cultural purposes that the developed application serves for artistic communities. Is there an actual user or user group, and an actual use case for the application? Does the application genuinely support and enable practitioners to do the work they wish to do, or avoid tasks they wish not to spend time on, without undercutting the right to be compensated financially for one’s work (Drott 2021)? For this purpose, a good start would be to analyze the stakeholders of the project, both current and projected (Kaila et al. 2023), and to map its risks (Holzapfel et al. 2018), but in many cases it is necessary to get in direct contact with the domain experts: the music communities. It is of crucial importance that such work is structured as a non-hierarchical, genuinely collaborative effort that does not merely involve summary evaluations of already completed research but engages the music communities in co-design right from the start of the initiative. Similarly, such critical explorations may well require the research team to step outside their own comfort zones and establish interdisciplinary collaborations in which engineering expertise does not necessarily take the highest priority (Clarke 2022, Agre 1997).
What about the curiosity-driven research inquiries that fall into the realm of basic research, in which the main objective is the pursuit of knowledge for its own sake rather than the development of a specific practical application? Even then we can ask what knowledge is created in the research process, and whom this knowledge may serve, either in the present or in some reasonably foreseeable future. Consequently, in the course of developing, evaluating, testing and eventually abandoning research initiatives and music-AI applications, as well as in training future scholars to continue that work, we must keep asking ourselves critical, at times uncomfortable questions, and most importantly, be ready to hear the honest answers.
Considerations of research value and purpose are critical to working in ethical ways with music AI. They should be at the center of the research endeavors, not an afterthought for embellishing the ethics statement in the final project report. Moreover, working together with music communities at all relevant stages can make the research more broadly constructive. In essence, our research efforts in music AI should critically ask how we can be of service to peoples and their music and other cultural expressions — not the other way around.
Agre, P. E. (1997). Lessons learned in trying to reform AI. In: G. Bowker, S. L. Star, L. Gasser and W. Turner (eds.), Social science, technical systems, and cooperative work: Beyond the Great Divide. New York: Psychology Press.
Attali, J. (1985). Noise: The Political Economy of Music. Minneapolis: University of Minnesota Press.
Baudrillard, J. (1994). Simulacra and Simulation. Ann Arbor, MI: University of Michigan Press.
Clarke, E. H. (2022). A Postcolonial MIR? Resonance: The Journal of Sound and Culture, 3(4): 412–432.
Couldry, N. and Mejias, U. (2019). The Costs of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism. Redwood City: Stanford University Press.
Drott, E. (2021). Copyright, compensation, and commons in the music AI industry. Creative Industries Journal 14(2): 190–207.
Edwards, B. (2022, December 15). Artists stage mass protest against AI-generated artwork on ArtStation. Ars Technica. https://arstechnica.com/information-technology/2022/12/artstation-artists-stage-mass-protest-against-ai-generated-artwork/
Holzapfel, A., Sturm, B. L. T., and Coeckelbergh, M. (2018). Ethical dimensions of music information retrieval technology. Transactions of the International Society for Music Information Retrieval, 1(1): 44–55.
Huang, R. S., Holzapfel, A., Sturm, B. L. T., and Kaila, A. (2023). Beyond Diverse Datasets: Responsible MIR, Interdisciplinarity, and the Fractured Worlds of Music. Transactions of the International Society for Music Information Retrieval, 6(1): 43–59.
Huang, R. S. and Sturm, B. L. T. (2021). Reframing “Aura”: Authenticity in the Application of AI to Irish Traditional Music. Proceedings of the 2nd Conference on AI Music Creativity (AIMC).
Kaila, A., Jääskeläinen, P., and Holzapfel, A. (2023). Ethically Aligned Stakeholder Elicitation (EASE): Case Study in Music-AI. Proceedings of the International Conference on New Interfaces for Musical Expression.
Mersch, D. (2020). (Un)creative Artificial Intelligence: A Critique of “Artificial Art.” ResearchGate, 2020, DOI: 10.13140/RG.2.2.20353.07529.
Sturm, B. L. T., Santos, J. F., Ben-Tal, O. and Korshunova, I. (2016). Music transcription modelling and composition using deep learning. Paper presented at Computer Simulation of Musical Creativity, Huddersfield, UK, June 17–19.
Sturm, B. L. T. and Ben-Tal, O. (2021). Folk the Algorithms: (Mis)Applying Artificial Intelligence to Folk Music. In: E. R. Miranda (ed.), Handbook of Artificial Intelligence for Music. Springer International Publishing, Cham, 423–454.