Tomoki Okuda
Graduate School of Design
Kyushu University
Kazuhiro Jo
Faculty of Design
Kyushu University
This paper explores a form of live performance where humans and AI mutually influence each other. As AI research progresses at a fast pace, its use in the creative domain is under exploration. The challenge lies in avoiding the mere reproduction of past works when utilizing AI in creative activities. For creative expression, merely outputting a complete work using AI isn't sufficient. Instead, reciprocal interaction between humans and AI is needed. In the context of live coding performances using TidalCycles, we explore the potential of incorporating GPT-3 and ChatGPT.
As the social implementation of artificial intelligence progresses, the use of AI is being explored in the creative field. One of the key challenges in creative activities involving AI is how to transcend the mere reproduction of past works. An example of using AI to create unconventional musical expressions is the AI DJ Project by Nao Tokui [1]. This project adopts the "Back to Back" style, where two DJs take turns playing tracks. The unpredictability introduced by the AI DJ injects a palpable sense of tension into the performances, effectively augmenting human creativity. AI's capabilities are currently limited to pattern recognition based on training data and generating work based on these patterns. Conversely, human creativity can respond to new contexts and situations, generate new ideas from scratch, and provide unique perspectives through emotions and experiences, which are still difficult to reproduce with current AI technology. Thus, expressing creativity using AI requires more than outputting the work in its entirety to an AI; it necessitates a reciprocal interaction between humans and AI.
In this paper, we explore a novel form of a live performance by combining the widely used language generation models of the GPT series with text-based live coding performances for music. By doing so, we aim to redefine the relationship between humans and AI in creative activities and seek new possibilities in live performances.
Live coding is a performing art in which improvised coding, is done while the program is running. This paper focuses on live coding using TidalCycles [2], a library implemented in the Haskell programming language. TidalCycles generates music loops based on executed code, unlike general programming languages that immediately terminate. The generated music patterns continue repeating, allowing performers to modify and create new patterns.
AI natural language processing technology has dramatically improved in recent years, with the Transformer-based[3] language generation model GPT-3[4], introduced by OpenAI in 2020, being a groundbreaking idea that has been widely adopted. This language generation model was built by training on vast amounts of text data and is capable of producing natural sentences as if they were written by a human.
ChatGPT [5] is an AI chatbot released by OpenAI in 2022. Built on the GPT-3.5 series of language generation models that had completed training in early 2022, it uses both supervised and reinforcement learning methods. More specifically, it uses a method called Reinforcement Learning from Human Feedback (RLHF)[6,7]. This paper utilizes the GPT-4 model [8], which was released on June 13, 2023, within the framework of ChatGPT.
We prepared a small dataset with TidalCycles code and used it to train a GPT-3 model specifically designed to generate TidalCycles code. This model generated source code from the Japanese text, and we checked the accuracy of the code produced. We also tested GPT-3's functionality in an actual performance setting.
Fine-tuning is a process that trains an already trained model with user-prepared training data, allowing the model to adapt to a specific task. Though GPT-3 can be used for a variety of language generation tasks, fine-tuning enables it to become more task-specific and generate language with higher accuracy.
For the training data, we prepared 998 pairs of source codes describing drum and synthesizer performances along with Japanese text explaining them. We used Japanese, the native language so that performers could understand the language nuances more deeply and design effective prompts. The performers themselves matched the Japanese text to the source code, allowing for an AI model specifically trained for their use. Out of the four GPT-3 models, we fine-tuned the most capable one, Davinci.
For the fine-tuned GPT-3, the following code was generated from the text not included in the training dataset, 'cut up and reconstruct breakbeats'. The output was also not included in the training dataset, indicating that the model wasn't over-trained.
Figure 1: Code generation with fine-tuned GPT-3.
The fine-tuned GPT-3 provided multiple advantages, enabling efficient output of numerous codes while including detailed sound creation that is often overlooked in live coding due to time constraints. Additionally, it produced unexpected outputs such as unique rhythm patterns and effect values. However, the performers themselves wrote many of the codes in the training datasets, resulting in the sounds produced by GPT-3 being similar to what they could achieve manually without AI assistance. Performers match Japanese prompts with TidalCycles codes in the dataset, allowing them to predict the generated codes. Consequently, GPT-3 functions more as a convenient tool than as a means of enhancing the performer's creativity. Thus, we hypothesized that including an AI with high otherness, specifically ChatGPT, is necessary to alter the actual output sound by AI and generate more creative outputs.
Live coding incorporating a fine-tuned GPT-3 follows this performance flow:
1. Create Japanese prompts.
2. Generate TidalCycles code from Japanese prompts using GPT-3.
3. Develop TidalCycles code by running it with small modifications.
The hypothesis was that by substituting steps 1 and 3, traditionally handled by the performer, with ChatGPT, we could increase the elements not under the performer's control, preventing the performance from becoming a prearranged event.
We experiment with ChatGPT to generate Japanese prompts that can be used for fine-tuning GPT- 3. To enable an interactive performance with the native language, we provide feedback to GPT-3 by feeding it the TidalCycles code it generates, and use that to generate new prompts.The prompts generated using this approach are shown in Figure 2. The instruction shown in Figure 2 is: "If you are a DJ and want to connect to the next song without any discomfort, describe what drums you would play as in 'cut up and rebuild the breakbeats'." As a result, a Japanese prompt was generated, which means "Emphasize the snare, create rhythm with kicks and hi-hats."
Figure 2: Japanese description of performances generated by ChatGPT.
The code shown in Figure 3 is the result of the instruction "Please make slight modifications to the code to make the performance even cooler." The code generation became more unpredictable compared to the fine-tuned GPT-3, influenced by the diverse coding styles of the individuals who trained ChatGPT and the subsequent fine-tuning. One issue observed was that while drum code modifications were somewhat unique, codes describing synthesizer performances were often generated in a disconnected fashion. Presumably, ChatGPT learns drum performance codes from a wide variety of rhythmic patterns, while synthesizer's performance has predominantly learned a certain set of rhythmic patterns and formats. Therefore, modifying a drum code would gradually change the rhythmic pattern, but in the case of the synthesizer, it could suddenly turn into a different phrase, which made the performance less cohesive.
Figure 3: Modification of code by ChatGPT.
We initially attempted a setup wherein the performance was conducted solely with ChatGPT and GPT-3. The sequence of the performance was as follows:
1. The performer considers Japanese prompts only at the start.
2. Japanese-to-TidalCycles code generation using fine-tuned GPT-3.
3. Modification of TidalCycles code using ChatGPT, repeatedly executing it multiple times.
4. Generation of Japanese prompts from TidalCycles code using ChatGPT.
5. Return to step 2 and repeat.
Figure 4 illustrates the code's evolution throughout the performance, showcasing the iterative code modifications made by ChatGPT. Initially, the code was a cut-up beat of breakbeats, after the code was modified, the drums became a beat of four kicks with a substantial amount of effects applied.
Next, the setup was modified to permit human performance ideas to intervene in the AI's performance sequence. The synthesizer is developed by the performer, while the drums are developed solely by the AI.
For the drum performance, the sequence is as follows:
1. Performers consider Japanese prompts.
2. Japanese-to-TidalCycles code generation using fine-tuned GPT-3.
3. The TidalCycles code is modified using ChatGPT, and the execution is repeated multiple times. This mirrors the synthesizer's rhythmic pattern described by the performer.
4. Generation of Japanese prompts from TidalCycles code using ChatGPT.
5. Return to step 2 and repeat.
This setup aims to integrate human creativity into the AI's creative process by allowing the performer's synthesizer performance to influence the code modifications made by ChatGPT in the drum performance.
In Figure 5, we present the code depicting the changes in the performance flow over three iterations, as well as the code describing the synthesizer's performance used for feedback. The initially generated code is a one-bar breakbeat sample, cut into eighth-note increments and reconstructed. The code, after being modified by ChatGPT, reflects the synthesizer's rhythmic pattern, resulting in a complex rhythm with seven notes within a sixteenth-note range. Thus, the performer's ideas provide feedback to the AI's performance, diversifying the progression of the performance, and generating complex rhythms that couldn't have been created in the AI-only performance sequence.
Figure 4: Code modified only by AI.
Figure 5: Code modified by AI with feedback on human performance.
Regarding the method of generating Japanese prompts for the next song using TidalCycles code, we found that instructing ChatGPT to "seamlessly connect to the next song" often resulted in conservative prompts. When allowing AI to generate language, it requires prior definitions and patterns, which leads to a need for logic first. On the other hand, humans sometimes come up with interesting phrases spontaneously during a conversation, and the logic behind why they are amusing is defined afterward. Incorporating techniques that encourage the chance of occurrence of amusing phrases in the generated results of ChatGPT, may lead to even more human-like and creative performances.
In our experiment, ChatGPT was invoked each time a code needed modification. As a result, new code were generated without an understanding of the previous performance's context. We anticipate that generating code based on the context of the prior performance could facilitate a more natural, cohesive performance.
The fine-tuned GPT-3, in this setup, was trained on a dataset reflecting the performer's sound makeup. Furthermore, the rhythm patterns described by the performer in real-time were incorporated into the code modifications made by ChatGPT. This approach allowed both the performer's tonal and rhythmic ideas to be integrated into the AI's creative process.
In this study, we experimented with integrating ChatGPT into a live coding performance setup that utilizes GPT-3 to explore the type of performance that could result from replacing human creative tasks with AI. The results indicated that incorporating the performer's sound and rhythmic ideas as feedback in the session with the GPT model led to performances that were more unpredictable and yet less disruptive.
In AI-based creative endeavors, there are various perspectives on how different artists perceive AI. Some view AI merely as a tool [9], while others consider it as an alien intelligence, completely distinct from humans, akin to extraterrestrials [10], and so forth. In the live coding performance using the personalized GPT-3 discussed in Chapter 2, AI was seen as 'another self' or a tool.
However, when the personalized AI (GPT-3) and the universal AI (ChatGPT), as discussed in Chapter 3, were combined, they seemed to blur the conventional AI perception boundaries between 'tools', 'self', and 'others'. In other words, how can we regard the existence of a session integrating a personalized AI (GPT-3) and a universal AI (ChatGPT)? Through exploring these questions, we hope to offer a fresh perspective on the relationship between AI capabilities and human creativity, potentially expanding the horizons of artistic expression.
I would like to thank Kazuhiro Jo for his valuable comments.
[1] Tokui, N. (2016). AI DJ Project. Nao Tokui. https://naotokui.net/ja/works/ai-djproject-2016-ja/
[2] McLean, A. (2014). Making programming languages to dance to: Live coding with tidal. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Functional Art, Music, Modelling & Design (pp. 63-70).
[3] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In NeurIPS.
[4] Brown, T. B., et al. (2020). Language Models are Few-Shot Learners. In Proceedings of NeurIPS (pp. 1877-1901).
[5] OpenAI. (2023). Introducing ChatGPT. OpenAI. https://openai.com/blog/chatgpt/
[6] Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems 30.
[7] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. (2022). Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155.
[8] OpenAI. (2023). GPT-4. OpenAI. https://openai.com/research/gpt-4/
[9] RNZ News. (2023, June 30). Is AI a threat to artists or just another tool? Retrieved from https://www.rnz.co.nz/news/national/491879/is-ai-a-threat-to-artists-or-just-another-tool
[10] ZDNET. (2023, June 30). ChatGPT is more like an 'alien intelligence' than a human brain, says futurist. Retrieved from https://www.zdnet.com/article/chatgpt-is-more-like-an-alien-intelligence-than-a-human-brain-says-futurist