Sleepy Mech – ITP Thesis Archive 2024

Abstract

This project is a speculative prototype of a portable audio device for machine learning style transfer. It features a variety of pre-trained models as well as an “unloop” feature that generates new and unexpected audio patterns from user data. The only input the system requires is a microphone. Users can swap between active machine learning models by changing the “heads” on the device, and can take it anywhere they choose.

Classmate Kay Wasil taking a look at the cardboard prototype of the project!

Technical Details

The project is built on the RAVE autoencoder framework, running inside Max/MSP. Models were trained on publicly scraped data, open-source datasets, and some closed-source datasets, which will not be released publicly. The framework also interacts with unloop, a Max/MSP interface for a Gradio instance of VampNet, a masked acoustic token model for audio. An RFID reader, several potentiometers, rotary encoders, and buttons are housed in an MJF (Multi Jet Fusion) nylon enclosure and can be used to control various parts of the software. Most notably, RFID tags are used to swap between active machine learning models.
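The tag-based model swap amounts to a lookup from a tag's UID to a model file, plus a check so that re-reading the same tag does not reload the model. The sketch below is a minimal illustration of that logic only, assuming the reader reports each tag as a hex UID string. The UIDs, the `.ts` filenames, and the `ModelSwitcher` class are hypothetical; the project's actual tag IDs and the way its Max/MSP patch loads models are not documented here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tag UIDs mapped to hypothetical exported model files.
# The real project's IDs and filenames are not published.
TAG_TO_MODEL = {
    "04A1B2C3": "organ.ts",
    "04D4E5F6": "percussion.ts",
    "04112233": "bird.ts",
}


@dataclass
class ModelSwitcher:
    """Tracks which model is active and decides when a tag read should swap it."""

    active: Optional[str] = None

    def on_tag(self, uid: str) -> bool:
        """Handle one RFID read; return True only if the active model changed."""
        model = TAG_TO_MODEL.get(uid.upper())  # normalize hex case before lookup
        if model is None or model == self.active:
            # Unknown tag, or the same "head" read again: keep the current model.
            return False
        self.active = model
        return True
```

For example, `ModelSwitcher().on_tag("04a1b2c3")` returns `True` and sets the active model to `organ.ts`, while a second read of the same tag returns `False`. Ignoring repeated reads matters because readers typically report a tag continuously while it sits on the antenna.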

A gallery of illustrations arranged in a 3×3 grid. The captions under the illustrations read as follows (left to right, top to bottom): Organ, Nasa, Percussion, Darbouka, 808, Bird, Text-To-Speech, Darbouka (Outtake), Guitar

Research/Context

In 2021, the Acids-Ircam research team released nn_tilde, a Max/MSP external that supports real-time inference with their RAVE machine learning audio models. Because these models are small and fast at inference, they can run on compact systems, even single-board computers. This new paradigm of audio technology has significant implications for the future of music production.
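“Real-time” here has a concrete meaning: audio arrives in fixed-size buffers, and each buffer must be processed before the next one arrives, or playback glitches. The sketch below expresses that budget as arithmetic. The block size, sample rate, and inference times in the example are illustrative assumptions, not measurements from this project or from nn_tilde.

```python
def block_budget_ms(block_size: int, sample_rate: int) -> float:
    """Milliseconds of audio in one buffer, i.e. the time allowed to process it."""
    return 1000.0 * block_size / sample_rate


def is_real_time(inference_ms: float, block_size: int, sample_rate: int) -> bool:
    """True if one buffer's inference finishes before the next buffer is due."""
    return inference_ms < block_budget_ms(block_size, sample_rate)


# Illustrative numbers only: a 2048-sample buffer at 48 kHz
# leaves roughly 42.7 ms per buffer.
budget = block_budget_ms(2048, 48_000)
print(f"budget per buffer: {budget:.1f} ms")
print("30 ms inference keeps up:", is_real_time(30.0, 2048, 48_000))
print("50 ms inference keeps up:", is_real_time(50.0, 2048, 48_000))
```

Larger buffers relax the budget but add at least one buffer of latency, which is why small, fast models matter on modest hardware such as single-board computers.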

In the years since the public release of the RAVE framework, machine learning technology for audio has proliferated, as has ML more generally; however, at the time of this project, resources for understanding these models and interacting with them intuitively were still limited. This project aims to help close that gap by providing users with an all-in-one, intuitive interface for interacting with cutting-edge machine learning models in a tangible way.
