Sleepy Mech

Spencer Harris

Advisor: Sarah Ibrahim

Sleepy Mech is an all-in-one portable system for machine learning and audio. The system assumes no prior knowledge of computers, instruments, or advanced algorithms; instead, it invites the user to explore cutting-edge models through an accessible interface.

A friendly-looking white robot with a screen for a face, showing a smile

Abstract

This project is a speculative prototype of an audio device for machine-learning style-transfer models. It features a variety of pre-trained models, as well as an "unloop" feature that creates new and unexpected audio patterns from user data. The only input the system requires is a microphone. Users can swap between active machine learning models by changing the "heads" on the device, and can take it anywhere they choose.

Classmate Kay Wasil taking a look at the cardboard prototype of the project!

Technical Details

The project is built on the RAVE autoencoder framework, running inside Max/MSP. Models were developed from publicly scraped data, open-source datasets, and some closed-source datasets, which will not be released publicly. The framework also interacts with unloop, a Max/MSP interface to a Gradio instance of VampNet, a masked acoustic token model for audio. An RFID reader, potentiometers, rotary encoders, and buttons are housed in an MJF (Multi Jet Fusion) nylon enclosure and control various parts of the software. Most notably, RFID tags are used to swap between active machine learning models.
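To illustrate what "masked acoustic token model" means in practice, here is a minimal sketch of building a masked prompt from a grid of audio tokens. It is loosely modeled on the periodic prompts described for VampNet (keep every P-th frame, hide the rest for the model to regenerate); it is not unloop's or VampNet's actual code, and the names `periodic_mask`, `apply_mask`, and the placeholder `MASK_TOKEN` are hypothetical.

```python
from typing import List

MASK_TOKEN = -1  # hypothetical placeholder id marking a position the model should fill in


def periodic_mask(n_frames: int, period: int) -> List[bool]:
    """Return one flag per frame: True = masked (regenerate), False = kept as prompt.

    Every `period`-th frame is kept, so a smaller period preserves more of the input.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    return [t % period != 0 for t in range(n_frames)]


def apply_mask(tokens: List[List[int]], mask: List[bool]) -> List[List[int]]:
    """Mask the same frames in every codebook.

    `tokens` is indexed as tokens[codebook][frame], as produced by a neural audio codec.
    """
    return [[MASK_TOKEN if m else tok for tok, m in zip(row, mask)] for row in tokens]


# Two codebooks, six frames of (made-up) token ids.
tokens = [[10, 11, 12, 13, 14, 15], [20, 21, 22, 23, 24, 25]]
masked = apply_mask(tokens, periodic_mask(6, 3))
print(masked)  # [[10, -1, -1, 13, -1, -1], [20, -1, -1, 23, -1, -1]]
```

A generative model would then predict tokens for the masked positions and decode them back to audio; the kept frames anchor the result to the user's recording, which is one way new yet related patterns can emerge.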

A gallery of model illustrations arranged in a 3x3 grid. The labels under the illustrations read as follows (left to right, top to bottom): Organ, Nasa, Percussion, Darbouka, 808, Bird, Text-To-Speech, Darbouka (Outtake), Guitar
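As a sketch of how RFID-tagged heads might select among models like those pictured above, the snippet below maps tag IDs to model files and reports a change only when a different, known head is attached. The tag UIDs, the `.ts` file names, and the `ModelSwitcher` class are all hypothetical placeholders; the project's actual wiring between the RFID reader and Max/MSP is not documented here.

```python
from typing import Dict, Optional

# Hypothetical mapping from RFID tag UIDs (hex strings) to exported model files.
# These values are placeholders, not the project's real tags or files.
TAG_TO_MODEL: Dict[str, str] = {
    "04A1B2C3": "organ.ts",
    "04D4E5F6": "percussion.ts",
    "0478A9BC": "bird.ts",
}


class ModelSwitcher:
    """Track the active model; report a switch only for a new, known head."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self.mapping = mapping
        self.active: Optional[str] = None

    def on_tag_read(self, uid: str) -> Optional[str]:
        """Return the model file to load, or None if nothing should change."""
        model = self.mapping.get(uid.upper())
        if model is None or model == self.active:
            # Unknown tag, or the reader re-reporting the head that is already attached.
            return None
        self.active = model
        return model


switcher = ModelSwitcher(TAG_TO_MODEL)
print(switcher.on_tag_read("04a1b2c3"))  # organ.ts
print(switcher.on_tag_read("04A1B2C3"))  # None (same head still attached)
print(switcher.on_tag_read("FFFFFFFF"))  # None (unknown tag)
print(switcher.on_tag_read("0478A9BC"))  # bird.ts
```

Ignoring repeated reads matters because many RFID readers report a tag continuously while it sits on the antenna; without that check, the patch could reload the same model over and over.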

Research/Context

In 2021, the Acids/Ircam research team released nn~ (nn_tilde), a Max/MSP external that supports real-time inference with their RAVE audio models. Because these models are compact and fast at inference, they can run inside small, self-contained systems, even on single-board computers. This makes dedicated, portable machine learning instruments practical, with significant implications for the future of music production.

In the years since the public release of the RAVE framework, machine learning technology for audio has proliferated, as has machine learning more generally. However, at the time of this project, resources for understanding these models and interacting with them intuitively remain limited. This project aims to help close that gap by giving users an all-in-one, intuitive interface for exploring cutting-edge machine learning models in a tangible way.

An exploded view of the model

Further Reading

Chief Fabricator - Alina Liu
Package Design - Anvay Kantak
Illustrator - Proud Aiemruksa
Models - Acids/Ircam Team, Intelligent Instruments Lab
RAVE, nn~ - Acids/Ircam Team
unloop/VampNet - Hugo Flores García

And of course a huge thank you to my friends, family, and peers, who tirelessly supported the development of this project in so many tangible and intangible ways!

Another render of the robot, this time with a frog face on the screen.