Recent years have seen unprecedented advancements in text-to-image / text-to-video AI models, sparking widespread attention, discussion, and mainstream adoption of these innovative co-creative tools. This has led to a mix of reactions, ranging from excitement and curiosity to concern, anger, and even offense. Alongside this, the growth of open-source models is democratizing access to these AI tools, extending their use beyond experts, tech giants, and professional technologists.
In this 14-week course, we will survey the landscape of text-to-image / text-to-video AIs and dive deep into some of the best-known models (such as Stable Diffusion, Flux, CogVideoX, Hunyuan, etc.) to see what potential they hold for exploring new modes of content creation and for helping us re-examine our language patterns. This will be a practice + technique course composed of three modules: Text-to-Image AIs and Tools, Model Customization, and Text-to-Video AIs. Each module blends practice and technique sessions on topics such as building good prompting practices, image synthesis, using Python to train models for customized visuals, building workflows with ComfyUI, and creating animations from text. We’ll also discuss how such tools could intervene in the workflows of artists and technologists, what they can offer researchers, and what caveats to keep in mind when we’re creating with these AIs.
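As a taste of the hands-on work in the first module, here is a minimal sketch of generating an image from a text prompt with the Hugging Face diffusers library; the checkpoint name and prompt are illustrative, and a CUDA-capable GPU is assumed:

```python
# Minimal text-to-image sketch using Hugging Face diffusers.
# The checkpoint and prompt below are examples, not course requirements;
# a CUDA-capable GPU is assumed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example Stable Diffusion checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate one image from a text prompt and save it to disk.
image = pipe("an ink-wash painting of a city floating above the clouds").images[0]
image.save("output.png")
```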
Prerequisites: Introduction to Computational Media (ICM) or the equivalent.