[Seminar] MLDS Unit Seminar 2024-1 by Dr. Gordon Wichern

Friday, April 26th, 2024, 11:00 AM to 12:00 PM

Seminar Room L5D23, Lab5

Description

Dr. Gordon Wichern, Senior Principal Research Scientist, MERL (Mitsubishi Electric Research Laboratories)

Title: Towards explaining audio generative models

Abstract: One of the more fascinating debates in the current hype cycle over large language models (LLMs) is whether LLMs merely copy and regurgitate their training data, or whether they learn an underlying world model of human language. Given advancements in audio diffusion models and generative audio transformers, it is interesting to ask what these models know about audio. In the first part of this talk, I will share our recent progress in detecting and quantifying training data memorization in a large text-to-audio diffusion model. I will then shift to our analysis of a large generative music transformer, where we use simple linear classifier probes to understand what this model knows about music. Then, I will discuss how we can use these probes to steer the generative model in a desired direction without retraining, enabling more fine-grained, interpretable controls than text prompts alone. Time permitting, I will also give an overview of the MERL Speech and Audio Team’s work on various other audio and multimodal topics.

Bio: Gordon Wichern is a Senior Principal Research Scientist at Mitsubishi Electric Research Laboratories (MERL) in Cambridge, Massachusetts. He received his B.Sc. and M.Sc. degrees from Colorado State University and his Ph.D. from Arizona State University. Prior to joining MERL, he was a member of the research team at iZotope, where he focused on applying novel signal processing and machine learning techniques to music and post-production software, and before that a member of the Technical Staff at MIT Lincoln Laboratory. He is the Chair of the AES Technical Committee on Machine Learning and Artificial Intelligence (TC-MLAI), and a member of the IEEE Audio and Acoustic Signal Processing Technical Committee (AASP-TC). His research interests span the audio signal processing and machine learning fields, with a recent focus on source separation and sound event detection.
