CVPR 2023 Tutorial:


Denoising Diffusion Models:
A Generative Learning Big Bang

Date: Sunday, June 18, 9:00 am - 12:30 pm (PT).
Location: West 202-204.
Format: Hybrid (registration needed).
Virtual link: [link (registration needed)]
Recording: See below.



Links to tutorial slides: [Part 1: Fundamentals] [Part 2: Images] [Part 3: Others]

Recording is now available [here].



Overview


Score-based denoising diffusion models (diffusion models) have been successfully used in various applications such as text-to-image generation, natural language generation, audio synthesis, motion generation, and time series modeling. The rate of progress on diffusion models is astonishing. In 2022 alone, diffusion models powered many large-scale text-to-image foundation models, such as DALL-E 2, Imagen, Stable Diffusion, and eDiff-I. These developments have also driven novel computer vision applications, such as solving inverse problems, semantic image editing, few-shot textual inversion, prompt-to-prompt editing, and lifting 2D models for 3D generation. Diffusion models have been widely adopted across computer vision and are becoming a dominant class of generative models. This popularity is also reflected in the diffusion model tutorial at CVPR 2022, which accumulated nearly 60,000 views on YouTube over eight months.

Since then, there has been tremendous new work on diffusion models, much of which we believe is critical for computer vision practitioners. The following are just a few notable examples. Elucidated Diffusion Models (NeurIPS 2022 best paper) provide principles for training diffusion models in an optimal manner. Recent differential equation solvers such as DPM-Solver (NeurIPS 2022 Oral) and DEIS have made notable advances in accelerating sampling from diffusion models. Diffusion inversion techniques such as DreamFusion and Magic3D achieve text-to-3D generation by inverting image diffusion models. Textual Inversion and DreamBooth enable the "personalization" of text-to-image diffusion models from few-shot supervision signals. Numerous recent works also apply diffusion models to other domains, such as 3D representations, videos, and motion. Given this rate of progress, we believe it is crucial to have a tutorial on diffusion models at CVPR 2023 that focuses on these more recent developments.
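
To make these sampling speedups concrete, the sketch below shows a single deterministic DDIM-style update, the basic building block that fast solvers such as DPM-Solver and DEIS refine with higher-order methods. This is a minimal illustration under assumed interfaces, not code from the tutorial: eps_model (a pretrained noise-prediction network) and alphas_cumprod (the precomputed cumulative noise schedule) are placeholder names we introduce here for exposition.

    import torch

    @torch.no_grad()
    def ddim_step(eps_model, x_t, t, t_prev, alphas_cumprod):
        # Assumed interface: eps_model(x, t) predicts the noise in x at
        # timestep t, and alphas_cumprod[t] is the cumulative product of
        # (1 - beta_s) for s <= t, following standard DDPM notation.
        abar_t = alphas_cumprod[t]
        abar_prev = alphas_cumprod[t_prev]
        eps = eps_model(x_t, t)
        # Estimate the clean sample x_0 implied by the noise prediction.
        x0_pred = (x_t - (1.0 - abar_t).sqrt() * eps) / abar_t.sqrt()
        # Move deterministically to the lower noise level at t_prev
        # (the eta = 0 case of DDIM).
        return abar_prev.sqrt() * x0_pred + (1.0 - abar_prev).sqrt() * eps

Because each update is deterministic, a sampler built from this step can jump between widely spaced timesteps (for example, 50 steps instead of 1,000), which is the essence of the acceleration discussed above.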

The primary goal of this tutorial is to make diffusion models more accessible to a wider computer vision audience and to introduce recent developments in diffusion models. Unlike the previous tutorial, we will streamline the discussion of fundamentals and focus much more on practical methods and applications of diffusion models. We will present successful practices for training and sampling from diffusion models and discuss novel applications that diffusion models enable in the computer vision domain. These discussions will lean heavily on research released in 2022 and 2023. We hope that this second tutorial on diffusion models will attract more computer vision practitioners to this topic and spur further progress in this exciting area.



Speakers

Jiaming Song (NVIDIA), Chenlin Meng (Stanford University), Arash Vahdat (NVIDIA)

Schedule

Title | Speaker | Time (PT)
Fundamentals (training, sampling, guidance) | Arash Vahdat | 09:00 - 10:00
Applications on natural images (architecture, editing, personalization, fine-tuning, "low-level" vision, etc.) | Chenlin Meng | 10:15 - 11:15
Applications on other domains (inverse problems, video, 3D, motion, large content generation, etc.) | Jiaming Song | 11:30 - 12:30




About Us

Jiaming Song is a research scientist at NVIDIA Research. Prior to joining NVIDIA, he worked on deep generative modeling at Stanford University under the supervision of Stefano Ermon. He is the creator of DDIM, the earliest accelerated sampling algorithm for diffusion models, which is widely used in recent generative AI systems including DALL-E 2, Imagen, Stable Diffusion, and ERNIE-ViLG 2.0. He also played a critical role in developing SDEdit, which is used in Stable Diffusion for image-to-image translation, and co-developed eDiff-I, NVIDIA's first large-scale text-to-image diffusion model. Jiaming is a recipient of the ICLR 2022 Outstanding Paper Award.

Chenlin Meng is a CS PhD student at Stanford University, advised by Stefano Ermon. She was previously an intern at Google Brain, working with Tim Salimans and Jonathan Ho. Her research interests include score-based generative models, diffusion models, variational autoencoders, autoregressive models, normalizing flows, and other large-scale generative models. In particular, she is interested in making large-scale generative models fast, controllable, and scalable in real-world settings.

Arash Vahdat is a senior research manager at NVIDIA Research, where he leads the generative AI team. Before NVIDIA, he was a researcher at D-Wave Systems, working on generative learning and its applications in label-efficient training. Before D-Wave, Arash was a research faculty member at Simon Fraser University, where he led computer vision research and taught master's courses on machine learning for big data. Arash's current research areas include generative learning, representation learning, and efficient learning.