|Photo Upscaling With Diffusion Models|
|Written by David Conrad|
|Sunday, 05 September 2021|
Researchers on Google's Brain Team have shared news of breakthroughs in image super-resolution. SR3, a super-resolution diffusion model, produces impressive results, and the team went on to use a cascade approach to generate high-resolution natural images.
In a post titled High Fidelity Image Generation Using Diffusion Models, Jonathan Ho and Chitwan Saharia explain that diffusion models, originally proposed in 2015, offer potentially favorable trade-offs compared to other types of deep generative models. They write:
"Diffusion models work by corrupting the training data by progressively adding Gaussian noise, slowly wiping out details in the data until it becomes pure noise, and then training a neural network to reverse this corruption process. Running this reversed corruption process synthesizes data from pure noise by gradually denoising it until a clean sample is produced. This synthesis procedure can be interpreted as an optimization algorithm that follows the gradient of the data density to produce likely samples".
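The process the quote describes can be sketched in a few lines. The following is a minimal, illustrative DDPM-style example (not Google's code): a forward process that progressively adds Gaussian noise, and a reverse loop that starts from pure noise and denoises step by step. The `fake_denoiser` is a stand-in for the trained neural network; all schedule values are common textbook choices, not the paper's.

```python
import numpy as np

T = 1000                                   # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)         # noise schedule (a common choice)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)             # cumulative signal retention

def q_sample(x0, t, rng):
    """Forward process: corrupt clean data x0 to timestep t in one jump."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def fake_denoiser(x_t, t):
    """Stand-in for the trained network that predicts the added noise."""
    return np.zeros_like(x_t)              # a real model would predict noise

def p_sample_loop(shape, rng):
    """Reverse process: start from pure noise and denoise step by step."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps = fake_denoiser(x, t)
        # posterior mean of x_{t-1} given the predicted noise
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                          # add noise at every step but the last
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

rng = np.random.default_rng(0)
x0 = np.zeros((8, 8))                      # toy "image"
corrupted = q_sample(x0, T - 1, rng)       # essentially pure noise by step T
sample = p_sample_loop((8, 8), rng)        # synthesized from pure noise
```

Note that by the final timestep `alpha_bar` is close to zero, which is the "slowly wiping out details until it becomes pure noise" part of the quote.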
The blog post presents two approaches the Brain Team has been using to push the boundaries of image synthesis quality with diffusion models. The first is Image Super-Resolution via Iterative Refinement (SR3), which the post summarizes as follows:
"SR3 is a super-resolution diffusion model that takes as input a low-resolution image, and builds a corresponding high resolution image from pure noise. The model is trained on an image corruption process in which noise is progressively added to a high-resolution image until only pure noise remains. It then learns to reverse this process, beginning from pure noise and progressively removing noise to reach a target distribution through the guidance of the input low-resolution image."
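The "guidance of the input low-resolution image" amounts to conditioning the denoiser on the low-res input at every step. A minimal sketch of one common way to do this (not Google's actual implementation) is to upsample the low-resolution image to the target size and concatenate it, channel-wise, with the current noisy estimate before feeding it to the network:

```python
import numpy as np

def upsample_nearest(img, factor):
    """Naive nearest-neighbour upsampling (real systems use bicubic)."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def sr3_denoiser_input(noisy_hr, low_res):
    """Build the conditioned input: [noisy high-res | upsampled low-res]."""
    factor = noisy_hr.shape[0] // low_res.shape[0]
    cond = upsample_nearest(low_res, factor)
    return np.stack([noisy_hr, cond], axis=-1)   # shape (H, W, 2)

rng = np.random.default_rng(0)
low_res = rng.random((64, 64))                   # 64x64 greyscale input
noisy = rng.standard_normal((512, 512))          # current noisy estimate
x = sr3_denoiser_input(noisy, low_res)           # (512, 512, 2)
```

Because the conditioning channel is present at every denoising step, the reverse process is steered toward high-resolution images consistent with the low-resolution input rather than toward arbitrary samples.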
The paper on the technique includes examples of the super-resolution results from low-resolution inputs:
Noting that cascading improves quality and training speed for high-resolution data, the researchers went further, using their SR3 models to build CDM, a model for class-conditional image generation:
"CDM is a class-conditional diffusion model trained on ImageNet data to generate high-resolution natural images. Since ImageNet is a difficult, high-entropy dataset, we built CDM as a cascade of multiple diffusion models. This cascade approach involves chaining together multiple generative models over several spatial resolutions: one diffusion model that generates data at a low resolution, followed by a sequence of SR3 super-resolution diffusion models that gradually increase the resolution of the generated image to the highest resolution."
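The chaining described above can be sketched as a simple pipeline. The stage functions here are placeholders standing in for the actual diffusion models, and the 32 → 64 → 256 progression is illustrative of the kind of resolution ladder the post describes:

```python
import numpy as np

def base_model(rng):
    """Stand-in for the class-conditional base diffusion model (lowest res)."""
    return rng.standard_normal((32, 32))

def super_res_stage(img, factor, rng):
    """Stand-in for an SR3 stage: upsample, then (pretend to) refine."""
    up = img.repeat(factor, axis=0).repeat(factor, axis=1)
    return up + 0.01 * rng.standard_normal(up.shape)   # placeholder refinement

rng = np.random.default_rng(0)
x = base_model(rng)                    # 32x32 sample from the base model
for factor in (2, 4):                  # 32 -> 64 -> 256
    x = super_res_stage(x, factor, rng)
# x is now the highest-resolution output of the cascade, here 256x256
```

The design pay-off is that each model in the chain only has to solve a comparatively easy sub-problem at its own resolution, rather than one model generating high-resolution images from scratch.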
The Brain Team researchers are positive about the potential of these techniques writing:
“With SR3 and CDM, we have pushed the performance of diffusion models to state-of-the-art on super-resolution and class-conditional ImageNet generation benchmarks. We are excited to further test the limits of diffusion models for a wide variety of generative modeling problems.”
Super-resolution results: (above) 64×64 → 512×512 face super-resolution; (below) 64×64 → 256×256 natural image super-resolution.