Abstract
Exemplar-based sketch-to-photo synthesis enables a user to generate a photo-realistic image from a sketch. Recently, diffusion-based methods have achieved impressive performance on image generation tasks and offer highly flexible control through text-driven generation or energy functions. However, handling sketch images remains challenging for a diffusion model. A sketch contains only a few black pixels while the rest are white, which makes it hard for diffusion-based methods to generate photo-realistic images with color and texture. In this work, we propose an Inversion-by-Inversion method for exemplar-based sketch-to-photo synthesis, consisting of a shape-enhancing inversion and a full-control inversion. During the shape-enhancing inversion process, an uncolored photo is generated under the guidance of a shape-energy function. This step is essential for ensuring that the sketch controls the shape of the output. During the full-control inversion process, we additionally propose an appearance-energy function that adds the color and texture of the exemplar and generates the final RGB photo. Our Inversion-by-Inversion pipeline requires neither task-specific training nor a trainable hyper-network, and it accepts different types of exemplars for color and texture control.
Methodology
Overview of our Inversion-by-Inversion translation via stochastic differential equations (SDE). The blue, green, and orange contour plots represent the distributions of sketches, uncolored photos, and photos, respectively. The movement of the grey dot across these distributions denotes the sketch-to-photo synthesis process of our proposed Inversion-by-Inversion method. In the shape-enhancing inversion step (a), we first perturb the input sketch with the forward process of the SDE. The SDE inversion process then gradually removes the noise, synthesizing an uncolored photo that has the shape of the input sketch. During this procedure, we use the proposed shape-energy function to maintain the structure of the input sketch. After that, we perform the full-control inversion step (b) by first perturbing the uncolored photo and then denoising it with SDE inversion. During this procedure, we use both the shape-energy function and the appearance-energy function, to maintain the structure of the input sketch and to add the appearance (i.e., texture and color) of the given exemplar to the output photo. (Best viewed in color.)
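The two steps described above can be illustrated with a minimal NumPy sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the variance-exploding noise schedule, the Gaussian `toy_score` (standing in for a pretrained score network), and both quadratic energy gradients are hypothetical placeholders chosen only so that the control flow runs end to end. All function and parameter names are invented for this example.

```python
import numpy as np

def perturb(x0, sigma, rng):
    # Forward SDE (variance-exploding form, an assumption): add Gaussian noise.
    return x0 + sigma * rng.standard_normal(x0.shape)

def guided_inversion(x, score_fn, energy_grads, sigmas, rng):
    # Reverse-time Euler-Maruyama steps. Each energy gradient is subtracted
    # from the score so that samples drift toward low-energy regions.
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        step = s_cur**2 - s_next**2  # noise variance removed in this step
        guidance = sum(grad(x) for grad in energy_grads)
        x = x + step * (score_fn(x, s_cur) - guidance)
        x = x + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

def toy_score(x, sigma):
    # Hypothetical stand-in for a learned score network: the score of a
    # standard Gaussian prior convolved with noise of scale sigma.
    return -x / (1.0 + sigma**2)

def make_shape_grad(sketch):
    # Hypothetical shape energy: 0.5 * ||mean_over_channels(x) - sketch||^2.
    def grad(x):
        diff = x.mean(axis=-1) - sketch
        return np.repeat(diff[..., None], x.shape[-1], axis=-1) / x.shape[-1]
    return grad

def make_appearance_grad(exemplar):
    # Hypothetical appearance energy (N = number of pixels):
    # 0.5 * N * ||channel_means(x) - channel_means(exemplar)||^2.
    target = exemplar.mean(axis=(0, 1))
    def grad(x):
        return np.broadcast_to(x.mean(axis=(0, 1)) - target, x.shape)
    return grad

def inversion_by_inversion(sketch, exemplar, seed=0, sigma0=1.0, n_steps=50):
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(sigma0, 0.01, n_steps)  # decreasing noise levels
    shape_grad = make_shape_grad(sketch)

    # (a) Shape-enhancing inversion: perturb the sketch, then denoise with
    # shape guidance only to obtain the "uncolored photo".
    x = perturb(np.repeat(sketch[..., None], 3, axis=-1), sigma0, rng)
    uncolored = guided_inversion(x, toy_score, [shape_grad], sigmas, rng)

    # (b) Full-control inversion: perturb the uncolored photo, then denoise
    # with both the shape and the appearance guidance.
    x = perturb(uncolored, sigma0, rng)
    grads = [shape_grad, make_appearance_grad(exemplar)]
    photo = guided_inversion(x, toy_score, grads, sigmas, rng)
    return uncolored, photo

if __name__ == "__main__":
    sketch = np.ones((8, 8))
    sketch[2:6, 4] = 0.0  # a single black stroke on a white canvas
    exemplar = np.ones((8, 8, 3)) * np.array([0.8, 0.2, 0.1])
    uncolored, photo = inversion_by_inversion(sketch, exemplar)
    print(uncolored.shape, photo.shape)
```

The point of the sketch is the structure: the same guided denoising routine is called twice, and the second call starts from the perturbed output of the first while adding one more energy term. In a real system the score function would be a pretrained diffusion model and the energies would compare learned features rather than raw pixel statistics.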
Citation
@article{xing2023inversion,
title={Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations},
author={Xing, Ximing and Wang, Chuang and Zhou, Haitao and Hu, Zhihao and Li, Chongxuan and Xu, Dong and Yu, Qian},
journal={arXiv preprint arXiv:2308.07665},
year={2023}
}