TL;DR: We present Dereflection Any Image (DAI). We collect Diverse Reflection Removal (DRR), a high-quality dataset with diverse reflections, and propose a diffusion-based framework enhanced by a progressive training strategy that ensures stable optimization and robust generalization across a wide range of reflection types.
Top: Original images with reflections. Bottom: Results generated by our model. The scenarios shown cover multiple reflection types.
Reflection removal of a single image remains a highly challenging task due to the complex entanglement between target scenes and unwanted reflections. Despite significant progress, existing methods are hindered by the scarcity of high-quality, diverse data and insufficient restoration priors, resulting in limited generalization across various real-world scenarios. In this paper, we propose Dereflection Any Image, a comprehensive solution with an efficient data preparation pipeline and a generalizable model for robust reflection removal. First, we introduce a dataset named Diverse Reflection Removal (DRR) created by randomly rotating reflective mediums in target scenes, enabling variation of reflection angles and intensities, and setting a new benchmark in scale, quality, and diversity. Second, we propose a diffusion-based framework with one-step diffusion for deterministic outputs and fast inference. To ensure stable learning, we design a three-stage progressive training strategy, including reflection-invariant finetuning to encourage consistent outputs across varying reflection patterns that characterize our dataset. Extensive experiments show that our method achieves SOTA performance on both common benchmarks and challenging in-the-wild images, showing superior generalization across diverse real-world scenes.
Visualization of the generalization capability. Our method generalizes robustly across real-world scenes including, but not limited to, glass, plastic materials, water surfaces, digital displays, and even stylized scenes such as anime. We also provide a Gradio demo for running the model on your own images.
Our model can be applied to various downstream tasks, including semantic segmentation, object detection, depth estimation, and normal estimation. The results demonstrate that our model can effectively remove reflections and restore the original scene, which is beneficial for subsequent tasks.
Our DRR dataset contains a diverse collection of scenes, each accompanied by multiple reflection images. As illustrated in the figure, the ground truth transmission layer is highlighted in red boxes, while the remaining images are the corresponding mixed images. The dataset covers indoor, outdoor, and object-centric scenes. All image pairs are high resolution with rich textural details. Compared to existing datasets, our proposed DRR dataset demonstrates significant advantages in three key aspects:
Data collection pipeline for real (top) and synthetic (bottom) data. Real data is captured by recording videos while rotating a glass panel at various angles, then processed to align mixed images with their ground truth transmission layers. Synthetic data is generated using randomly chosen coefficients and then filtered to keep only high-quality image pairs.
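The synthetic branch of the pipeline can be illustrated with a minimal sketch. It assumes a simple linear blend of a transmission layer and a reflection layer, which this page does not specify. The function names, the coefficient ranges, and the filtering criterion (mean absolute difference between the mixed image and its ground truth) are all hypothetical and are not taken from the DRR pipeline.

```python
import random

def blend(transmission, reflection, alpha, beta):
    """Linear mix of two equally sized images given as flat lists of floats in [0, 1]."""
    if len(transmission) != len(reflection):
        raise ValueError("images must have the same number of pixels")
    # Clip at 1.0 so the mixed image stays in the valid intensity range.
    return [min(1.0, alpha * t + beta * r) for t, r in zip(transmission, reflection)]

def reflection_strength(mixed, transmission):
    """Mean absolute difference between the mixed image and its ground truth."""
    return sum(abs(m - t) for m, t in zip(mixed, transmission)) / len(mixed)

def make_pair(transmission, reflection, rng, low=0.02, high=0.3):
    """Sample coefficients, blend, and keep the pair only if it passes the filter.

    The coefficient ranges and the [low, high] thresholds are hypothetical.
    Returns (mixed, transmission), or None when the pair is rejected.
    """
    alpha = rng.uniform(0.7, 1.0)  # weight of the transmission layer (assumed range)
    beta = rng.uniform(0.1, 0.5)   # weight of the reflection layer (assumed range)
    mixed = blend(transmission, reflection, alpha, beta)
    strength = reflection_strength(mixed, transmission)
    if low <= strength <= high:
        return mixed, transmission
    return None
```

Calling `make_pair(t, r, random.Random(0))` on two flattened images returns either a `(mixed, ground_truth)` pair or `None`. A real filter would more likely rely on perceptual or learned quality measures than on this single statistic.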
Our proposed framework. It consists of a U-Net with a one-step denoising strategy, a ControlNet that takes as input the mixed image processed by the encoder 𝓔, and a cross-latent decoder 𝓓 that mitigates blurriness and preserves details.
The three stages of progressive training. First, we train the ControlNet and the upsampling blocks of the U-Net using the basic one-step diffusion loss. Second, we finetune these components by incorporating the consistent loss. Finally, we train the cross-latent decoder using the image reconstruction loss.
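The schedule described above can be written down as a small configuration sketch that records which components are updated and which losses are active in each stage. The component and loss labels are illustrative, not identifiers from the released code. Keeping the one-step diffusion loss active in the second stage is an assumption based on the word "incorporating". The `consistency_loss` function is only one plausible form of the consistent loss (variance of the predictions for several mixed images of the same scene), since this page does not define it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    trainable: frozenset  # components updated in this stage
    losses: tuple         # loss terms optimized in this stage

# Labels below are illustrative, not names from the released code.
ALL_COMPONENTS = ("encoder", "controlnet", "unet_other_blocks",
                  "unet_up_blocks", "cross_latent_decoder")

STAGES = (
    Stage("one_step_diffusion",
          frozenset({"controlnet", "unet_up_blocks"}),
          ("one_step_diffusion",)),
    Stage("reflection_invariant_finetuning",
          frozenset({"controlnet", "unet_up_blocks"}),
          ("one_step_diffusion", "consistency")),  # assumed: base loss is kept
    Stage("decoder_training",
          frozenset({"cross_latent_decoder"}),
          ("image_reconstruction",)),
)

def requires_grad_flags(stage):
    """Map every component to True if it is trained in this stage, else False."""
    return {name: name in stage.trainable for name in ALL_COMPONENTS}

def consistency_loss(predictions):
    """Hypothetical consistency term.

    `predictions` holds one flat list of pixel values per mixed image of the
    same scene. The loss is the mean squared deviation of each prediction from
    the per-pixel mean, so it is zero when all predictions agree.
    """
    n = len(predictions)
    size = len(predictions[0])
    mean = [sum(p[i] for p in predictions) / n for i in range(size)]
    total = sum((p[i] - mean[i]) ** 2 for p in predictions for i in range(size))
    return total / (n * size)
```

In a PyTorch training loop, the flags returned by `requires_grad_flags` would typically be applied by setting `requires_grad` on each component's parameters before building the optimizer for that stage.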
@misc{hu2025dereflection,
title={Dereflection Any Image with Diffusion Priors and Diversified Data},
author={Jichen Hu and Chen Yang and Zanwei Zhou and Jiemin Fang and Xiaokang Yang and Qi Tian and Wei Shen},
year={2025},
eprint={2503.17347},
archivePrefix={arXiv},
primaryClass={cs.CV}
}