Make Explicit Calibration Implicit:
"Calibrate" Denoiser Instead of The Noise Model

VCIP, CS, Nankai University

arXiv 2023

This is an extension of our LED, ICCV 2023.

🔥 Effectiveness Across Various Network Architectures! 🔥

2 pairs for each ratio + 1.5k iterations = SOTA Performance!

By simply replacing the convolutional operators of other architectures with our proposed RepNR Block, LED can be easily migrated to architectures beyond UNet. In Tab. 8, we experimented with Restormer and NAFNet, which are transformer-based and convolution-based, respectively. The results demonstrate that LED still delivers performance comparable to calibration-based methods. Moreover, relative to the training time of ELD (32h22min / 11h42min), LED (10min26s / 12min6s) requires only 0.5% / 1.7% of it.
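For concreteness, here is a minimal PyTorch sketch of what such a drop-in replacement could look like. The class and function names, the channel-wise affine form of the CSA, and the default of five virtual cameras are our assumptions for illustration, not the released implementation:

    import torch
    import torch.nn as nn

    class RepNRBlock(nn.Module):
        """Sketch of a RepNR block (our naming, not the released code):
        a shared 3x3 convolution preceded by k camera-specific alignment
        (CSA) branches, modelled here as channel-wise affine transforms."""

        def __init__(self, conv: nn.Conv2d, num_cameras: int):
            super().__init__()
            self.conv = conv  # the shared ("blue") 3x3 convolution
            c_in = conv.in_channels
            self.csa_scale = nn.Parameter(torch.ones(num_cameras, c_in, 1, 1))
            self.csa_bias = nn.Parameter(torch.zeros(num_cameras, c_in, 1, 1))

        def forward(self, x: torch.Tensor, camera_id: int = 0) -> torch.Tensor:
            # Align features with the selected camera's CSA, then apply
            # the shared convolution.
            x = x * self.csa_scale[camera_id] + self.csa_bias[camera_id]
            return self.conv(x)

    def replace_convs(module: nn.Module, num_cameras: int = 5) -> nn.Module:
        """Recursively swap every 3x3 convolution for a RepNR block --
        the sense in which LED "migrates" to e.g. Restormer or NAFNet."""
        for name, child in list(module.named_children()):
            if isinstance(child, nn.Conv2d) and child.kernel_size == (3, 3):
                setattr(module, name, RepNRBlock(child, num_cameras))
            else:
                replace_convs(child, num_cameras)
        return module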

Experiments on More Architectures

📸 A Brand-New Dataset Spanning Multiple Camera Models!

To further validate the effectiveness of LED across different cameras, we introduce the DarkRAW dataset. Compared to existing datasets, our DarkRAW dataset has the following advantages:

More Details
  • Multi-Camera Data: To further demonstrate the effectiveness of LED across different cameras (corresponding to different noise parameters, i.e., different coordinates C), our dataset includes five distinct camera models that are not covered in existing datasets. Additionally, DarkRAW includes not only full-frame cameras but also APS-C format cameras, whose smaller sensor areas often exhibit stronger noise.
  • Varied Illumination Settings: The dataset contains data under five different illumination ratios (×1, ×10, ×100, ×200, and ×300), each representing varying levels of denoising difficulty.
  • Dual ISO Configurations: For each scene and each illumination setting, there are two different ISO settings. These can be used not only for the fine-tuning stage of LED but also for testing the robustness of the algorithm under different illumination settings. (A hypothetical data-loading sketch follows this list.)
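As a rough illustration of how one might iterate over such a dataset, here is a short Python sketch. The directory layout and file naming below are entirely hypothetical assumptions; consult the dataset release for the real structure.

    from pathlib import Path

    # Hypothetical layout: DarkRAW/<camera>/<scene>/ratio<R>_iso<ISO>.dng
    # for the noisy shots (two ISO settings per scene and ratio), plus a
    # long-exposure reference gt.dng per scene.
    RATIOS = [1, 10, 100, 200, 300]

    def iter_pairs(root: Path, camera: str, ratio: int):
        """Yield (noisy RAW, ground-truth RAW) path pairs for one camera
        model and one illumination ratio."""
        for scene in sorted((root / camera).iterdir()):
            gt = scene / "gt.dng"
            for noisy in sorted(scene.glob(f"ratio{ratio}_iso*.dng")):
                yield noisy, gt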

🎑 More Visual Results 🌅



Abstract

Explicit calibration-based methods have dominated RAW image denoising under extremely low-light environments. However, these approaches are impeded by several critical limitations: a) the calibration process is labor- and time-intensive, b) denoisers are difficult to transfer across different camera models, and c) the disparity between synthetic and real noise is exacerbated by elevated digital gain. To address these issues, we introduce a groundbreaking pipeline named Lighting Every Darkness (LED), which is effective regardless of the digital gain or the type of camera sensor used. LED eliminates the need for explicit noise model calibration, instead utilizing an implicit fine-tuning process that allows quick deployment and requires minimal data. Our method also includes structural modifications that effectively reduce the discrepancy between synthetic and real noise without extra computational demands. It surpasses existing methods across various camera models, including new ones absent from public datasets, with just two pairs per digital gain and only 0.5% of the typical iterations. Furthermore, LED allows researchers to focus on deep-learning advancements while still benefiting from sensor engineering.

Method

Overview of our LED.
Unlike typical calibration-based algorithms, our method, as shown in the figure, consists of four steps in total:

1. Synthetic Noise Generation.
2. Pre-training the denoiser.
3. Collecting few-shot paired data using the target camera.
4. Fine-tuning the denoiser using the data collected in Step 3.

Compared to calibration-based algorithms, LED uses randomly sampled virtual camera parameters during synthetic noise generation, thereby avoiding the calibration process.
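To sketch this step in Python: the Poissonian-Gaussian form and the sampling ranges below are illustrative assumptions; the paper's virtual-camera parameter space may be richer.

    import torch

    def sample_virtual_camera():
        """Draw one virtual camera: a system gain K and a read-noise level.
        Ranges here are illustrative, not the paper's calibrated space."""
        log_k = torch.empty(1).uniform_(-2.0, 1.0)
        # Read noise is modelled as roughly log-linear in the system gain.
        log_sigma = 0.8 * log_k + torch.empty(1).normal_(0.0, 0.2)
        return log_k.exp().item(), log_sigma.exp().item()

    def synthesize_noisy(clean: torch.Tensor, k: float, sigma: float) -> torch.Tensor:
        """Poissonian-Gaussian synthesis: gain-scaled shot noise plus
        additive Gaussian read noise."""
        shot = torch.poisson(clean.clamp(min=0) / k) * k
        read = torch.randn_like(clean) * sigma
        return shot + read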
During each iteration of pre-training, a random virtual camera is first selected, and training is performed on the paired data synthesized with that virtual camera. When the k-th virtual camera is selected, only the k-th CSA (Camera-Specific Alignment) is trained among the alignment branches. This decouples camera-specific information (captured by the CSAs) from camera-agnostic denoising knowledge (captured by the shared weights).
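Building on the RepNRBlock sketch above, one such pre-training iteration might look like the following. We assume the model threads camera_id through to its RepNR blocks, and l1_loss stands in for whatever training loss is actually used:

    import random
    import torch.nn.functional as F

    def pretrain_step(model, optimizer, clean, synthesize_for_camera, num_cameras=5):
        """One iteration: sample a virtual camera k and train on noise
        synthesized with that camera's parameters; among the alignment
        branches, only the k-th CSA receives gradients."""
        k = random.randrange(num_cameras)
        noisy = synthesize_for_camera(clean, k)
        loss = F.l1_loss(model(noisy, camera_id=k), clean)
        optimizer.zero_grad()
        loss.backward()  # touches the shared weights and CSA k only
        optimizer.step()
        return loss.item()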
During fine-tuning, the first step is to average all CSAs to enhance generalization. After that, an additional branch is added to handle out-of-model noise. Note that the blue 3x3 convolution is frozen during this process.
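Continuing the sketch, the fine-tuning form of the block could be assembled as below. The exact wiring of the extra branch (here: parallel to the frozen conv, after the averaged CSA) is our assumption:

    import torch
    import torch.nn as nn

    class FinetuneBlock(nn.Module):
        """Fine-tuning form of the RepNRBlock sketched earlier: one averaged
        CSA, the frozen shared conv, and a zero-initialised parallel branch
        for out-of-model noise."""

        def __init__(self, pretrained):
            super().__init__()
            with torch.no_grad():
                # 1) Average the k camera-specific alignments into one.
                self.scale = nn.Parameter(pretrained.csa_scale.mean(0, keepdim=True))
                self.bias = nn.Parameter(pretrained.csa_bias.mean(0, keepdim=True))
            # 2) Keep the shared ("blue") 3x3 conv, but freeze it.
            self.conv = pretrained.conv.requires_grad_(False)
            # 3) Zero-initialised parallel branch for out-of-model noise.
            self.extra = nn.Conv2d(self.conv.in_channels, self.conv.out_channels,
                                   3, padding=1)
            nn.init.zeros_(self.extra.weight)
            nn.init.zeros_(self.extra.bias)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = x * self.scale + self.bias
            return self.conv(x) + self.extra(x)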

Note: The network architecture of LED is exactly the same as that of other methods during deployment! This is made possible by the reparameterization technique detailed below.
More details can be found in our main paper.

Details When Deploying

Another highlight of LED lies in its final deployment, where the neural network remains completely consistent with other methods! No additional computational overhead is added, thanks to the following reparameterization process.
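To make the fusion concrete, here is a sketch under the block structure assumed above (a channel-wise affine feeding two parallel 3x3 convs); it is not the paper's exact procedure, and it ignores zero-padding border effects, which a full implementation would need to handle:

    import torch
    import torch.nn as nn

    @torch.no_grad()
    def reparameterize(block) -> nn.Conv2d:
        """Fold the CSA affine and both parallel convs of a FinetuneBlock
        into one plain 3x3 conv, so the deployed network is structurally
        identical to the baseline."""
        w = block.conv.weight + block.extra.weight  # parallel branches sum
        b = block.conv.bias + block.extra.bias
        gamma = block.scale  # (1, C_in, 1, 1) per-channel scale
        beta = block.bias    # (1, C_in, 1, 1) per-channel bias
        fused = nn.Conv2d(block.conv.in_channels, block.conv.out_channels,
                          3, padding=1)
        # conv(gamma * x + beta) = (w * gamma) conv x + (b + sum of w * beta)
        fused.weight.copy_(w * gamma)
        fused.bias.copy_(b + (w * beta).sum(dim=(1, 2, 3)))
        return fused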

LED Pre-training Could Boost Existing Methods!

Experiments on Pre-Training

LED pre-training can boost the performance of other methods. By integrating LED pre-training into existing calibration-based or paired-data-based methods such as ELD and SID, our approach yields notable performance gains. These improvements are not uniform; rather, they depend on the pre-training strategy each method employs. This proves particularly valuable in industrial applications, where efficiency demands are paramount. The strategic application of LED pre-training not only boosts the performance of the denoiser but also paves the way for more advanced, adaptable, and efficient denoising.

Discussion on "Why Two Pairs"?

The left figure shows how LED's performance varies with the number of few-shot data pairs. Notice that with only one data pair for fine-tuning, LED's performance is not as high as ELD's. With two data pairs, however, LED significantly surpasses ELD.

Indeed, this is because the camera's gain and noise variance have a linear relationship, as illustrated in the right graph. With just two pairs of images, LED can effectively learn this linear relationship, as two points are enough to determine a straight line.

However, due to measurement errors, the horizontal coordinates of these two points need to be sufficiently far apart. In other words, the two pairs of images used by LED should differ significantly in their capture ISO settings (e.g., ISO < 500 and ISO > 5000).
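To make the argument concrete, here is a tiny numerical illustration with made-up numbers. A fixed measurement error of ±0.05 shifts the fitted slope by 2 × 0.05 / (x2 − x1), so widely separated gains make the fit far more robust:

    import numpy as np

    def fit_line(gains, variances):
        """Least-squares line variance = a * gain + b through two points."""
        a, b = np.polyfit(gains, variances, deg=1)
        return a, b

    # Suppose the true relation is variance = 0.01 * gain + 1, and each
    # measurement is off by +/-0.05 in the worst direction.
    a_close, _ = fit_line([100, 120], [2.00 + 0.05, 2.20 - 0.05])
    a_far, _ = fit_line([100, 6000], [2.00 + 0.05, 61.00 - 0.05])
    print(a_close)  # 0.005   -- 50% off the true slope of 0.01
    print(a_far)    # ~0.00998 -- the same error barely moves the slope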

More details (including validation experiments) can be found in our main paper.

BibTex


@article{jin2023lighting,
    title={Make Explicit Calibration Implicit: "Calibrate" Denoiser Instead of The Noise Model},
    author={Jin, Xin and Xiao, Jia-Wen and Han, Ling-Hao and Guo, Chunle and Liu, Xialei and Li, Chongyi and Cheng, Ming-Ming},
    journal={arXiv preprint},
    year={2023}
}
                

Contact

Feel free to contact us at xjin[AT]mail.nankai.edu.cn!
