In August, IJAC published an invited review, "Deep Learning Based Single Image Super-resolution: A Survey", by Prof. Amir Hussain of Edinburgh Napier University. The survey focuses mainly on deep learning-based methods and aims to provide a comprehensive introduction to the field of SISR.
Deep Learning Based Single Image Super-resolution: A Survey
Viet Khanh Ha, Jin-Chang Ren, Xin-Ying Xu, Sophia Zhao, Gang Xie, Valentin Masero, Amir Hussain
Single image super-resolution (SISR) aims to recover a high-resolution (HR) image from a single low-resolution (LR) image. It has practical applications in many real-world problems where images or video are subject to restrictions such as bandwidth, pixel size, scene details, and other factors. Since multiple HR solutions exist for any given LR input, SISR is an ill-posed inverse problem.
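The ill-posedness can be seen in a minimal Python sketch. It assumes a hypothetical degradation, 2x2 block averaging, that the survey does not prescribe; the function name `downsample_2x` and the toy patches are illustrative only. Two different HR patches collapse to the same LR pixel, so the LR image alone cannot tell them apart.

```python
def downsample_2x(img):
    """Average each non-overlapping 2x2 block (an assumed degradation model)."""
    height, width = len(img), len(img[0])
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


# Two clearly different HR patches (toy data, not from the survey).
hr_checker = [[0, 100], [100, 0]]   # sharp checkerboard texture
hr_flat = [[50, 50], [50, 50]]      # completely flat patch

lr_checker = downsample_2x(hr_checker)
lr_flat = downsample_2x(hr_flat)

print(lr_checker, lr_flat)  # [[50.0]] [[50.0]]
```

Any SISR method given the LR value 50.0 has to choose between these (and infinitely many other) HR candidates, which is why priors or learned examples are needed.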
SISR techniques can be classified into three categories: interpolation-based, reconstruction-based, and example-based methods. Interpolation-based methods are straightforward, but they cannot provide any additional information for reconstruction, so the lost frequency content cannot be restored. Reconstruction-based methods usually introduce certain priors or constraints into the inverse reconstruction problem; representative priors include local structure similarity, non-local means, and edge priors. Example-based methods attempt to learn the prior knowledge from a massive number of internal or external LR-HR patch pairs, and it is here that deep learning techniques have shed new light on SISR.
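As a rough illustration of why interpolation cannot restore lost detail, the following Python sketch applies linear interpolation to a 1-D signal. The helper `upsample_linear`, the naive decimation step, and the toy signal are assumptions made for this example, not something taken from the survey.

```python
def upsample_linear(signal, factor):
    """Linearly interpolate a 1-D signal to (len(signal) - 1) * factor + 1 samples."""
    out = []
    for i in range(len(signal) - 1):
        left, right = signal[i], signal[i + 1]
        for k in range(factor):
            t = k / factor
            # Each new sample is a weighted average of its two LR neighbours.
            out.append((1 - t) * left + t * right)
    out.append(float(signal[-1]))
    return out


hr = [0, 100, 0, 100, 0]      # alternating, high-frequency pattern (toy data)
lr = hr[::2]                  # naive 2x decimation keeps every second sample
sr = upsample_linear(lr, 2)   # interpolate back to the original length

print(lr)  # [0, 0, 0]
print(sr)  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

The interpolated samples are only weighted averages of the LR samples, so the alternating pattern that was lost during decimation does not reappear.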
2. Challenges and trends
Despite the success of deep learning for SISR tasks, there are open research questions regarding SISR model design as discussed below:
1) Need for a light structure model: Although deeper models generally perform better, most recent SISR models contain no more than a hundred layers because of the overfitting problem. This is because SISR models work at the pixel level, which requires many more parameters than image classification. As models get deeper, the vanishing gradient problem also becomes more challenging. This suggests a preference for light structure models with fewer parameters and less computation.
2) Adapting well to unknown degradation: Most algorithms depend heavily on the predetermined assumption that LR images are simply down-sampled from HR images. They are unsuccessful in recovering SR images at large scale factors because of the lack of learnable features in LR images. If noise is present, reconstruction accuracy deteriorates because the problem becomes even more ill-posed. A feasible way to deal with unknown degradation is to use transfer learning or a huge number of training examples. However, there has been little research on this task, so it needs to be further investigated.
3) Requirement for different assessment criteria: No method can achieve low distortion and good perceptual quality at the same time. Traditional measures such as the L1/L2 loss help to generate images with low distortion, but the results still disagree considerably with human perception. In contrast, integrating perceptual assessment produces more realistic images, but at the cost of a low peak signal-to-noise ratio (PSNR). Therefore, more assessment criteria need to be developed for particular applications.
4) Efficiently interpreting and exploiting prior knowledge to reduce ill-posedness: Until recently, deep architectures have appeared like a black box, and we have limited knowledge of why and how they work. Meanwhile, most SISR algorithms have introduced different structures or connections based on experiments, without further explaining why the result is improved. Another important remedy for ill-posedness is to combine different constraints as regularizers for prediction, for example by combining different loss functions or by using image segmentation information to constrain the reconstructed images. That is why a semantic categorical prior was introduced, in an attempt to achieve richer and more realistic textures. Simple ways to use more prior knowledge are to use MLE as a proxy that incorporates prior knowledge as a conditional probability, or to feed the prior directly into the network whilst forcing parameter sharing across all kinds of inputs.
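The assumption questioned in item 2) is often written as a degradation model. The form below is a common formulation rather than one quoted from the survey, and the symbols are notation assumed here: $x$ is the HR image, $y$ the LR image, $k$ a blur kernel, $\downarrow_{s}$ down-sampling by scale factor $s$, and $n$ additive noise.

```latex
y = (x \otimes k)\downarrow_{s} + n
```

Under this notation, the simple down-sampling assumption corresponds to a fixed, known $k$ and $n = 0$, while unknown degradation means that $k$ and $n$ are not known in advance.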
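The tension described in item 3) can be illustrated with a small Python sketch of PSNR, computed from the mean squared error (the L2 distortion). The 1-D toy "images" and the function names `mse` and `psnr` are assumptions for this example; an 8-bit peak value of 255 is also assumed.

```python
import math


def mse(ref, est):
    """Mean squared error between two equally sized pixel lists."""
    return sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)


def psnr(ref, est, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(ref, est)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / err)


ref = [0, 100, 0, 100]        # sharp alternating texture (toy ground truth)
blurry = [50, 50, 50, 50]     # texture averaged away
shifted = [100, 0, 100, 0]    # same texture, misaligned by one pixel

print(psnr(ref, blurry) > psnr(ref, shifted))  # True
```

The flat estimate scores a higher PSNR than the textured but misaligned one, even though it contains no texture at all. This is one simple way to see why distortion measures alone can disagree with perceived quality.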
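Combining loss functions as regularizers, as mentioned in item 4), might look like the following Python sketch. It is only one possible combination: the pixel-wise L1 term plus an edge-style term on finite differences, the weight `edge_weight`, and all function names are hypothetical choices for illustration and are not taken from the survey.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equally sized lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def gradient_loss(pred, target):
    """L1 distance between finite differences (an assumed edge-style constraint)."""
    d_pred = [b - a for a, b in zip(pred, pred[1:])]
    d_target = [b - a for a, b in zip(target, target[1:])]
    return l1_loss(d_pred, d_target)


def combined_loss(pred, target, edge_weight=0.1):
    """Weighted sum of the pixel term and the edge term."""
    return l1_loss(pred, target) + edge_weight * gradient_loss(pred, target)


target = [0, 100, 0, 100]   # toy ground truth with strong edges
blurry = [50, 50, 50, 50]   # low pixel error, but no edges at all

print(l1_loss(blurry, target))        # pixel term: 50.0
print(gradient_loss(blurry, target))  # edge term: 100.0
print(combined_loss(blurry, target))  # pixel term + 0.1 * edge term
```

The extra term penalizes the flat prediction for its missing edges, which the pixel term alone only partly captures. In practice the weight would need tuning for the application at hand.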
This survey has reviewed key papers in single image super-resolution that underlie example-based learning methods. Among these, deep learning based methods have recently achieved state-of-the-art performance. Before going into more detail on each algorithm, the general background of each category was introduced. We have highlighted important contributions of these algorithms, discussed their pros and cons, and suggested possible future work, either within categories or in designated sections. So far, we cannot determine which SISR algorithm is the state of the art, as this is highly dependent on the application. For instance, an algorithm that is good for medical imaging or face processing is not necessarily effective for remote sensing images. The different constraints imposed in each problem indicate a need to generate a benchmark database that specifies the concerns of applications in different fields. Finally, there are outstanding challenges in exploiting these algorithms in practical applications, since they have mainly been applied to standard benchmark datasets and adapt poorly to different scenarios. This survey paper has enhanced the understanding of deep learning based algorithms applied to single image super-resolution; it can be used as a comprehensive guide for beginners and raises many questions in need of further investigation.