Remote Sensing Image Registration Based on Improved KAZE and BRIEF Descriptor

Huan Liu Gen-Fu Xiao

Huan Liu, Gen-Fu Xiao. Remote Sensing Image Registration Based on Improved KAZE and BRIEF Descriptor[J]. International Journal of Automation and Computing. doi: 10.1007/s11633-019-1218-3

    Author Bio:

    Huan Liu received the B. Sc. degree in computer science and technology from Nanjing Institute of Technology, China in 2004, the M. Sc. degree in software engineering from Jiangxi Normal University, China in 2008, and the Ph. D. degree in pattern recognition and intelligent systems from Donghua University, China in 2014. She is currently an associate professor at Jinggangshan University, China. Her research interests include machine vision, image processing and intelligent algorithms. E-mail: liuhuan816618@163.com ORCID iD: 0000-0002-7453-3307

    Gen-Fu Xiao received the B. Sc. and M. Sc. degrees in automation from Nanchang University, China in 1998 and 2005, respectively, and the Ph. D. degree in mechatronic engineering from Nanchang University, China in 2014. He is currently an associate professor in the School of Mechanical and Electrical Engineering at Jinggangshan University, China. His research interests include modeling and optimization. E-mail: xiaogenfu@163.com (Corresponding author) ORCID iD: 0000-0003-4116-9358

Publication history
  • Received: 2019-09-18
  • Accepted: 2019-12-18
  • Published online: 2020-03-06

    • Image registration is a fundamental task in remote sensing image processing. It aligns two or more images captured at different times, by different sensors or from different viewpoints[1]. The accuracy of image registration has a significant effect on many remote sensing analyses, such as image mosaicking[2], image fusion[3, 4] and change detection[5]. Although remarkable progress has been made in automatic image registration techniques in the last few decades, automatic remote sensing image registration remains a challenging and significant task[6, 7].

      Methods of image registration fall into two major categories: area-based methods and feature-based methods[8]. For remote sensing images with complex backgrounds and rich but variable content, the former, which rely on image intensity values, are not well suited. Feature-based methods construct salient feature descriptors that represent high-level information, and are therefore preferable for remote sensing image registration. Classical feature extraction algorithms include Harris[9], smallest univalue segment assimilating nucleus (SUSAN)[10], scale invariant feature transform (SIFT)[11], speeded-up robust features (SURF)[12], oriented FAST and rotated binary robust independent elementary features (BRIEF) (ORB)[13], and binary robust invariant scalable keypoints (BRISK)[14]. The SIFT algorithm and its improved versions are well known for detecting distinctive invariant texture feature points in remote sensing images[15-17]. Li and Ye[18] proposed a robust sample consensus judging (RSCJ) algorithm embedded in random sample consensus (RANSAC) to eliminate outliers without decreasing registration precision. Ma et al.[19] proposed locally linear transformation (LLT) to handle both rigid and nonrigid transformations when removing outliers in remote sensing images, and they presented an efficient approach, termed locality preserving matching (LPM), to maintain the local neighborhood structures of true matches during feature matching[20]. Literature [21], an extension of the previous works [19, 20], put forward a method named mTopKRP based on multiscale neighborhoods, which strictly preserves the local topological structure to eliminate mismatches caused by nonrigid transformations in remote sensing images and builds a mathematical optimization model to establish reliable correspondences. Literature [22] proposed P-SIFT, an improved extension of SIFT and affine-SIFT (ASIFT) that addresses perspective transformation, for low-altitude remote sensing image registration. Fan et al.[23] proposed an improved SIFT for registering optical and synthetic aperture radar (SAR) images; however, it mainly targets images with small distortions and performs poorly when large distortions occur. Han et al.[24] put forward a scheme combining SIFT and Delaunay triangulation for high-resolution remote sensing images of urban areas, which yields a uniform and appropriate distribution and quantity of feature points, but it is only applicable to remote sensing images containing buildings and variable objects. MS-SIFT[25] performed mode seeking in a 4D space, followed by effective pruning of outlying SIFT keypoint correspondences, and achieved subpixel accuracy. SAR-SIFT[26] proposed a new SIFT-like descriptor adapted to SAR images, which relies on a new gradient computation. Nevertheless, remote sensing images, especially optical and SAR images, exhibit large nonlinear radiometric differences[27]. For natural images, SIFT and its improved versions can achieve satisfactory registration results, but they are vulnerable when complex nonlinear radiometric changes occur.

      To address these difficulties, KAZE, a multi-scale 2D feature detection and description algorithm based on a nonlinear scale space, was put forward[28]. It differs from previous approaches built on down-sampled Gaussian scale spaces: the classical methodology tends to blur edges and lose detail, which lowers accuracy. KAZE constructs its scale space through nonlinear diffusion filtering and generates descriptors with M-SURF. The algorithm retains more edges and details while reducing noise, and hence obtains better local precision and discriminability. Fan et al.[29] adopted nonlinear diffusion to establish an anisotropic scale space for SAR image registration. Li et al.[30] applied anisotropic diffusion to multispectral image registration to increase the accuracy of detection and matching. Remote sensing images contain large amounts of data and rich ground details, so the quantity of salient keypoints extracted by KAZE is large, and the computational load and complexity are correspondingly high.

      In this paper, to reduce the influence of different categories of feature points, we exploit the perceptible difference between variable and relatively stable ground objects of different sizes across different scale spaces. Objects with a large size correspond to large feature spots in the scale space, and vice versa. A novel nonlinear diffusion function adjusted by the scale factor is worked out, and an anisotropic multi-scale space with composite diffusion is constructed. This strategy can flexibly extract different categories of objects at different scales. Finally, BRIEF, a binary code string descriptor, is applied to improve matching efficiency.

      The rest of this paper is organized as follows. Section 2 describes the details of our nonlinear scheme for remote sensing image registration based on an improved KAZE extraction algorithm and BRIEF. Section 3 presents feature extraction and registration experiments with comparisons to other classical algorithms, followed by concluding remarks in Section 4.

    • A nonlinear diffusion filter function is constructed in KAZE. The commonly used nonlinear diffusion equation for an image L with spatial coordinates $(x,y)$ and time $t$ is expressed as

      $$\frac{{\partial L}}{{\partial t}} = div\;(c\;(x,y,t) \times \nabla L)$$ (1)

      where $div$ is a divergence operator, $\nabla $ is a gradient operator, L is image brightness, and $c\;(x,y,t)$ is a conductivity function defined in (2).

      $$c\;(x,y,t)= g(\left| {\nabla {L_\sigma }(x,y,t)} \right|{\rm{)}}$$ (2)

      where $\nabla {L_\sigma }$ is the gradient of a Gaussian-smoothed version of image L with standard deviation $\sigma $. The conductivity function g is chosen to promote smoothing within wide regions rather than across boundaries. Two classical choices are defined as follows.

      $${g_1} = \exp \left[ - \left(\dfrac{{\left| {\nabla {L_\sigma }} \right|^2}}{k^2}\right)\right]$$ (3)
      $${g_2} =\dfrac{1}{{1 + \dfrac{{{{\left| {\nabla {L_\sigma }} \right|}^2}}}{{{k^2}}}}}$$ (4)

      where $k$ is a contrast factor that regulates the diffusion level in relation to edge information. The smaller the value of $k$, the larger the amount of retained edge information. ${g_2}$ is usually used in the KAZE algorithm.
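      As a minimal illustration (the helper names and the precomputed gradient-magnitude image `grad_mag` are our own assumptions), the two conductivity functions in (3) and (4) can be evaluated in Python/NumPy as follows:

```python
import numpy as np

def conductivity_g1(grad_mag, k):
    # g1 = exp(-|grad L_sigma|^2 / k^2): favours high-contrast edges, Eq. (3).
    return np.exp(-(grad_mag ** 2) / (k ** 2))

def conductivity_g2(grad_mag, k):
    # g2 = 1 / (1 + |grad L_sigma|^2 / k^2): favours wide smooth regions, Eq. (4).
    return 1.0 / (1.0 + (grad_mag ** 2) / (k ** 2))
```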

      Equation (1) is discretized into (5) following references [28, 31].

      $$\frac{{{L^{n + 1}} - {L^n}}}{\tau } = \sum\limits_{i = 1}^m {{A_i}({L^n}){L^{n + 1}}} $$ (5)

      where ${A_i}({L^n})$ is a matrix encoding the image conductivities along the $i$-th dimension, $\tau $ is the time step, and $m$ is the dimension of the image ($m = 2$). ${L^{n + 1}}$ is obtained as follows.

      $${L^{n + 1}} = \frac{1}{m}\sum\limits_{i = 1}^m {{{[I - m \times \tau \times {A_i}({L^n})]}^{ - 1}}{L^n}} $$ (6)

      where $I$ is an identity matrix.

      For a two-dimensional image, (6) can be decomposed into the three equations below.

      $$[2(I - 2\tau {A_1}({L^n}))]L_1^{n + 1} = {L^n}$$ (7)
      $$[2(I - 2\tau {A_2}({L^n}))]L_2^{n + 1} = {L^n}$$ (8)
      $${L^{n + 1}} = L_1^{n + 1} + L_2^{n + 1}$$ (9)

      where ${A_1}({L^n})$ and ${A_2}({L^n})$ encode the conductivities of image ${L^n}$ along the row and column directions, respectively, and $L_1^{n + 1}$ and $L_2^{n + 1}$ are the partial solutions obtained by diffusing ${L^n}$ along the row and column directions, respectively.
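      To make the additive operator splitting (AOS) step in (6)-(9) concrete, the sketch below solves the two tridiagonal systems with SciPy. It is a simplified illustration under our own naming and assumes the conductivity image g has already been computed; the boundary handling follows common AOS practice rather than the exact implementation of [28].

```python
import numpy as np
from scipy.linalg import solve_banded

def aos_line_solve(u, g, tau, m=2):
    # Solve (I - m*tau*A) v = u on one image line, where A is the standard
    # tridiagonal diffusion operator built from the conductivity values g.
    n = u.shape[0]
    w = 0.5 * (g[:-1] + g[1:])        # conductivity averaged between neighbours
    off = -m * tau * w                # sub- and super-diagonal of (I - m*tau*A)
    diag = np.ones(n)
    diag[:-1] += m * tau * w          # contribution of the right neighbour
    diag[1:] += m * tau * w           # contribution of the left neighbour
    ab = np.zeros((3, n))
    ab[0, 1:] = off                   # super-diagonal
    ab[1, :] = diag                   # main diagonal
    ab[2, :-1] = off                  # sub-diagonal
    return solve_banded((1, 1), ab, u)

def aos_step(L, g, tau):
    # One AOS step of (6) with m = 2: average of the row-wise and
    # column-wise implicit solutions, as in (7)-(9).
    rows = np.stack([aos_line_solve(L[i, :], g[i, :], tau) for i in range(L.shape[0])])
    cols = np.stack([aos_line_solve(L[:, j], g[:, j], tau) for j in range(L.shape[1])], axis=1)
    return 0.5 * (rows + cols)
```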

    • Following the main steps of image registration, our procedure comprises the construction of the nonlinear scale space, feature extraction, feature description and feature matching (image registration in the narrow sense). The flow chart is displayed in Fig. 1.

      Figure 1.  Flow chart of our method

      Different ground objects have different physical characteristics, so their appearances in digital images differ considerably. Standard KAZE adopts only a single conductivity function g in the construction of its nonlinear scale space. It neglects the differences that different ground objects exhibit in digital images, especially across scale spaces. Consequently, it is liable to produce a high false detection ratio and low positioning accuracy in the feature extraction stage.

      To decrease the influence of different ground objects on extraction, our scheme analyses the scale properties of objects in different scale spaces. Variable objects are often small, such as small houses, trees and cars. These smaller objects with rich details appear clearly in the low scale layers of the pyramid scale space and form small spots. Stable objects are often larger, such as large areas of bare land and lakes; they appear distinctly in the high scale layers and form large spots. As shown in Fig. 2, the center of each circle represents the location of a keypoint and the radius represents the corresponding scale: the lower the scale, the shorter the radius, and vice versa. Taking these differences into consideration, our improved KAZE combines g1 and g2 into a compound conductivity function g. The two functions express different diffusion behaviours, and the variation of diffusion reflects differences in image structure: g1 mainly preserves high-contrast information at image edges, while g2 preserves region information over wide areas. The roles of g1 and g2 in g vary with weight parameters determined by the scale factor $\sigma $. This strategy detects different objects at different scales efficiently, avoiding the high false detection ratio of a single g, and greatly enhances the positioning accuracy, completeness and correct ratio in the feature extraction stage.

      Figure 2.  Character of feature points in multi-scale space (Color versions of the figures in this paper are available online.)

      A new composite diffusion function is defined, with weight coefficients that are functions of the scale factor:

      $${g_\sigma } = \frac{{1 - {\sigma _i}}}{{\sum\limits_{i = 1}^n {{\sigma _i}} }}{g_1} + \frac{{{\sigma _i}}}{{\sum\limits_{i = 1}^n {{\sigma _i}} }}{g_2}.$$ (10)

      Small ground objects can be detected in the low scale layers of the pyramid scale space, where $\sigma $ is small and g1 has a strong effect. As the scale factor $\sigma $ increases, large ground objects are detected in the high scale layers, where the role played by g2 gradually increases.
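      A minimal sketch of (10), evaluated literally with our own function and argument names (the gradient magnitude, the contrast factor k and the per-layer scales are assumed to come from the surrounding pipeline):

```python
import numpy as np

def composite_conductivity(grad_mag, k, sigma_i, sigmas):
    # Compound diffusion function of (10): a scale-dependent blend of g1 and g2.
    # sigma_i is the scale of the current layer; sigmas contains the scales of all layers.
    g1 = np.exp(-(grad_mag ** 2) / (k ** 2))
    g2 = 1.0 / (1.0 + (grad_mag ** 2) / (k ** 2))
    total = np.sum(sigmas)
    w1 = (1.0 - sigma_i) / total      # weight of g1, decreasing as sigma_i grows
    w2 = sigma_i / total              # weight of g2, increasing as sigma_i grows
    return w1 * g1 + w2 * g2
```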

      In addition, to improve extraction efficiency, the image pyramid is discretized into a series of $O$ octaves and $S$ sub-levels, in the spirit of SIFT. The octaves and sub-levels are mapped to the corresponding scale $\sigma $ by (11):

      $$ \begin{split} & {\sigma _i}(o,s) = {2^{o + \frac{s}{S}}}\\ & \;\;\;\;\; o \in [0,1,\cdots ,O - 1],\;\;\;\;s \in [0,1,\cdots,S - 1]\\ & {t_n} = \frac{1}{2}\sigma _n^2,\;\;\;\;n \in [0,1,\cdots,S]. \end{split} $$ (11)

      For efficiency, the pyramid scale space is built by down-sampling: the down-sampled image of each octave serves as the first sub-level of the next octave.
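      A small sketch of the mapping in (11) follows; the function and parameter names are ours, and the evolution times are obtained as $t = \sigma^2/2$:

```python
def scale_levels(num_octaves=4, num_sublevels=5):
    # Scales sigma_i(o, s) = 2**(o + s/S) of (11), with O = num_octaves and
    # S = num_sublevels, and the corresponding evolution times t = sigma**2 / 2.
    sigmas = [2.0 ** (o + s / num_sublevels)
              for o in range(num_octaves)
              for s in range(num_sublevels)]
    times = [0.5 * sigma ** 2 for sigma in sigmas]
    return sigmas, times
```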

    • To offset the running time spent on constructing the nonlinear scale space, binary robust independent elementary features (BRIEF)[32], a binary code string descriptor, is applied to represent features and boost the overall efficiency of the matching stage.

      Let p be a smoothed pixel patch of size $S \times S$. The binary test is defined as

      $$t\;(p;x,y) = \left\{ {\begin{array}{*{20}{c}} {1,\;\;\;{\rm if}\;\;p\;(x) < p\;(y)} \\ {0,\;\;\;{\rm{otherwise}}\;\;\;\;\;\;\;\;\;} \end{array}} \right.$$ (12)

      where $p\;(x)$ is the intensity of patch p at point x.

      The expression of the binary code string is displayed as

      $${f_n}(p) = \sum\limits_{1 \leq i \leq n} {{2^{i - 1}}} t\;(p;{x_i},{y_i})$$ (13)

      where the test locations are collected in a $2 \times n$ descriptor matrix S defined as

      $$S = \left[ {\begin{array}{*{20}{c}} {{x_1}\;{x_2},\cdots,{x_n}} \\ {{y_1}\;{y_2},\cdots,{y_n}} \end{array}} \right].$$ (14)

      To make the descriptor invariant to rotation, the test locations are transformed according to the main direction $\theta $ of the keypoint. The rotation matrix ${R_\theta }$ is given by

      $${R_\theta } = \left[ {\begin{array}{*{20}{c}} {\cos \theta }&{\sin \theta } \\ { - \sin \theta }&{\cos \theta } \end{array}} \right].$$ (15)

      The new steered descriptor matrix can be computed as

      $${S_\theta } = {R_\theta }S = \left[\!\!{\begin{array}{*{20}{c}} {\cos \theta }&{\sin \theta } \\ { - \sin \theta }&{\cos \theta } \end{array}}\!\!\right]\times\left[\!\!{\begin{array}{*{20}{c}} {{x_1}\;{x_2},\cdots,{x_n}} \\ {{y_1}\;{y_2},\cdots,{y_n}} \end{array}}\!\!\right].$$ (16)
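      The sketch below illustrates (12)-(16) in Python/NumPy: a steered binary descriptor computed from pairs of sampling points around a keypoint. The pair layout (two arrays of point offsets), the clipping at the patch border and the byte packing are our own illustrative choices, not the exact implementation used in the experiments.

```python
import numpy as np

def steered_brief(patch, points_a, points_b, theta):
    # patch: smoothed image patch centred on the keypoint.
    # points_a, points_b: (n, 2) arrays of (x, y) offsets of the n test pairs,
    # relative to the patch centre. theta: keypoint orientation in radians.
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])                   # R_theta of (15)
    cy, cx = patch.shape[0] // 2, patch.shape[1] // 2

    def sample(points):
        # Steer the sampling pattern: S_theta = R_theta * S, as in (16).
        rotated = points @ R.T
        xs = np.clip(np.round(rotated[:, 0] + cx).astype(int), 0, patch.shape[1] - 1)
        ys = np.clip(np.round(rotated[:, 1] + cy).astype(int), 0, patch.shape[0] - 1)
        return patch[ys, xs]

    # Binary tests of (12); the bits are packed into an n/8-byte string that
    # plays the role of f_n(p) in (13) and is matched by Hamming distance.
    bits = (sample(points_a) < sample(points_b)).astype(np.uint8)
    return np.packbits(bits)
```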
    • Five groups of image datasets are used in the experiments: remote sensing images with noise from the same sensor, images from different wavebands of the same sensor, images from different sensors, multi-temporal images, and optical-SAR images. Each group is divided into two parts: feature extraction and feature registration. Comparisons among Harris[9] (Harris_SIFT), SIFT[11], ORB[13] and our IKAZE_BRIEF are conducted. The parameter $k$ in the conductivity function g is taken as the 70th percentile of the histogram of the gradient magnitude $\left| {\nabla {L_\sigma }} \right|$ of a smoothed version of the image. The pyramid scale space has four octaves, i.e., $\sigma $ = 1.6, 3.2, 6.4, 12.8, so the corresponding evolution times are $t$ = 1.28, 5.12, 20.48, 81.92, respectively. The down-sampling idea of SIFT is adopted in the construction of the pyramid space, and each octave has five sub-levels. Other parameters are set the same as in standard KAZE. In the BRIEF descriptor, $n$ is set to 128. The hardware and software environments are as follows: an Intel(R) Core(TM) i7-6900K CPU @ 3.2 GHz with 64.0 GB of memory; the program is written under 64-bit Windows 10 with Matlab R2014b, OpenCV 3.4 and Visual Studio 2017.
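      A minimal sketch of how such a percentile-based contrast factor can be estimated (the smoothing scale, bin count and helper names below are assumptions, not values taken from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def contrast_factor(image, smoothing_sigma=1.0, percentile=70, nbins=300):
    # Estimate k as the given percentile of the gradient-magnitude histogram
    # of a Gaussian-smoothed version of the image.
    smoothed = gaussian_filter(image.astype(np.float64), smoothing_sigma)
    gy, gx = np.gradient(smoothed)
    mag = np.hypot(gx, gy)
    hist, edges = np.histogram(mag[mag > 0], bins=nbins)
    cdf = np.cumsum(hist) / hist.sum()
    idx = int(np.searchsorted(cdf, percentile / 100.0))
    return float(edges[min(idx + 1, nbins)])
```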

    • In this group (Group 1), the reference and sensed images are from the same sensor but corrupted with noise. The image size is $512 \times 512$ pixels. Fig. 3 displays the original reference and sensed images. Fig. 4 shows the keypoints extracted by SIFT. Figs. 5 (a) and 5 (b) are the initial matching and the final matching with RANSAC, respectively. Figs. 6-8 are the extraction results of Harris, ORB and IKAZE, respectively. Figs. 9 (a) and 9 (b) are the initial and final matchings by Harris_SIFT, Figs. 10 (a) and 10 (b) by ORB, and Figs. 11 (a) and 11 (b) by IKAZE_BRIEF.

    • In this group (Group 2), the reference and sensed images are from different wavebands of the same sensor. The image size is $320 \times 320$ pixels. Fig. 12 displays the original reference and sensed images. Figs. 13-16 show the keypoints extracted by SIFT, Harris, ORB and IKAZE, respectively. Figs. 17 (a) and 17 (b), Figs. 18 (a) and 18 (b), Figs. 19 (a) and 19 (b), and Figs. 20 (a) and 20 (b) are the initial and final matchings with RANSAC by SIFT, Harris_SIFT, ORB and IKAZE_BRIEF, respectively.

      Figure 3.  Original images of Group 1

      Figure 4.  Keypoints extraction of Group 1 by SIFT

      Figure 5.  Matching images of Group 1 by SIFT

      Figure 6.  Keypoints extraction of Group 1 by Harris

      Figure 7.  Keypoints extraction of Group 1 by ORB

      Figure 8.  Keypoints extraction of Group 1 by IKAZE

      Figure 9.  Matching images of Group 1 by Harris_SIFT

      Figure 10.  Matching images of Group 1 by ORB

      Figure 11.  Matching images of Group 1 by IKAZE_BRIEF

      Figure 12.  Original images of Group 2

      Figure 13.  Keypoints extraction of Group 2 by SIFT

      Figure 14.  Keypoints extraction of Group 2 by Harris

      Figure 15.  Keypoints extraction of Group 2 by ORB

      Figure 16.  Keypoints extraction of Group 2 by IKAZE

      Figure 17.  Matching images of Group 2 by SIFT

      Figure 18.  Matching images of Group 2 by Harris_SIFT

      Figure 19.  Matching images of Group 2 by ORB

      Figure 20.  Matching images of Group 2 by IKAZE_BRIEF

    • In this group (Group 3), the reference and sensed images are from different sensors. The image size is $256 \times 256$ pixels. Fig. 21 displays the original reference and sensed images. Figs. 22-25 show the keypoints extracted by SIFT, Harris, ORB and IKAZE, respectively. Figs. 26 (a) and 26 (b), Figs. 27 (a) and 27 (b), Figs. 28 (a) and 28 (b), and Figs. 29 (a) and 29 (b) are the initial and final matchings with RANSAC by SIFT, Harris_SIFT, ORB and IKAZE_BRIEF, respectively.

      Figure 21.  Original images of Group 3

      Figure 22.  Keypoints extraction of Group 3 by SIFT

      Figure 23.  Keypoints extraction of Group 3 by Harris

      Figure 24.  Keypoints extraction of Group 3 by ORB

      Figure 25.  Keypoints extraction of Group 3 by IKAZE

      Figure 26.  Matching images of Group 3 by SIFT

      Figure 27.  Matching images of Group 3 by Harris_SIFT

      Figure 28.  Matching images of Group 3 by ORB

      Figure 29.  Matching images of Group 3 by IKAZE_BRIEF

    • In this group (Group 4), the reference and sensed images are multi-temporal. The image size is $400 \times 400$ pixels. Fig. 30 displays the original reference and sensed images. Figs. 31-34 show the keypoints extracted by SIFT, Harris, ORB and IKAZE, respectively. Figs. 35 (a) and 35 (b), Figs. 36 (a) and 36 (b), Figs. 37 (a) and 37 (b), and Figs. 38 (a) and 38 (b) are the initial and final matchings with RANSAC by SIFT, Harris_SIFT, ORB and IKAZE_BRIEF, respectively.

      Figure 30.  Original images of Group 4

      Figure 31.  Keypoints extraction of Group 4 by SIFT

      Figure 32.  Keypoints extraction of Group 4 by Harris

      Figure 33.  Keypoints extraction of Group 4 by ORB

      Figure 34.  Keypoints extraction of Group 4 by IKAZE

      Figure 35.  Matching images of Group 4 by SIFT

      Figure 36.  Matching images of Group 4 by Harris_SIFT

      Figure 37.  Matching images of Group 4 by ORB

      Figure 38.  Matching images of Group 4 by IKAZE_BRIEF

    • In this group (Group 5), the reference images are optical and the sensed images are SAR. The image size is $509 \times 196$ pixels. Fig. 39 displays the original reference and sensed images. Figs. 40-43 show the keypoints extracted by SIFT, Harris, ORB and IKAZE, respectively. Figs. 44 (a) and 44 (b), Figs. 45 (a) and 45 (b), Figs. 46 (a) and 46 (b), and Figs. 47 (a) and 47 (b) are the initial and final matchings with RANSAC by SIFT, Harris_SIFT, ORB and IKAZE_BRIEF, respectively.

      Figure 39.  Original images of Group 5

      Figure 40.  Keypoints extraction of Group 5 by SIFT

      Figure 41.  Keypoints extraction of Group 5 by Harris

      Figure 42.  Keypoints extraction of Group 5 by ORB

      Figure 43.  Keypoints extraction of Group 5 by IKAZE

      Figure 44.  Matching images of Group 5 by SIFT

      Figure 45.  Matching images of Group 5 by Harris_SIFT

      Figure 46.  Matching images of Group 5 by ORB

      Figure 47.  Matching images of Group 5 by IKAZE_BRIEF

      As seen from Figs. 4-11, 13-20, 22-29, 31-38 and 40-47, the quantity of points extracted by SIFT is the largest, but their distribution is redundant. The number of keypoints detected by IKAZE is smaller than that of SIFT, and their distribution is uniform, retaining edges and details well. With the exception of Test 1, the number of keypoints detected by Harris is smaller than that detected by ORB, and the distributions of both algorithms are uneven; for ORB, for example, the salient points are mainly concentrated in the center of the image.

      In the quantitative analysis of Table 1, p_num is the quantity of feature points detected by each extraction algorithm, t (ms) is the running time consumed in extraction, and rep (%) is the repeatability between the reference and sensed images; the higher the repeatability, the better the performance of the algorithm. According to Table 1, the repeatability achieved by IKAZE exceeds 80% in all tests, followed by SIFT and ORB, with Harris performing worst. The best value obtained by IKAZE in Test 1 reaches about 83%, while the best values of SIFT, ORB and Harris in Test 1 are about 75%, 71% and 67%, respectively. In terms of time, ORB is the fastest in Tests 1, 2 and 5, and its time in Tests 3 and 4 is close to that of Harris. The running time of IKAZE is the longest, followed by SIFT; SIFT takes about twice the time of ORB, and IKAZE takes about 9 times the time of SIFT. SIFT spends time constructing a linear scale space with Gaussian filtering, and Harris must compute the gradient and evaluate the response function for every salient point, while ORB merely compares gray values in a neighborhood and therefore expends the least time among the three. The nonlinear scale space adds a large time cost to IKAZE.

      For feature matching, Table 2 compares the four methods: C (%) is the correct registration ratio, RMSE is the root mean square error of registration, and T (s) is the time consumed in registration (a minimal sketch of these metrics is given after Table 2). The larger C (%) is, the higher the registration rate; the smaller the RMSE, the more accurate the matching. As seen from Table 2, the correct rates obtained by IKAZE_BRIEF are the best in all tests: the highest value is about 81% in Test 1, and even the lowest value, in Test 4, remains about 75%. SIFT follows IKAZE_BRIEF, with a best value of about 66% in Test 1 and a lowest value below 60% in Test 5. The results of ORB and Harris_SIFT are not satisfactory. The RMSE values are consistent with the correct rates: the RMSE of IKAZE_BRIEF is the lowest, followed by SIFT, Harris_SIFT and ORB in turn. Hence, from the viewpoint of registration precision, IKAZE_BRIEF performs excellently. From the viewpoint of efficiency, ORB consumes the least time in extraction and achieves practical performance, and both ORB and our scheme have a clear advantage in the registration stage owing to their binary code strings. Taking all aspects together, our scheme achieves better efficiency with high accuracy.

      Table 1.  Parameters of extraction for the four methods

      | Test | Method | p_num (Reference) | t (ms) (Reference) | p_num (Sensed) | t (ms) (Sensed) | rep (%) |
      | Test 1 | SIFT | 3 482 | 263.35 | 4 622 | 332.96 | 75.14 |
      | Test 1 | Harris | 1 603 | 162.59 | 1 491 | 167.38 | 66.90 |
      | Test 1 | ORB | 1 412 | 133.08 | 1 351 | 100.89 | 71.50 |
      | Test 1 | IKAZE | 2 283 | 2 636.84 | 2 263 | 2 599.04 | 83.72 |
      | Test 2 | SIFT | 2 109 | 102.61 | 2 979 | 149.14 | 71.31 |
      | Test 2 | Harris | 821 | 109.89 | 879 | 123.30 | 61.07 |
      | Test 2 | ORB | 1 791 | 70.43 | 1 725 | 64.81 | 67.46 |
      | Test 2 | IKAZE | 1 408 | 915.56 | 1 423 | 1 021.56 | 82.05 |
      | Test 3 | SIFT | 2 530 | 112.04 | 2 566 | 114.08 | 62.37 |
      | Test 3 | Harris | 724 | 85.38 | 651 | 83.00 | 55.23 |
      | Test 3 | ORB | 1 257 | 84.50 | 1 280 | 90.02 | 56.27 |
      | Test 3 | IKAZE | 1 065 | 856.45 | 1 138 | 961.35 | 83.51 |
      | Test 4 | SIFT | 3 248 | 205.54 | 3 449 | 213.81 | 61.52 |
      | Test 4 | Harris | 1 480 | 139.62 | 1 232 | 128.73 | 54.11 |
      | Test 4 | ORB | 1 790 | 136.35 | 1 763 | 130.19 | 55.18 |
      | Test 4 | IKAZE | 2 477 | 2 243.14 | 2 287 | 2 193.03 | 81.88 |
      | Test 5 | SIFT | 1 124 | 93.40 | 1 191 | 102.82 | 60.24 |
      | Test 5 | Harris | 569 | 78.27 | 693 | 96.44 | 50.67 |
      | Test 5 | ORB | 796 | 63.85 | 911 | 71.82 | 53.14 |
      | Test 5 | IKAZE | 276 | 847.82 | 316 | 958.60 | 82.03 |

      Table 2.  Performance comparisons of registration for the four methods

      | Test | Method | C (%) | RMSE | T (s) |
      | Test 1 | SIFT | 66.32 | 8.22 | 1119.80 |
      | Test 1 | Harris_SIFT | 65.09 | 9.27 | 869.14 |
      | Test 1 | ORB | 64.78 | 10.43 | 649.53 |
      | Test 1 | IKAZE_BRIEF | 81.49 | 4.14 | 147.06 |
      | Test 2 | SIFT | 64.36 | 9.73 | 377.29 |
      | Test 2 | Harris_SIFT | 63.41 | 10.23 | 312.66 |
      | Test 2 | ORB | 63.17 | 11.02 | 237.98 |
      | Test 2 | IKAZE_BRIEF | 80.69 | 4.87 | 91.89 |
      | Test 3 | SIFT | 61.65 | 13.14 | 444.41 |
      | Test 3 | Harris_SIFT | 61.02 | 13.24 | 346.16 |
      | Test 3 | ORB | 60.88 | 13.89 | 276.42 |
      | Test 3 | IKAZE_BRIEF | 78.52 | 5.20 | 79.05 |
      | Test 4 | SIFT | 61.53 | 13.25 | 849.02 |
      | Test 4 | Harris_SIFT | 58.25 | 14.08 | 782.44 |
      | Test 4 | ORB | 56.44 | 15.50 | 592.93 |
      | Test 4 | IKAZE_BRIEF | 75.80 | 6.35 | 228.61 |
      | Test 5 | SIFT | 57.29 | 14.66 | 148.13 |
      | Test 5 | Harris_SIFT | 55.27 | 15.11 | 161.32 |
      | Test 5 | ORB | 54.36 | 15.98 | 208.35 |
      | Test 5 | IKAZE_BRIEF | 76.13 | 6.22 | 45.16 |
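      As referenced above, a minimal sketch of how the two registration metrics can be computed from matched keypoint pairs (the array shapes and function names are our own assumptions):

```python
import numpy as np

def registration_rmse(ref_pts, mapped_pts):
    # RMSE between reference keypoints and the matched sensed keypoints
    # mapped into the reference frame; both arguments are (n, 2) arrays.
    d = ref_pts - mapped_pts
    return float(np.sqrt(np.mean(np.sum(d ** 2, axis=1))))

def correct_ratio(num_correct, num_initial):
    # C (%): percentage of initial correspondences retained as correct after RANSAC.
    return 100.0 * num_correct / num_initial
```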
    • In this article, a novel remote sensing image registration method based on a nonlinear multi-scale space and a simple binary descriptor is presented to find more feature matches and improve matching precision. There are two critical contributions: 1) The size difference between variable and relatively stable ground objects appears differently in different scale spaces, so a composite nonlinear diffusion filter is put forward to extract reliable and salient feature points at different layers of the multi-scale space. 2) To improve efficiency, gradually reduced resolution is adopted in building the multi-scale pyramid space, and a binary code string serves as the feature descriptor. These two contributions increase the number of reliable feature points and restrict mismatches simultaneously without much loss of efficiency. Image registration tests show that our method outperforms other classical methods under most conditions.

      There are still some limitations. On the one hand, the construction of the nonlinear scale space is complicated, so the time expended in the whole process is considerable. Our method is therefore inefficient in practical applications, especially for remote sensing images containing huge amounts of data, and its real-time performance needs to be improved. On the other hand, due to the diversity of data sources in remote sensing, the image signal is often affected by large dynamic changes, brightness changes, geometric changes, many similar features, small overlap areas and other factors. It is difficult to solve all these problems with a single feature descriptor and a single similarity measure. Therefore, how to effectively combine multiple features and multiple measures in remote sensing image registration is a goal of our future research.

    • This work was supported by the National Natural Science Foundation of China (Nos. 61640412 and 61762052), the Natural Science Foundation of Jiangxi Province (No. 20192BAB207021), and the Science and Technology Research Projects of Jiangxi Province Education Department (Nos. GJJ170633 and GJJ170632).

References (32)
