Image Registration by Nonlinear Wavelet Compression and Singular Value Decomposition

Jorge E. Pinzon1, John F. Pierce2, Susan L. Ustin2 and Claudia M. Castaneda2
1Department of Mathematics
2Department of Land, Air, and Water Resources
University of California at Davis
KT-Tech, Inc., Greenbelt, MD
Image Registration Workshop
NASA GSFC
Nov 20 - 21, 1997
We present a three step algorithm with feedback for registering images by global, two-dimensional, rigid transformations (translation, rotation, uniform scaling). The algorithm registers images by extracting a significant number of matching points from each an ensemble (control points) using non-linear wavelet compression. It then generates the registering transformation from these ensembles using tools from the theory of singular value decomposition (SVD) for matrices.

1. Outline

1.  System Diagram and Program Overview 2. The SVD Computations and Registration Principles
3. Random Points and SVD Registration
4. Wavelet Compression and Control Point Extraction
5. Example
6. Validation and Performance Tests
7. Conclusions and Generalizations

We present the overview of the structure of the algorithm, its flow and properties, describing the logical foundation for the algorithm and its three main elements: extraction of control points, SVD and stability.

First, we introduce the elements of the SVD and how they were used with the ensembles of control points from the reference and input images to assess the registering transformation. Second, we evaluate the sensitivity of the system to random false matching points to guarantee stability. We describe how we use the SVD computations to refine the extraction of hot pixels and to determine the registering transformation from them. Third, we describe how we compress and refine each image to produce the ensembles of control points.

Then, we present the results of the validity and performance tests. Finally, we present the conclusions about the shortcomings of the current version of the algorithm and set forth ways by which we will improve it in future versions.

Wavelet and SVD Registration System Diagram

Figure 1

1.2 System Properties

 
  • Method:
  • Design:
  • The algorithm consists of three stages with feedback. The first stage of the algorithm compresses the reference and input images, extracting hot pixels which manifest features significant to each image. The second stage carries out the SVD computations on each of the compressed images to extract singular values and principal directions for each ensemble. The third stage uses the centroid, singular values, and principal directions for each ensemble to compute the registering transformation.

    A stability feedback loop between the second and first stages allows the SVD computations to refine the compression and to maximize the probability that the ensemble of hot pixels in both images manifest viable control points, a significant number of which will match. A feedback loop among the third, second, and first stages refines the accuracy of the registration computed by the third stage.

    We assume that hot pixels which manifest features inherent in an image are not randomly distributed, and that hot pixels manifest artifacts in an image, such as incidental lighting, etc. For a given compression, we design the program to increment the thresholding filter, to produce or remove hot pixels from each compressed image. The program computes the rate of change in the orientation of the principal singular direction, with respect to the addition of hot pixels. If the rate of change is sufficiently non-zero, points manifesting significant features are more likely being added, and the incrementing process should continue. If the rate of change approaches zero, points being added are more likely random, and the compression/thresholding process should terminate.

    This registration method could be classified as a global method according to the registration transformation and variation types. This method also uses all parts of the images to compute the parameters of rigid transformations, therefore it can also be considered as a global method. However, by registering just SVD principal directions which has the advantage of packing the whole information, 1-1 registration does not required speeding, which is generally a costly method.

    Bearing in mind questions regarding local variations, for example, distortions due to perspective: objects which are farther away are distorted differently than closer objects. SOLUTION: piecewise local SVD computations.

    We designed the algorithm to be computationally simple, fast, accurate, reliable, and automatic, to the extent that these requirements are compatible. For example, given a choice between speed and accuracy, we designed the algorithm for speed; given a choice between speed and computational simplicity, we designed the algorithm for simplicity.

    Also, we took the effort to modularize the routines in the algorithm. By modularizing the routines, the user can use the compression and hot point extraction routines (wav_his, ext_hot) to extract control points to drive a different registering algorithm, for example. Alternatively, the user may produce by different means ensembles of control points in a reference and input image, then call the routines svd_mod, affine_r and stat_reg to perform a registration and assess its effectiveness at aligning the sets of control points.

    2. SVD Registration Principles

  • Global, rigid transformation acting on any 2D-subset:
      X e D-space, and Y e I-space 
    (1)
     
  • Isolating Scaling and Rotation
      Coordinates relative to centroids of ensembles: P0, Q0. 
    (2)
     
  • Rigid transformation under permutations
      p: permutation on the index set {1, …, m} 
      In matrix notation 
    (3)
      where
    (4)
    (5)
     
  • Assessing s and R by SVD
    • THEOREM 
      If 
      then
    (6)
      for
    (7)
      PROOF
    (8)
    (9)
    We take a global, rigid transformation acting on any subset in a 2-dimensional Euclidean space as given by equation 1. Here s and Rq represent scale and rotation and the 2 x m matrices, X and Y denote control points for reference and input images, respectively. Columns are the components of the position vectors relative to their center of rotation. We say the ensembles match under the transformation if points of the first are taken onto the second in a one-to-one manner by the transformation.

    When this occurs, we show:

    (1) that we can isolate the rotation and scaling parameters for the transformation by examining the positions of points in each ensemble relative to their respective centroid P0, Q0, which register under the transformation.

    (2) that the singular values and principal singular directions for a matrix are independent of the ordering of the rows of the matrix: Equations 3-5.

    (3) that we can deduce the rotation and scaling parameters from the SVD computations for the two reference and input 2 x m matrices, X and Y built. This is stated in the THEOREM that relates singular values and vectors for the reference and input ensembles.

    PROOF follows from equation 9 and the orthogonal nature of matrices P and R9.

    Once these parameters are computed, we can deduce the translation parameters from the position of the centroid in the input image and the rotated and scaled position of the centroid in the reference image.

    When the ensembles contain points that do not match under registration, there is impact upon the conclusions of the Theorem. However, if the points in the ensembles that do not match under registration are randomly distributed, the SVD computations of the orientation of the principal singular directions and of the singular values are, in the mean, unaffected, as we show next.

    3 Random Points and SVD Registration

     
  • True matching Points: 100
  •  Figure 2
     Figure 3
  • Results
  • Table 1: Ensemble augmented by 20% of randomly generated set of points. Actual values are in parentheses.

    Stats
    scale (2)
    a(60o)
    Tx(40)
    Ty(-35)
    mean
    1.82
    59.92
    37.24
    -45.31
    std
    0.02
    0.49
    2.79
    1.87
    mse
    0.03
    0.24
    15.32
    109.77
    The impact upon the computations of the rotation and scaling parameters under unmatching situations will be, in the mean, minor. Here we demonstrate this assertion empirically. We create a set of 100 true matching points in each ensemble and augment it by 20% of randomly generated set of points. We rotate them by 60 degrees, scale by 2 and translate by (40, -35).

    In this figure, plus(+) denotes true matching points and circles(o) augments 20% random points. In the second figure we show average variation of the principal directions after 100 computations of ensembles augmented by 20% of randomly generated set of points. In all cases, the variation was less than 1 degree.

    The table summarizes the results of many computations of scale factor, rotation, and translation parameters for an ensemble of points transformed as stated above. For the rotation and scale parameters, the results concur with the assertion for the stability Figure.

    However, the computation of the translation parameters, which is based upon centroid computations is more sensitive to the presence of points in one ensemble not matching under registration to points in the other, as Table 1 shows. The problem becomes acute when the ensembles significantly differ, because of cropping of the input image, for example. We address this concern in the conclusion.

    4. Nonlinear Wavelet Compression

    Figure 4
    We use a nonlinear wavelet compression strategy to extract from the reference and input images “hot pixels.” We assume that among them are points which manifest features inherent to each image, and thereby are candidates for control points which would match under the registering transformation. We transform the images using a Fast Wavelet Transform, relative to an orthogonal wavelet basis, in this case, Coiflet. From a histogram of the magnitude of the HL and LH wavelet coefficients, we establish a filter which maintains a significant interval of those non-zero coefficients which exceed an adaptive thresholding percentage of the maximum magnitude of the coefficients, and nullifies all other coefficients. Reconstructing spatial images from the “softly” thresholded coefficients produces compressed, but “cloudy” versions of the reference and input images.

    A hard thresholding of these spatial images, using a filter built from the histogram of the pixel gray level values of these images, allows us to “remove penumbra”, and produce compressed, hot pixel versions of the reference and input images. Incrementing this filter allows us to add or remove hot pixels from the compressed images.

    By itself, this process will not automatically extract ensembles of points which manifest features inherent to the images, and thereby present themselves as candidate control points. However, results of the SVD computations can control the incrementation of the filters in the compression and spatial thresholding steps, and thereby optimize the probability that a significant number of hot pixels in each image do in fact manifest inherent features of the image. This control is the foundation for a stability feedback loop.

    We base the assertion on the empirical results of randomly distributed points.

    5. Example

     Figure 5
     Figure 6

    After wavelet compression, highest coefficients remain as hot points and their principal directions are registered by SVD. By fuzzying hot points, we can calculate statistics for estimated parameters.

    6a. Validation and Performance Tests

    Test Images Table 2: Test results.
    Input
    Results
    Time (secs)
    q0TxTy q Tx Ty Opt. 



    L
    550000 
    050000 
    050604 
    002060
    49.96 
    5.25 
    8.98 
    0.93
    -4.83 
    2.80 
    1.61 
    3.65
    1.12 
    -4.75 
    10.57 
    39.71
    52.02 
    82.93 
    88.38 
    55.60


     
    180000 
    045000 
    040502 
    005000
    26.96 
    1.13 
    5.05 
    -5.35
    9.16 
    41.51 
    9.44 
    40.91
    15.28 
    -8.49 
    0.04 
    -16.54
    84.66 
    66.61 
    54.41 
    33.37




    R
    sa1244 
    sa1250 
    sa1300 
    sa1408 
    sa1411
    -4.89 
    -3.98 
    -3.34 
    -4.64 
    -35.35
    -28.08 
    5.65 
    -0.98 
    21.59 
    8.17
    -6.67 
    14.69 
    -2.43 
    18.27 
    -11.41
    59.47 
    66.22 
    51.91 
    33.53 
    53.71
    We used in our test three different types of images: girl, TM, and AVHRR images. These computations were executed in an alpha dec-workstation.

    In the case of the girl and the TM image, the input images were prepared as specific translations and rotations of the reference images. We present these values in the first column of Table 2 as a six-digit number, two digits each for the angle, horizontal and vertical offsets. The AVHRR scenes listed in the table are images taken by the satellite at different times. In each case, the input image differs from the reference image by almost pure translation. The reference image for the AVHRR scenes is coastline map file for the region available at the CESDIS ftp site.

    For each input image, we present the angle, horizontal and vertical offsets from the reference image in columns two through four. In each case, the scale parameter automatically rounded up to one, as expected.

    In average, it took 1 second to perform actual registrations and 30 seconds to perform wavelet decompositions. The time required to perform the optimized registration is presented in column 5. Although the registration of the GIRL and TM images is good, significant deviations were noticed in the AVHRR images, particularly for sa1411. We attribute this deviations to extensive cloud coverage in the image, as we see next.
     

    6b. Cloud cover in AVHRR images

     Figure 7
    Cloud cover has an impact on the extraction of matching points. The extracted hot points for input and reference images have not relationship.

    7. Conclusions and Generalizations

  • Features
  • Shortcomings
  • Extensions
  • In summary we present a global three-stage algorithm with a feedback technique to automatically register images differing by a global rigid transformation. The significant features of the algorithm are the use of nonlinear wavelet compression to extract control points in each image and the use of singular value decomposition tools to register these points. In contrast to linear wavelet compression, the use of nonlinear wavelet compression greatly filters the number of extraneous or redundant control points. The SVD tools are significant, because they allow us to register ensembles of control points without having to know specifically which points match up pairwise.

    The algorithm has two major shortcomings. First, because wavelet decompositions are neither translation nor rotation invariant, selecting the compression and thresholding filters is intricate. Small variations in the registration parameters do not necessarily result in small changes to the compression and thresholding filters. The stability loop based upon the SVD computations allows us to minimize the production of artifacts at the compression stage and to lessen the impact on the computation of the rotation parameter; however, the algorithm remains vulnerable in the computation of the translation parameter. We are examining how preprocessing the images with a Hough transform or edge detector might mitigate these conditions for this version of the algorithm.

    Second, the images that the algorithm ingests must inherently have within them a sufficient number of control points, sufficiently distributed spatially, in order to function successfully. In particular, the algorithm is noticeably unsuccessful on images present scenes obscured significantly by clouds. This was seen in the validation tests of Section ?? , where the algorithm gives reasonable results for AVHRR images depicting basically cloudless scenes, but unreasonable results for the AVHRR image depicting a cloud filled scene. We anticipate correcting the first shortcoming in the next version of the algorithm by refining the compression stage through the introduction of a steerable pyramid. A steerable pyramid is an overcomplete wavelet transform which produces a representation that is almost translation invariant (subbands are aliasing-free) and rotation invariant (subbands are steerable). We anticipate this tool will stabilize the filter selection processes and result in a more accurate computation of translation parameters. Also a generalized Hough transform as pre-processing may mitigate the first shortcoming.

    Cloud masking as pre-processing: Algorithms based upon fractal cloud models and multi-spectral feature extraction methods appear particularly promising.

    SVD Computations

    SVD decomposition of an m X n matrix R: U and V are unitary matrices and S a diagonal matrix with the eigenvalues of RtR.
     Figure 8
    Each equation is solved by singular value decomposition, whose performance in energy-packing, and avoidance of overfitting problems due to its stability properties in rank-deficient systems. Equation SVD represents the best approximation of R in matrices of lower rank. That is, the best least squared approximation of a matrix R by matrices of lower rank q (q < r), is given by 

    Computationally, SVD is more accurate and stable and in this case its complexity is O(n) where n is the number of control points. This is the principle advantage of using SVD in registration: we don't need a 1-1 registration.

    The SVD Figure summarizes the processes involved at each level of the HFBA approach. A matrix R with 15 x 40 entries is created with 0 everywhere except for the 1 values (highest brightness) as indicated by the picture at top-left of Figure 3.2. This matrix is rank deficient. As it is shown in the top second left of the figure: Computed singular values for the matrix R in semilog scale. Observe that the exact rank of this matrix is given by the drop of the curve. In this case, the rank of R is 10. In the next plots, the first ten matrices B that give the best approximation to R in the 2 (Equation SVD) are presented. Each time more details are added to the approximation until the estimated rank is reached and one can not discriminate between the original matrix and its approximation.

    1998, Center for Spatial Technologies and Remote Sensing (CSTARS)
    University of California, Davis