We present the overview of the structure of the algorithm, its flow and properties, describing the logical foundation for the algorithm and its three main elements: extraction of control points, SVD and stability.
First, we introduce the elements of the SVD and how they were used with the ensembles of control points from the reference and input images to assess the registering transformation. Second, we evaluate the sensitivity of the system to random false matching points to guarantee stability. We describe how we use the SVD computations to refine the extraction of hot pixels and to determine the registering transformation from them. Third, we describe how we compress and refine each image to produce the ensembles of control points.
Then, we present the results of the validity and performance tests. Finally, we present the conclusions about the shortcomings of the current version of the algorithm and set forth ways by which we will improve it in future versions.
Wavelet and SVD Registration System Diagram
A stability feedback loop between the second and first stages allows the SVD computations to refine the compression and to maximize the probability that the ensemble of hot pixels in both images manifest viable control points, a significant number of which will match. A feedback loop among the third, second, and first stages refines the accuracy of the registration computed by the third stage.
We assume that hot pixels which manifest features inherent in an image are not randomly distributed, and that hot pixels manifest artifacts in an image, such as incidental lighting, etc. For a given compression, we design the program to increment the thresholding filter, to produce or remove hot pixels from each compressed image. The program computes the rate of change in the orientation of the principal singular direction, with respect to the addition of hot pixels. If the rate of change is sufficiently non-zero, points manifesting significant features are more likely being added, and the incrementing process should continue. If the rate of change approaches zero, points being added are more likely random, and the compression/thresholding process should terminate.
This registration method could be classified as a global method according to the registration transformation and variation types. This method also uses all parts of the images to compute the parameters of rigid transformations, therefore it can also be considered as a global method. However, by registering just SVD principal directions which has the advantage of packing the whole information, 1-1 registration does not required speeding, which is generally a costly method.
Bearing in mind questions regarding local variations, for example, distortions due to perspective: objects which are farther away are distorted differently than closer objects. SOLUTION: piecewise local SVD computations.
We designed the algorithm to be computationally simple, fast, accurate, reliable, and automatic, to the extent that these requirements are compatible. For example, given a choice between speed and accuracy, we designed the algorithm for speed; given a choice between speed and computational simplicity, we designed the algorithm for simplicity.
Also, we took the effort to modularize the routines in the algorithm.
By modularizing the routines, the user can use the compression and hot
point extraction routines (wav_his, ext_hot) to extract control points
to drive a different registering algorithm, for example. Alternatively,
the user may produce by different means ensembles of control points in
a reference and input image, then call the routines svd_mod, affine_r and
stat_reg to perform a registration and assess its effectiveness at aligning
the sets of control points.
|
|
(1)
|
|
|
(2)
|
|
In matrix notation |
(3)
|
|
|
(4)
|
|
|
(5)
|
If |
(6)
|
|
|
(7)
|
|
|
(8)
|
|
|
(9)
|
When this occurs, we show:
(1) that we can isolate the rotation and scaling parameters for the transformation by examining the positions of points in each ensemble relative to their respective centroid P0, Q0, which register under the transformation.
(2) that the singular values and principal singular directions for a matrix are independent of the ordering of the rows of the matrix: Equations 3-5.
(3) that we can deduce the rotation and scaling parameters from the SVD computations for the two reference and input 2 x m matrices, X and Y built. This is stated in the THEOREM that relates singular values and vectors for the reference and input ensembles.
PROOF follows from equation 9 and the orthogonal nature of matrices P and R9.
Once these parameters are computed, we can deduce the translation parameters from the position of the centroid in the input image and the rotated and scaled position of the centroid in the reference image.
When the ensembles contain points that do not match under registration,
there is impact upon the conclusions of the Theorem. However, if the points
in the ensembles that do not match under registration are randomly distributed,
the SVD computations of the orientation of the principal singular directions
and of the singular values are, in the mean, unaffected, as we show next.
Table 1: Ensemble augmented by 20% of randomly generated set of points. Actual values are in parentheses.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In this figure, plus(+) denotes true matching points and circles(o) augments 20% random points. In the second figure we show average variation of the principal directions after 100 computations of ensembles augmented by 20% of randomly generated set of points. In all cases, the variation was less than 1 degree.
The table summarizes the results of many computations of scale factor, rotation, and translation parameters for an ensemble of points transformed as stated above. For the rotation and scale parameters, the results concur with the assertion for the stability Figure.
However, the computation of the translation parameters, which is based
upon centroid computations is more sensitive to the presence of points
in one ensemble not matching under registration to points in the other,
as Table 1 shows. The problem becomes acute when the ensembles significantly
differ, because of cropping of the input image, for example. We address
this concern in the conclusion.
A hard thresholding of these spatial images, using a filter built from the histogram of the pixel gray level values of these images, allows us to “remove penumbra”, and produce compressed, hot pixel versions of the reference and input images. Incrementing this filter allows us to add or remove hot pixels from the compressed images.
By itself, this process will not automatically extract ensembles of points which manifest features inherent to the images, and thereby present themselves as candidate control points. However, results of the SVD computations can control the incrementation of the filters in the compression and spatial thresholding steps, and thereby optimize the probability that a significant number of hot pixels in each image do in fact manifest inherent features of the image. This control is the foundation for a stability feedback loop.
We base the assertion on the empirical results of randomly distributed
points.
After wavelet compression, highest coefficients remain as hot points
and their principal directions are registered by SVD. By fuzzying hot points,
we can calculate statistics for estimated parameters.
|
|
|
|
|||
| q0TxTy | q | Tx | Ty | Opt. | |
| G
I R L |
550000
050000 050604 002060 |
49.96
5.25 8.98 0.93 |
-4.83
2.80 1.61 3.65 |
1.12
-4.75 10.57 39.71 |
52.02
82.93 88.38 55.60 |
| T
M |
180000
045000 040502 005000 |
26.96
1.13 5.05 -5.35 |
9.16
41.51 9.44 40.91 |
15.28
-8.49 0.04 -16.54 |
84.66
66.61 54.41 33.37 |
| A
V H R R |
sa1244
sa1250 sa1300 sa1408 sa1411 |
-4.89
-3.98 -3.34 -4.64 -35.35 |
-28.08
5.65 -0.98 21.59 8.17 |
-6.67
14.69 -2.43 18.27 -11.41 |
59.47
66.22 51.91 33.53 53.71 |
In the case of the girl and the TM image, the input images were prepared as specific translations and rotations of the reference images. We present these values in the first column of Table 2 as a six-digit number, two digits each for the angle, horizontal and vertical offsets. The AVHRR scenes listed in the table are images taken by the satellite at different times. In each case, the input image differs from the reference image by almost pure translation. The reference image for the AVHRR scenes is coastline map file for the region available at the CESDIS ftp site.
For each input image, we present the angle, horizontal and vertical offsets from the reference image in columns two through four. In each case, the scale parameter automatically rounded up to one, as expected.
In average, it took 1 second to perform actual registrations and 30
seconds to perform wavelet decompositions. The time required to perform
the optimized registration is presented in column 5. Although the registration
of the GIRL and TM images is good, significant deviations were noticed
in the AVHRR images, particularly for sa1411. We attribute this deviations
to extensive cloud coverage in the image, as we see next.
The algorithm has two major shortcomings. First, because wavelet decompositions are neither translation nor rotation invariant, selecting the compression and thresholding filters is intricate. Small variations in the registration parameters do not necessarily result in small changes to the compression and thresholding filters. The stability loop based upon the SVD computations allows us to minimize the production of artifacts at the compression stage and to lessen the impact on the computation of the rotation parameter; however, the algorithm remains vulnerable in the computation of the translation parameter. We are examining how preprocessing the images with a Hough transform or edge detector might mitigate these conditions for this version of the algorithm.
Second, the images that the algorithm ingests must inherently have within them a sufficient number of control points, sufficiently distributed spatially, in order to function successfully. In particular, the algorithm is noticeably unsuccessful on images present scenes obscured significantly by clouds. This was seen in the validation tests of Section ?? , where the algorithm gives reasonable results for AVHRR images depicting basically cloudless scenes, but unreasonable results for the AVHRR image depicting a cloud filled scene. We anticipate correcting the first shortcoming in the next version of the algorithm by refining the compression stage through the introduction of a steerable pyramid. A steerable pyramid is an overcomplete wavelet transform which produces a representation that is almost translation invariant (subbands are aliasing-free) and rotation invariant (subbands are steerable). We anticipate this tool will stabilize the filter selection processes and result in a more accurate computation of translation parameters. Also a generalized Hough transform as pre-processing may mitigate the first shortcoming.
Cloud masking as pre-processing: Algorithms based upon fractal cloud
models and multi-spectral feature extraction methods appear particularly
promising.
Computationally, SVD is more accurate and stable and in this case its complexity is O(n) where n is the number of control points. This is the principle advantage of using SVD in registration: we don't need a 1-1 registration.
The SVD Figure summarizes the processes involved at each level of the HFBA approach. A matrix R with 15 x 40 entries is created with 0 everywhere except for the 1 values (highest brightness) as indicated by the picture at top-left of Figure 3.2. This matrix is rank deficient. As it is shown in the top second left of the figure: Computed singular values for the matrix R in semilog scale. Observe that the exact rank of this matrix is given by the drop of the curve. In this case, the rank of R is 10. In the next plots, the first ten matrices B that give the best approximation to R in the 2 (Equation SVD) are presented. Each time more details are added to the approximation until the estimated rank is reached and one can not discriminate between the original matrix and its approximation.