The extension of NIRS-type protocols to intact fresh leaves may prove difficult. Two types of difficulties arise: (1) the NIRS protocols have been developed and validated on uniformly ground, uniformly packed, optically thick layers of dried material whereas, for use with spaceborne sensors, we wish to determine biochemical information from intact fresh leaves; and (2) the spectrum of water dominates fresh leaf reflectance, potentially masking the reflectance features detected in dry material (Gates, 1970; Tucker and Garratt, 1977; Curran, 1989; Elvidge, 1990). Recent studies on reflectance properties of fresh leaves have used stepwise multiple regression to select wavelengths that correlate with leaf biochemistry (Curran et al., 1992; ACCP, 1994). Johnson and Billow (1995) examined Douglas fir needles that were grown under various fertilization treatments and found that the first derivative of the fresh leaf spectra was strongly correlated with total nitrogen concentration. Yoder and Pettigrew-Crosby (1995) also found that first derivative spectra were the best predictors of nitrogen and chlorophyll in bigleaf maple grown under three fertilization treatments. Jacquemoud et al. (1995) determined that stepwise multiple regression of dried leaf and optically thick sample spectra gave higher correlations for nitrogen than did fresh leaves. Curran et al. (1992) examined fresh leaves of Amaranthus and concluded that spectral overlaps between foliar biochemicals needed to be considered to obtain accurate estimates of concentration.
In this study we used species-diverse sets of intact fresh and dry leaves
to examine the relationship between chemical and spectral characteristics.
We have examined the use of unconstrained and constrained stepwise multiple
linear regressions to predict carbon, nitrogen, lignin, cellulose, water,
and dry weight composition from infrared reflectance. Several data transformations
were examined to determine which explains the greatest amount of the variation
in the chemical data. To test for spurious correlations, we developed an
empirical test to estimate the "baseline" explanation of the variation
when there is no statistical relationship between the chemical composition
and reflectance spectra. The sensitivity of the wavelengths selected by
stepwise multiple linear regression to changes in dataset composition was
tested by running regressions on five subsets of one dataset.
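The three spectral transformations compared in this study can be sketched as follows. This is a minimal illustration only: it assumes reflectance sampled at evenly spaced wavelengths, a base-10 logarithm, and derivatives approximated by simple finite differences. The function names, the 2 nm step and the reflectance values are hypothetical, and the differencing scheme is not necessarily the one used to prepare the spectra in this study.

```python
import math

def log_inverse_reflectance(reflectance):
    """Return log10(1/R) for each reflectance value (0 < R <= 1)."""
    return [math.log10(1.0 / r) for r in reflectance]

def first_derivative(values, step_nm):
    """Central differences in the interior, one-sided differences at the ends."""
    n = len(values)
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((values[hi] - values[lo]) / ((hi - lo) * step_nm))
    return deriv

def second_derivative(values, step_nm):
    """Apply the first-derivative operator twice."""
    return first_derivative(first_derivative(values, step_nm), step_nm)

# Made-up reflectance values at five evenly spaced wavelengths.
reflectance = [0.50, 0.40, 0.25, 0.20, 0.25]
absorbance = log_inverse_reflectance(reflectance)
d1 = first_derivative(absorbance, step_nm=2.0)
d2 = second_derivative(absorbance, step_nm=2.0)
```

Each transformed spectrum has the same length as the input, so the same wavelength axis can be used for all three sets of candidate regressors.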
Following reflectance measurements, leaf disks were wrapped in aluminum foil, frozen in liquid nitrogen, lyophilized, reweighed and electronically scanned for area measurement. The difference in weight between fresh and dried sample was attributed to water content. An additional 15-20 leaves (bulk samples) of each plant species were wrapped in aluminum foil and frozen in liquid nitrogen for later biochemical analysis. The samples were stored on dry ice during transport, lyophilized, and ground through a 1 m screen using a ball mill grinder. Leaf powders were stored desiccated at -70 °C until chemical analyses were performed.
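The bookkeeping implied by these measurements can be sketched as below. The sketch assumes that water content is the fresh-minus-dry mass of a disk divided by its scanned area, and that a concentration (g per g dry mass) is converted to a content (g per m2 of leaf) by multiplying by dry mass per unit area. The function names, the use of cm2 for disk area and the numerical values are illustrative assumptions.

```python
def water_content_g_per_m2(fresh_mass_g, dry_mass_g, area_cm2):
    """Water per unit leaf area, from the fresh-minus-dry mass difference."""
    area_m2 = area_cm2 / 10_000.0
    return (fresh_mass_g - dry_mass_g) / area_m2

def concentration_to_content(concentration_g_per_g, dry_mass_g, area_cm2):
    """Convert g per g dry mass into g per m2 of leaf area."""
    area_m2 = area_cm2 / 10_000.0
    return concentration_g_per_g * dry_mass_g / area_m2

# Hypothetical disk: 30 mg fresh, 10 mg dry, 1 cm2 in area, 2% nitrogen.
water = water_content_g_per_m2(0.030, 0.010, 1.0)
nitrogen = concentration_to_content(0.02, 0.010, 1.0)
```

Because content scales with dry mass per unit area, two leaves with the same nitrogen concentration can differ widely in nitrogen content.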
A subset of the lyophilized bulk samples was sent to the laboratory
of Dr. John Aber, University of New Hampshire, for biochemical analysis.
Total carbon and nitrogen were determined with a Perkin-Elmer 2400 CHN
Elemental Analyzer (Norwalk, CT). Cellulose and lignin were determined
by proximate analysis, a technique of sequential extractions in dichloromethane,
water and sulfuric acid. The sulfuric acid fraction represents cellulose
and the residue represents lignin (Effland, 1977).
To estimate the "baseline" coefficient of determination (i.e., the coefficient of determination expected if there were no linear relationship between reflectance and biochemistry), fifty randomized datasets containing nitrogen concentration and content versus log(1/R), first derivative, and second derivative spectra were prepared from the actual datasets for Jasper Ridge fresh, JRC fresh, and JRC dried leaves by randomizing the association between the measured nitrogen concentration or content and the reflectance spectra. Randomizations were done by assigning random numbers to the nitrogen observations, sorting by these random numbers, then assigning the first nitrogen observation to the first spectrum in the actual dataset, the second to the second, and so on. These randomized datasets were submitted to stepwise multiple linear regression, and the resulting 50 coefficients of determination were averaged.
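The randomization test described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SAS procedure used in this study: it uses a greedy forward selection with a fixed number of regressors in place of a full stepwise procedure with entry and removal tests, and the data are synthetic. The function names, the choice of three regressors and the random seeds are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R2 of an ordinary least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)

def forward_select(X, y, n_regressors):
    """Greedy forward selection: add the column that most increases R2."""
    chosen = []
    best_r2 = 0.0
    for _ in range(n_regressors):
        candidates = [j for j in range(X.shape[1]) if j not in chosen]
        scores = [r_squared(X[:, chosen + [j]], y) for j in candidates]
        best = int(np.argmax(scores))
        chosen.append(candidates[best])
        best_r2 = scores[best]
    return chosen, best_r2

def baseline_r2(X, y, n_regressors=3, n_shuffles=50, seed=0):
    """Mean R2 after breaking the pairing between chemistry and spectra."""
    rng = np.random.default_rng(seed)
    r2s = [forward_select(X, rng.permutation(y), n_regressors)[1]
           for _ in range(n_shuffles)]
    return float(np.mean(r2s))

# Synthetic stand-in data: 30 samples, 60 wavelengths, no real relationship.
rng = np.random.default_rng(1)
spectra = rng.normal(size=(30, 60))
nitrogen = rng.normal(size=30)
baseline = baseline_r2(spectra, nitrogen, n_regressors=3, n_shuffles=10)
```

An R2 obtained from the correctly paired data can then be compared with `baseline` rather than with zero.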
An additional dataset, provided by Dr. John Aber and Dr. Mary Martin (described in Bolster et al., 1995) was randomized in the same manner. This dataset contained nitrogen concentrations and reflectance spectra for 186 samples of dried, ground leaves, representing 14 species from the Harvard Forest, Petersham, Massachusetts.
The repeatability of wavelength selection was tested by assembling five randomly chosen subsets of 40 samples from the 63 samples in the JRC fresh leaf dataset. Stepwise multiple linear regressions of nitrogen concentration versus log(1/R) spectra were performed on these subsets. The wavelengths chosen as the best three regressors for each subset were compared for consistency among the five regressions.
All statistical procedures were performed using SAS (SAS Institute Inc.,
Cary, NC).
In most cases in the JRC fresh leaf dataset (Table 4), the first derivative of log(1/R) explained more variation than did log(1/R) or the second derivative of log(1/R). When the data were expressed on a concentration basis (g g⁻¹), the first derivative of log(1/R) explained the largest proportion of the variation for nitrogen, cellulose, and lignin, and the second derivative of log(1/R) explained the largest proportion of the variation for carbon. When the data were expressed on a content basis, the first derivative of log(1/R) explained the largest proportion of the variation for carbon, nitrogen, and cellulose, and log(1/R) explained the largest proportion of the variation for lignin and dry weight. Log(1/R) and the first derivative both explained 99% of the variation for water content. At least 60% of the variation was explained for all chemicals except for cellulose and lignin concentrations, for which no significant regressions were found. Higher explanations of variation were obtained by expressing the data on the basis of content than by expressing the data on the basis of concentration for all chemicals except nitrogen (Table 4).
In the JRC dry leaf dataset (Table 5), the first or second derivative of log(1/R) consistently explained more of the variation than did log(1/R). When the data were expressed on a concentration basis, the first derivative of log(1/R) explained the largest proportion of the variation for carbon, cellulose, and lignin. The first and second derivatives explained equal amounts of the variation for nitrogen concentration. When the data were expressed on a content basis, the second derivative of log(1/R) explained the largest proportion of the variation for carbon, nitrogen, cellulose, lignin, dry weight, and water. In one case, only 35% of the variation was explained. In all other cases, at least 53% of the variation was explained. Higher explanations of variation were obtained for all chemicals by expressing the data on the basis of concentration than by expressing the data on the basis of content (Table 5).
The wavelengths selected by the unconstrained regressions depended upon
whether the data were expressed on the basis of concentration or content
(for example, nitrogen, Figure 1). Over all chemicals analyzed, wavelengths
chosen for concentration and content were within 10 nm of one another in
fewer than 6% of the comparisons.
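The coincidence criterion used here (bands within 10 nm of one another) can be expressed as a small function. The sketch below assumes that every concentration band is compared with every content band; how the comparisons were actually enumerated is not stated, and the band positions shown are hypothetical.

```python
def fraction_within(bands_a, bands_b, tolerance_nm=10.0):
    """Fraction of all pairwise band comparisons that differ by <= tolerance."""
    pairs = [(a, b) for a in bands_a for b in bands_b]
    hits = sum(1 for a, b in pairs if abs(a - b) <= tolerance_nm)
    return hits / len(pairs)

# Hypothetical band positions (nm) selected on the two bases.
concentration_bands = [1510.0, 1690.0, 2180.0]
content_bands = [1504.0, 1940.0, 2300.0]
share = fraction_within(concentration_bands, content_bands)
```

In this made-up case only one of the nine pairings (1510 nm and 1504 nm) falls within the tolerance.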
Mean coefficients of determination for the Harvard Forest dataset were lower than those for the Jasper Ridge and JRC datasets, but exhibited the same pattern (Table 7). That is, the second derivative of log(1/R) explained the greatest amount of variation (9%), the first derivative explained an intermediate level (5%) and log(1/R) explained the lowest amount (1%). A number of factors may have contributed to the lower explanation of variation by the randomized datasets, including the greater species homogeneity in the dataset, lower experimental error due to larger sample numbers, and measurement factors, such as the use of dry, ground, uniformly packed leaf samples.
For example, for nitrogen concentration, the coefficients of determination (R²) for log(1/R) and the first derivative were higher than the randomized R²s for the three datasets, except for the Jasper Ridge first derivative regression, which was not significant. The second derivative R²s were always lower than the randomized R²s. For nitrogen content, three of the six regressions performed on log(1/R) and the first derivative were not significant, two were greater than the randomized R²s, and one was lower.
For lignin concentration, two of the log(1/R) regressions had moderate R²s but one was not significant, and all of the first derivative R²s were less than 0.35.
The results for water content were more consistent, with the two water-containing datasets (Jasper Ridge and JRC fresh) giving high R²s for log(1/R), first derivative and second derivative regressions, but low or nonsignificant regressions for the JRC dry leaf dataset (Table 10).
The wavelengths chosen by stepwise multiple linear regression of log(1/R)
for subsets of the JRC fresh leaf dataset exactly coincided with those
for the entire dataset for one subset (subset 4, Figure 2). The regression
for the entire dataset explained 49% of the variation in nitrogen
concentration. The regressions for the subsets explained 41-71% of the
variation in nitrogen concentration. For two subsets, only one of six
selected wavelengths was within 14 nm of a selected wavelength for the
entire dataset (subsets 2 and 3, Figure 2). A second selected wavelength in
each of these two subsets was within 56 nm. For the remaining two subsets,
the selected wavelengths were more than 350 nm away from those chosen for
the entire dataset.
For the two fresh leaf datasets, more of the variation was explained by expressing the data on a content basis than a concentration basis for all chemicals except cellulose in the Jasper Ridge dataset and nitrogen in the JRC dataset (Tables 3, 4). This could have resulted from the fact that the spectrometer beam probes the leaf on an area basis. However, this generalization did not hold true for the dry leaf dataset (Table 5). In addition, the results of the randomization study indicated that this apparent pattern may be partially dependent on random processes because the same pattern occurred in 8 out of 9 comparisons of nitrogen concentration and content (Table 6).
In general, the proportions of variation in chemical concentrations and contents explained by the regressions with the highest R²s were somewhat higher in the Jasper Ridge dataset than in the JRC fresh leaf dataset, possibly due to the smaller number of samples in the Jasper Ridge dataset than in the JRC dataset (Tables 3, 4). Lower R²s were generally obtained for the JRC dry leaf dataset than for the JRC fresh leaf dataset. Similarly, the coefficients of determination for dry needle regressions of Douglas fir were slightly inferior to those for fresh needles (Johnson and Billow, 1995). However, Jacquemoud et al. (1995), using reflectance as the input, found higher coefficients of determination for dry leaves.
For all three datasets, the bands selected by stepwise multiple linear regressions differed depending upon whether log(1/R), first derivative log(1/R) or second derivative log(1/R) were used (Figure 1). Similarly, Jacquemoud et al. (1995) reported that different bands were selected for the same chemical in the same leaves when data were examined in reflectance or transmission modes. As reported by Curran (1989), selected bands rarely corresponded to known features for the chemical being examined. In addition, band selection was heavily dependent upon the basis on which the chemistry was expressed, with selected bands coinciding (selection of bands within 10 nm of one another) in fewer than 6% of the paired concentration/content regressions (Figure 1). This suggests that band selection by stepwise multiple linear regression on intact fresh leaves may be sensitive to factors other than the absorption characteristics of the chemicals being examined, including scattering due to cell walls and anatomical characteristics (Peterson and Hubbard, 1992) and spectral overlaps caused by the presence of other biochemicals (Curran et al., 1992).
The high coefficients of determination obtained for the randomized datasets suggest that stepwise multiple linear regressions using log(1/R), first derivative log(1/R) and second derivative log(1/R) "explained" a high proportion of the variation even when no actual relationship existed between the chemical data and the reflectance spectra (Table 6). These coefficients of determination ranged from 41-48% for the first derivative and 57-82% for the second derivative of log(1/R) for the three datasets.
Although the number of wavelengths selected was smaller than the number of samples in the datasets, the initial number of wavelengths was quite large (850). The number of samples in the Jasper Ridge and JRC datasets was relatively small but the JRC dataset is similar in size to several recently reported studies on fresh leaf reflectance and biochemistry (Yoder and Pettigrew-Crosby, 1995; Johnson and Billow, 1995; see Table 8). The number of regressors selected for the JRC dataset meets the criterion given by Hruschka (1987): 5-15 samples for each regression and data treatment constant and for any parameter of the data treatment, such as wavelength, that is allowed to vary. By this criterion, the Jasper Ridge dataset was too small to allow the selection of 6 regressors. However, the patterns observed with the Jasper Ridge dataset were similar to those observed in the JRC dataset. Use of a larger dataset reduced the magnitude of the "baseline" correlation, but did not eliminate it (Table 7).
In any case, when the number of samples is less than the number of initial wavelengths, the number of independent ways in which the wavelengths can vary from sample to sample is limited, a problem termed multicollinearity (Martens and Naes, 1987). Stepwise multiple regression compresses the data to remove this problem; however, the coefficient of determination is inflated by this data compression (Rencher and Pun, 1980; Birth, 1985). Birth (1985) gives a formula to calculate the coefficient of determination that can be expected when the true correlation is zero. For 850 uncorrelated independent variables and 63 samples, this coefficient of determination is 0.19. We would expect a higher coefficient of determination from stepwise multiple linear regression because more than one regressor was allowed. The lower values obtained for the log(1/R) regressions (Table 6) suggest that some of the independent variables were correlated with others, reducing the number of truly independent variables in the regression.
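The inflation described above can be illustrated by simulation. The sketch below does not reproduce Birth's (1985) formula, which is not given here; instead it draws 850 mutually uncorrelated random variables for 63 samples, picks the single variable most correlated with an unrelated response, and averages the resulting coefficient of determination over repeated trials. The function name, the use of normally distributed values, the number of trials and the seed are assumptions made for illustration.

```python
import numpy as np

def mean_best_single_r2(n_samples=63, n_variables=850, n_trials=200, seed=0):
    """Average R2 of the best single regressor when no relationship exists."""
    rng = np.random.default_rng(seed)
    best = []
    for _ in range(n_trials):
        X = rng.normal(size=(n_samples, n_variables))
        y = rng.normal(size=n_samples)
        Xc = X - X.mean(axis=0)          # center each candidate variable
        yc = y - y.mean()                # center the response
        r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
        best.append(np.max(r ** 2))      # R2 of the best single variable
    return float(np.mean(best))

expected_r2 = mean_best_single_r2()
```

The simulated value can be set beside the 0.19 quoted above; correlated candidate variables, as in real spectra, would be expected to lower it.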
In the regressions using correctly paired nitrogen concentration and content data and spectra, the first or second derivative generally explained the greatest amount of the variation (compared to log(1/R)), but because the actual R²s only exceeded the randomized R²s by 2-42%, we question how the R²s for the first and second derivative of log(1/R) should be interpreted. Further caution is suggested since the maximum R² values obtained from randomized runs of second derivative log(1/R) are near values reported in the literature as significant relationships between chemistry and reflectance. The non-zero randomized R² values suggest that to obtain statistical confidence, the "baseline" R² for randomized data should be established before accepting regressions with high R²s as biologically significant.
Prior studies have recommended constraining the regression (ACCP, 1994) by using a priori selected wavelengths. In this study, fitting regressions to the wavelengths identified in five studies of fresh or dried leaf material generally did not provide reliable predictions of nitrogen concentration (Table 8). We conclude that none of these sets of fixed bands provided adequate predictive ability across our datasets.
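A constrained regression of this kind fixes the wavelengths in advance and fits only the coefficients. The following is a minimal sketch on synthetic data, assuming ordinary least squares with an intercept and that each a priori band is matched to the nearest measured wavelength. The function names, the band positions and the 10 nm wavelength grid are hypothetical and are not taken from any of the studies cited.

```python
import numpy as np

def nearest_band_indices(wavelengths_nm, target_bands_nm):
    """Index of the measured wavelength closest to each a priori band."""
    wavelengths_nm = np.asarray(wavelengths_nm, dtype=float)
    return [int(np.argmin(np.abs(wavelengths_nm - t))) for t in target_bands_nm]

def constrained_r2(spectra, chemistry, wavelengths_nm, target_bands_nm):
    """R2 of a least-squares fit restricted to the a priori bands."""
    cols = nearest_band_indices(wavelengths_nm, target_bands_nm)
    A = np.column_stack([np.ones(len(chemistry)), spectra[:, cols]])
    coef, *_ = np.linalg.lstsq(A, chemistry, rcond=None)
    resid = chemistry - A @ coef
    total = chemistry - chemistry.mean()
    return 1.0 - (resid @ resid) / (total @ total)

# Synthetic spectra in which the response truly depends on the two fixed bands.
wavelengths = np.arange(400.0, 2500.0, 10.0)
rng = np.random.default_rng(0)
spectra = rng.normal(size=(40, wavelengths.size))
fixed_bands = [1510.0, 2180.0]  # hypothetical a priori bands (nm)
cols = nearest_band_indices(wavelengths, fixed_bands)
chemistry = (2.0 * spectra[:, cols[0]] - spectra[:, cols[1]]
             + 0.1 * rng.normal(size=40))
r2 = constrained_r2(spectra, chemistry, wavelengths, fixed_bands)
```

Because no wavelength search takes place, the R2 from such a fit is not inflated by band selection, although it should still be compared with a randomized baseline fitted to the same bands.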
The results of regressions using wavelengths suggested by theoretical studies were also inconsistent. Some of the coefficients of determination (R²) for log(1/R) and the first derivative log(1/R) regressions on nitrogen concentration and content were higher than the randomized R²s for the three datasets, but others were lower or not significant. All of the R²s for the second derivative log(1/R) were less than the randomized R²s. The R²s for lignin were similarly variable. The results for water content were more consistent, with the two water-containing datasets (Jasper Ridge and JRC fresh) giving high R²s for log(1/R), first derivative and second derivative regressions, but low or nonsignificant regressions for the JRC dry leaf dataset. This supports the view that, in fresh leaves, water provides a stronger signal than nitrogen or lignin (Curran, 1989; Elvidge, 1990).
Additional explanations for the inability of the constrained regressions to consistently relate reflectance spectra to chemical composition in our datasets may be the wide range of species diversity and the use of intact leaves. Typically, NIRS protocols involve developing species-specific relationships for chemistry prediction. Sample preparation is also closely monitored for drying, particle size, and packing density prior to measurement. Following these techniques, Johnson and Billow (1995) and Kupiec and Curran (1995) obtained good results with monospecific datasets of dried, ground foliage samples. Recently, however, Bolster et al. (1995) reported good stepwise multiple linear regression predictions from large datasets that included a wide range of species, both broadleaf and conifers, and a variety of plant material (leaves, stems, roots, etc.). Furthermore, the ACCP report (1994) stressed the importance of having a calibration dataset that included the range of possible chemical variation rather than limiting species diversity. With our datasets, it was not possible to test these hypotheses for improving model performance by restricting analyses to more homogeneous samples. It should be kept in mind, however, that in remote sensing applications some environmental heterogeneity is often present within a scene. Although the species in the JRC and Jasper Ridge datasets represent a wide range of foliar conditions and adaptations, both datasets were acquired from commonly occurring plants growing within a radius of 1 km of each other. Thus, potential remote sensing techniques need to accommodate the range of species variability within regions, even if site-specific relationships are to be developed.
This study was conducted using single leaf layers that were not optically thick. Jacquemoud et al. (1995) compared reflectance and transmittance of individual and stacked (optically thick) fresh and dry leaves, using five regressors. They reported that, for fresh leaf reflectance, the coefficients of determination were higher for optically thick samples than for single leaves when protein was analyzed, lower when lignin was analyzed, and similar when cellulose and starch were analyzed. Because optically thick samples of fresh leaves did not give consistently better results than single leaves, we conclude that the patterns observed in our study cannot be attributed to the use of single leaves rather than optically thick leaf stacks. Furthermore, Jacquemoud et al. (1994) examined the effect of including different numbers of regressors in the stepwise equations. Their results indicate no significant difference among these datasets based on the number of regressors (up to 10), suggesting that the patterns, within the limits of the number of regressors chosen, are not significantly affected by overfitting.
Stepwise multiple linear regressions using artificial datasets assembled by randomizing the association between nitrogen data and reflectance spectra gave R²s of at least 0.41 and as much as 0.82 for the relationship between nitrogen concentration and content vs. first or second derivative log(1/R) (Table 6). The R²s for correctly paired nitrogen data and first and second derivative log(1/R) only exceeded the average randomized R²s by 0.02-0.42. This suggests that high R²s for stepwise multiple linear regression on datasets containing substantially fewer samples than initial wavelengths must be examined in light of the "baseline" R²s for the chemical being examined. Additional caution is advised when examining species-diverse datasets containing fresh, intact leaf spectra.
The bands selected by stepwise multiple linear regression for a given chemical (nitrogen, carbon, lignin, etc.) and a given spectral transformation (log(1/R), first derivative, second derivative) did not correspond between datasets in this study or with bands selected in other studies (Card et al., 1988; Wessman et al., 1988; Curran, 1989; Bolster et al., 1995; Dungan et al., 1994; Johnson and Billow, 1995; Martin and Aber, 1995; Jacquemoud et al., 1995; Yoder and Pettigrew-Crosby, 1995). Band selection depended on whether the chemical data were expressed on a concentration (g g⁻¹) or content (g m⁻²) basis (Figure 1). For a given chemical, similar bands were selected on a concentration or a content basis less than 6% of the time. Band selection was also very sensitive to the samples included in the dataset (Figure 2).
Multiple regression using bands identified in other studies to explain nitrogen concentration yielded R²s that were less than the average R²s for artificially constructed randomized datasets in 8 of 15 tests (Tables 6, 8). Multiple regressions using bands that represent known absorption characteristics for nitrogen and lignin were inconsistent, yielding high R²s for some chemicals and datasets, but not others (Table 10). The results for water were more consistent, giving high R²s for log(1/R), first and second derivative regressions for the fresh leaf datasets, and low R²s or nonsignificant regressions for the JRC dry leaf dataset.
All of these results suggest caution in the use of stepwise multiple
linear regression on fresh leaf reflectance spectra. Band selection does
not appear to be based upon the absorption characteristics of the chemical
being examined.
Birth, G.S. (1985), Evaluation of correlation coefficients obtained with stepwise regression analysis. Applied Spectroscopy 39: 729-732.
Bolster, K.L., Martin, M.E., Aber, J.D. (1995), Interactions between precision and generality in the development of calibrations for the determination of carbon fraction and nitrogen concentration in foliage by near infrared reflectance. Submitted to Can. J. Forest Res.
Card, D.H., Peterson, D.L., Matson, P.A., and Aber, J.D. (1988), Prediction of leaf chemistry by the use of visible and near infrared reflectance spectroscopy. Remote Sens. Environ. 26:123-147.
Curran, P.J. (1989), Remote sensing of foliar chemistry. Remote Sens. Environ. 30:271-278.
Curran, P.J., Dungan, J.L., Macler, B.A., Plummer, S.E. and Peterson, D.L. (1992), Reflectance spectroscopy of fresh whole leaves for the estimation of chemical concentration. Remote Sens. Environ. 39: 153-166.
Dungan, J.L., Johnson, L., Billow, T., Matson, P., Mazzurco, J., Moen, J., and Vanderbilt, V. (1994), High spectral resolution reflectance of Douglas fir grown under differing fertilization regimes: experiment design and treatment effects. Submitted to Remote Sens. Environ.
Effland, M. (1977), Modified procedure to determine acid-insoluble lignin in wood and pulp. TAPPI 6:10.
Elvidge, C.D. (1990), Visible and near infrared reflectance characteristics of dry plant materials, Int. J. Remote Sens. 11: 1775-1795.
Gates, D.M. (1970), Physical and physiological properties of plants. In Remote Sensing: With Special Reference to Agriculture and Forestry, National Academy of Sciences, Washington, D.C., p. 224-252.
Grossman, Y.L., Sanderson, E.W., and Ustin, S.L. (1994), Relationships between leaf chemistry and reflectance for plant species from Jasper Ridge Biological Preserve, California. Int. Geosci. and Remote Sens. Sym (IGARSS-94), vol. 4, pp. 2357-2359.
Hruschka, W.R. (1987), Data analysis: wavelength selection methods. In Near-infrared Technology in the Agricultural and Food Industries (Williams, P.C., and Norris, K.H., eds.), American Association of Cereal Chemists, Inc., St. Paul, MN. Chapter 3.
Jacquemoud, S., Verdebout, J., Schmuck, G., Andreoli, G., Hosgood, B. and Hornig, S.E. (1994), Investigation of leaf biochemistry by statistics. Int. Geosci. and Remote Sens. Sym. (IGARSS-94).
Jacquemoud, S., Verdebout, J., Schmuck, G., Andreoli, G., and Hosgood, B. (1995), Investigation of leaf biochemistry by statistics. Remote Sens. Environ. (in press).
Johnson, L.F., and Billow, C.R. (1995), Spectroscopic estimation of total nitrogen concentration in Douglas-fir foliage. Int. J. Remote Sens. (in press).
Kupiec, J.A., and Curran, P.J. (1995), Remote sensing of foliar chemistry: Moving from the leaf to the canopy. Submitted.
Marten, G.C., Shenk, J.S., and Barton, F.E. II (eds.) (1989), Near Infrared Reflectance Spectroscopy (NIRS): Analysis Of Forage Quality. U.S. Dept. Agric. Handbook 643:1-96.
Martens, H. and Naes, T. (1987), Multivariate calibration by data compression. In Near-infrared Technology in the Agricultural and Food Industries (Williams, P.C., and Norris, K.H., eds.), American Association of Cereal Chemists, Inc., St. Paul, MN. Chapter 4.
Martin, M.E., and Aber, J.D. (1995), Determining the chemical composition of fresh leaves using near infrared spectra. Submitted to J. Near Infrared Reflectance Spectroscopy.
McLellan, T., Martin, M.E., Aber, J.D., Melillo, J.M., Nadelhoffer, K., and Dewey, B. (1991), Comparison of wet chemistry and near infrared reflectance measurements of carbon-fraction and nitrogen concentration of forest foliage. Can. J. For. Res. 21:1689-1693.
Peterson, D.L., Aber, J.D., Matson, P.A., Card, D.H., Swanberg, N., Wessman, C., and Spanner, M. (1988), Remote sensing of forest canopy and leaf biochemical contents. Remote Sens. Environ. 24: 85-108.
Peterson, D.L., and Hubbard, G.S. (1992), Scientific issues and potential remote-sensing requirements for plant biochemical content. J. Imaging Sci. and Technol. 36: 446-456.
Rencher, A.C. and Pun, F.C. (1980), Inflation of R² in best subset regression. Technometrics 22:49-53.
Tucker, C.J. and Garratt, M.W. (1977), Leaf optical system modeled as a stochastic process. Appl. Opt. 16: 635-642.
Wessman, C.A., Aber, J.D., Peterson, D.L., and Melillo, J.M. (1988), Foliar analysis using near infrared reflectance spectroscopy. Can. J. For. Res. 18:6-11.
Wessman, C.A. (1990), Evaluation of canopy chemistry. In Remote Sensing of Biosphere Functioning, Hobbs, R.J. and Mooney H.A., Eds, Springer-Verlag, New York, 135-156.
Williams, P.C., and Norris, K.H., eds. (1987), Near-infrared Technology in the Agricultural and Food Industries, American Association of Cereal Chemists, Inc., St. Paul, MN. 330p.
Yoder, B.J., and Pettigrew-Crosby, R.E. (1995), Predicting nitrogen and chlorophyll from reflectance spectra (400-2500nm) at leaf and canopy scales. Remote Sens. Environ. (in press).