Analysis II: Supervised Spectral Classification


After a sequence of images at certain wavelengths has been extracted from the video data, it is possible to perform a supervised spectral classification, which uses the differing spectral responses of objects in a scene to distinguish them. For the AirSpex project the classification functionality built into the Image Calculator was used, although myriad other software packages, most of them not free, are made specifically for this purpose.

The mathematical formulation of this type of classification technique is called Bayesian Maximum Likelihood Classification; it has been thoroughly developed in statistical decision theory and later applied to image classification. For the sake of brevity, the background and formalism of this technique will not be presented here.
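Although the formalism is omitted, the core decision rule is simple to sketch: each pixel's spectral vector is assigned to the class whose Gaussian log-likelihood (plus log-prior) is highest. The following NumPy fragment is an illustrative sketch only, not the Image Calculator's actual implementation; all names are hypothetical, and per-class means, covariances, and priors are assumed to have been estimated already.

```python
import numpy as np

def ml_classify(x, means, covs, priors):
    """Assign pixel vector x to the class with the highest Gaussian
    log-likelihood plus log-prior (Bayesian maximum likelihood)."""
    scores = []
    for m, S, p in zip(means, covs, priors):
        d = x - m
        # Gaussian log-density up to an additive constant:
        #   -1/2 ln|S| - 1/2 d' S^-1 d, plus the log of the class prior
        score = (-0.5 * np.log(np.linalg.det(S))
                 - 0.5 * d @ np.linalg.solve(S, d)
                 + np.log(p))
        scores.append(score)
    return int(np.argmax(scores))
```

With equal priors and equal covariances this reduces to nearest-mean classification in the Mahalanobis sense.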

Training Set

The first task in attempting this type of image classification is the demarcation of a number of different regions of the image that are known a priori to correspond to a specific type of land cover, e.g. snow, water, infrastructure, etc. A judicious creation of this training set involves selecting classes whose histograms are disparate over all of the chosen spectral bands, despite the fact that some classes (or subclasses) may be physically very similar. For example, the presence of both dirty snow and relatively clean snow may warrant separate classes for the two, since the probability distribution function (PDF) comprising both clean and varying degrees of dirty snow cannot be assumed to be Gaussian, or perhaps even a superposition of Gaussians. A careful selection of a training set will result in PDFs which are distinct and numerically separated; for the hyperspectral data acquired by the AirSpex 2005 instrument suite, this has proven to be extremely difficult.
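Once the training regions are drawn, each class is summarized by the statistics the classifier needs: a mean spectrum and a covariance matrix over the chosen bands. A minimal sketch of that bookkeeping step, assuming the training pixels have already been gathered into arrays (the function name and layout are illustrative, not the Image Calculator's):

```python
import numpy as np

def class_statistics(pixels, labels):
    """Per-class mean spectrum and band-to-band covariance from training data.
    pixels: (N, B) array of N training pixels with B spectral bands;
    labels: length-N array of class indices for those pixels."""
    stats = {}
    for c in np.unique(labels):
        X = pixels[labels == c]
        # Mean over pixels gives the class's mean spectrum; np.cov with
        # rowvar=False treats each band as a variable, giving a BxB matrix.
        stats[c] = (X.mean(axis=0), np.cov(X, rowvar=False))
    return stats
```

Distinct, numerically separated PDFs correspond to well-separated means relative to these covariances.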


Figure 9. Establishing a training set. For this data set, 11 classes are defined as follows: AsphaltRoof (reddish-brown), DirtySnow (chocolate), OrangeRoof (orange), RedRoof (red), Road (black), RoadII (dark green), RocksShadow (olive), RocksSunlight (cobalt blue), SeaIce (bright blue), SnowShadow (pink), SnowSunlight (yellow). Fifteen spectral bands were used, beginning at 475-nm and continuing in equal steps to 775-nm.

The training set shown in Figure 9 depicts the boundaries of the defined classes. They were decided upon after a trial-and-error process using both more and fewer classes than shown here. Generally speaking, too few classes produce PDFs which are clearly not numerically separated, while too many usually result in two or more classes overlapping or misidentifying land cover. One example of the latter problem was the use of a class called 'Hangar', which represented the bright white storage hangars (located to the right of the reddish-brown rectangles used to designate the class AsphaltRoof). Owing to the high reflectivity and curved shape of these objects, much of the snowy parts of the scene ended up being classified as Hangar as well; the class was therefore removed, since there are only a handful of these objects in the scene. As will be shown, even the classes defined here are not perfect.

Classification Maps

The classification results are shown below in Figures 10 and 11. The classification was performed twice: in Figure 10 the option to allow unclassified pixels was left on, while in Figure 11 it was turned off. Allowing for unclassified pixels lets the classification procedure choose to leave a pixel unclassified if it does not lie within a predefined threshold of any of the PDFs determined from the training classes.
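The reject option can be sketched as a threshold on the best Mahalanobis distance: if even the nearest class PDF is too far away, the pixel is left unclassified. This is a hedged illustration of the idea, assuming a chi-squared-style distance threshold; it is not the Image Calculator's documented criterion.

```python
import numpy as np

UNCLASSIFIED = -1

def classify_with_reject(x, means, covs, threshold):
    """Pick the class with the smallest Mahalanobis distance to x, but
    reject the pixel (leave it unclassified) if even the best class lies
    beyond the threshold -- mirroring 'allow unclassified pixels'."""
    d2 = [(x - m) @ np.linalg.solve(S, x - m) for m, S in zip(means, covs)]
    best = int(np.argmin(d2))
    return best if d2[best] <= threshold else UNCLASSIFIED
```

Raising the threshold shrinks the unclassified fraction; with the threshold effectively infinite, every pixel is forced into some class, as in Figure 11.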


Figure 10. Bayesian Maximum Likelihood Classification using 11 classes and 15 spectral samples. This image has not been cleaned by the Image Calculator software.


Figure 11. Bayesian Maximum Likelihood Classification using 11 classes and 15 spectral samples. Unclassified pixels were not allowed, and the Image Calculator Clean function was applied.

From Figure 10 it appears that a significant amount of land cover has been successfully classified; however, inspecting the histogram of the plot shows that 53.3% of the pixels were not classified. This is especially apparent in the large section of unclassified pixels directly above the legend marker for AsphaltRoof. Comparison with the generated RGB image in Figure 7 shows this section to be a region of snow lying outside of the sharp boundary of the mountain shadow, yet it clearly has a lower intensity. The reduced intensity is likely owing to a small amount of cloud cover, rather than the region belonging to DirtySnow or RocksSunlight as indicated by Figure 11. In hindsight, an additional class might be useful here. Also, a class for buildings in shadow was not defined, resulting in those at the top left of the scene being classified as RocksShadow and RocksSunlight.

Furthermore, the snow in the shadows of sunlit buildings in the scene is almost entirely misclassified as RocksSunlight. This may be partially attributed to the small training area of RocksSunlight as shown in Figure 9. It may also be owing to the demarcation of SnowShadow lying only in the nearby mountain's shadow (the left fifth of the image) and not elsewhere in the scene.

Finally, the classification with unclassified pixels prohibited, shown in Figure 11, performs much better at detecting buildings whose roofs should be classified as RedRoof and OrangeRoof. Even so, it is still instructive to first run the analysis while allowing unclassified pixels, since doing so can show the user any areas or land cover types that may have been overlooked.

Analysis

Aside from the classification maps shown above, the accuracy of the classification can be further investigated using other output from the Image Calculator classification process. This output is shown below.


Figure 12. Mean Spectra and Standard Deviation (at 475-nm) for the data set. Spectral bandpass at 5.5-nm.

Figure 12 shows, for each spectral sample, the mean intensity over the entire image. Unfortunately the standard deviation is only shown when the user moves the pointer over a spectral sample; in the case of this figure, the 475-nm sample shows standard deviations for each class. At shorter wavelengths, the classes SnowSunlight and SeaIce are the most intense, with OrangeRoof and RedRoof being dimmest, as expected. At longer wavelengths, the classes OrangeRoof and RedRoof increase in intensity, owing to their color. Standard deviations are more easily visualized using the Gaussian-fitted PDFs shown below.


Figure 13. Cluster Plot of 475-nm intensity vs. 775-nm intensity for the entire image.

It is also instructive to inspect the cluster plot generated by the Image Calculator. Figure 13 shows how much scatter exists within each class when the digital intensities for two images are plotted and grouped by class. Many classes overlap considerably in this plot, especially AsphaltRoof, RocksSunlight, Road, and RoadII (where visible in the plot), and DirtySnow to a lesser extent. Note that every attempt at selecting a training set for these data has resulted in some overlapping classes, since the drawing of class boundaries is done by hand. More effective identification of classes and features in the images would involve a methodical segmentation and thresholding process, which the participants in this project simply did not have the time to carry out.
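The per-class scatter visible in such a two-band cluster plot can also be quantified. One simple proxy, sketched below under the assumption that the two bands are available as flat arrays, is the square root of the determinant of each class's 2x2 covariance (proportional to the area of its scatter ellipse); the function name is hypothetical.

```python
import numpy as np

def cluster_scatter(band_a, band_b, labels):
    """Rough 'area' of each class in a two-band cluster plot: sqrt of the
    determinant of the 2x2 covariance, an ellipse-area proxy.  Small values
    correspond to tight, well-defined training classes."""
    out = {}
    for c in np.unique(labels):
        pts = np.stack([band_a[labels == c], band_b[labels == c]], axis=1)
        out[c] = float(np.sqrt(np.linalg.det(np.cov(pts, rowvar=False))))
    return out
```

A class like SnowShadow, with little scatter, would yield a small value; overlapping classes like AsphaltRoof and RocksSunlight would yield larger ones.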

A small amount of scatter (or area in the plot) indicates an accurate designation of the training data for that class and results in a small standard deviation. The obvious example in the figure is the class SnowShadow, which had one of the larger training areas (cf. Figure 9). Of course, if the training area for this class were extended to the shadows of sunlit buildings which were mistakenly assigned to the neighboring class RocksSunlight, a corresponding increase in scatter for this class would be expected.

Furthermore, comparison of the classes SnowSunlight and SeaIce in this figure also shows that these two classes are not entirely numerically separated, as some blue points extend into the main body of the yellow points on the cluster plot. The PDFs shown below further illustrate this.


Figure 14. Probability Distribution Functions with superimposed Gaussian fits for all classes at 475-nm.


Figure 15. Probability Distribution Functions with superimposed Gaussian fits for all classes at 625-nm.


Figure 16. Probability Distribution Functions with superimposed Gaussian fits for all classes at 775-nm.

Figures 14, 15, and 16 show the calculated PDFs at 475-, 625-, and 775-nm, respectively. All three figures indicate a significant amount of overlap between classes in the middle intensity regions. At the shortest wavelengths, the most numerically separated classes are RedRoof, OrangeRoof, and SnowShadow. At longer wavelengths, OrangeRoof and RedRoof begin to overlap considerably, while the classes RocksShadow and SnowShadow remain clearly separated.

Some special consideration must be given to the classes SnowSunlight and SeaIce, seen in these figures in yellow and blue, respectively. Figure 14 clearly shows that the PDF for SeaIce is a bimodal distribution that bounds the SnowSunlight PDF on both sides and is therefore poorly estimated by fitting a single Gaussian. This is a good example of the need to designate two subclasses, e.g. SeaIceI and SeaIceII, perform the classification separately for each one, and combine the two subclasses afterwards. From a physical point of view, it is likely that the training area for the class SeaIce contained a fair amount of sunlit snow. At visible wavelengths, the reflectances of ice and snow do differ, but not significantly enough to perform a thresholding operation. At infrared wavelengths outside the spectral sensitivity of our detector, the reflectances of ice and snow begin to exhibit differences large enough for reliable thresholding to be possible.
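Splitting a bimodal class like SeaIce into two subclasses can be done by fitting a two-component Gaussian mixture to its intensity histogram. The sketch below is a minimal 1-D expectation-maximization fit written for illustration (deterministic initialization at the data range); a production analysis would use an established mixture-model routine rather than this hand-rolled loop.

```python
import numpy as np

def em_two_gaussians(x, iters=100):
    """Minimal 1-D EM fit of a two-component Gaussian mixture, one way to
    split a bimodal class into two subclasses before classification."""
    mu = np.array([x.min(), x.max()], dtype=float)  # initial component means
    var = np.full(2, x.var() / 4)                   # initial variances
    w = np.array([0.5, 0.5])                        # mixing weights
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        n = r.sum(axis=0)
        w = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n
    return mu, var, w
```

The two fitted components then define the subclasses (e.g. SeaIceI and SeaIceII), which can be recombined into a single SeaIce label after classification.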


Figure 17. Jeffries-Matusita (J-M) distances between classes for all spectral samples.

Shown in Figure 17, the last output from the Image Calculator program is the Jeffries-Matusita (J-M) distance, a value used to represent the amount of separation of each class relative to the others. In this case, it is given by:

\[
JM_{ij} = \sqrt{2\left(1 - e^{-B_{ij}}\right)},
\qquad
B_{ij} = \tfrac{1}{8}\,(m_i - m_j)^T \left[\tfrac{\Sigma_i + \Sigma_j}{2}\right]^{-1}(m_i - m_j)
       + \tfrac{1}{2}\ln\!\left(\frac{\bigl|(\Sigma_i + \Sigma_j)/2\bigr|}{\sqrt{|\Sigma_i|\,|\Sigma_j|}}\right)
\]

where m is the mean and Σ is the covariance matrix, and the indices i and j range over all classes. The limits of the J-M distance are [0, 1.41], with 0 being unseparated (i.e. a class compared to itself) and 1.41 (= √2) being totally separated. As discussed above, while some classes have a reasonable separation as given by their J-M distance, as in the case of RocksShadow and SnowShadow (cf. Figure 16) with a distance of 1.41, others yield a much lower J-M distance and hence numerical separation. The lowest distance calculated is for the classes SnowSunlight and SeaIce, with a value of 1.09. This further indicates the need either to divide SeaIce into two subclasses or to redefine the training area so that it contains no sunlit snow, if possible.
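For two Gaussian classes, the J-M distance is straightforward to compute from their means and covariances. A small sketch (assuming the class statistics are already in hand; the function name is illustrative):

```python
import numpy as np

def jm_distance(m1, S1, m2, S2):
    """Jeffries-Matusita distance between two Gaussian classes:
    JM = sqrt(2 * (1 - exp(-B))), with B the Bhattacharyya distance."""
    S = 0.5 * (S1 + S2)          # average covariance of the two classes
    d = m1 - m2
    B = (0.125 * d @ np.linalg.solve(S, d)
         + 0.5 * np.log(np.linalg.det(S)
                        / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))
    return np.sqrt(2.0 * (1.0 - np.exp(-B)))
```

Identical classes give 0, and well-separated classes saturate at √2 ≈ 1.41, matching the limits quoted above.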

Improvements to the classification process can be divided into two groups: data improvements and procedural improvements. The data analyzed here make classification difficult for a couple of reasons. First, the weather was clear, and despite flight-planning efforts to minimize specular reflection of sunlight into the detector, it indeed occurred. Second, this scene contains a significant amount of shadow, which complicates the process and requires the creation of multiple subclasses (e.g. Snow -> SnowShadow + SnowSunlight). Ideally, a high, reasonably optically thick overcast, which would illuminate the scene with mostly diffuse light, would be preferred.

Regarding the classification procedure, there is much room for improvement. As mentioned above, performing thresholding and segmentation at different spectral samples would greatly aid the user in establishing boundaries for training areas. It may even be possible to do this automatically in software, letting the program choose the boundaries itself. This would allow for a dramatic increase in the number of classes used (the Image Calculator software has a limit of 15 classes, although not all 15 were used in this scene).
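One standard automatic-thresholding technique that could serve as a starting point for such segmentation is Otsu's method, which picks the intensity threshold maximizing the between-class variance of a band's histogram. The sketch below is a generic NumPy implementation, not a feature of the Image Calculator:

```python
import numpy as np

def otsu_threshold(band, bins=256):
    """Otsu's method: choose the intensity threshold that maximizes the
    between-class variance of the band's histogram -- one automatic
    alternative to drawing training boundaries by hand."""
    hist, edges = np.histogram(band, bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                 # weight of the lower class
    mu = np.cumsum(p * centers)       # cumulative first moment
    mu_total = mu[-1]
    w1 = 1.0 - w0                     # weight of the upper class
    valid = (w0 > 0) & (w1 > 0)
    # Between-class variance for every candidate split point
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(sigma_b)]
```

Applied band by band, such thresholds could seed candidate class boundaries for the user to refine, rather than drawing every training polygon from scratch.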


(Finished 10.05.05 -JMH)