Comparison of various remote sensing data classification methods
I. Statistical classification methods
Statistical classification methods fall into unsupervised and supervised approaches. An unsupervised method does not need pixels of known classes to train a classifier, while a supervised method requires a certain number of known-class pixels with which to train the classifier and estimate its parameters. Unsupervised classification requires no prior knowledge and introduces no subjective bias through the selection of training samples, but the natural clusters it produces often do not match the categories of interest in a study. Supervised classification, by contrast, requires the classes to be defined in advance, and the training data selected may lack representativeness, but serious classification errors can be detected during the training process.
1. Unsupervised classifier
Unsupervised classification is generally performed by clustering algorithms. The most commonly used are K-means clustering (Duda and Hart, 1973) and the iterative self-organizing data analysis algorithm (ISODATA). Descriptions of these algorithms can be found in standard texts on statistical pattern recognition.
In general, the accuracy obtained by simple clustering is low, so clustering is rarely used alone to classify remote sensing data. Cluster analysis does, however, give an initial picture of the distribution of the categories and can supply the prior probabilities of the classes for maximum likelihood supervised classification; the mean vectors and covariance matrices of the final clusters can be used directly in the maximum likelihood procedure (Schowengerdt, 1997).
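To make the clustering step concrete, the following is a minimal K-means sketch. It is an illustration added here, not part of the original text; it assumes NumPy and treats each pixel as a vector of band values.

```python
import numpy as np

def kmeans(pixels, k, n_iter=20, seed=0):
    """Cluster pixel feature vectors (n_pixels x n_bands) into k classes."""
    rng = np.random.default_rng(seed)
    # Initialize cluster centres from k randomly chosen pixels.
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every pixel to its nearest centre (Euclidean distance).
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of the pixels assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers
```

ISODATA extends this loop with rules for splitting and merging clusters; the cluster means and covariances found here can seed a subsequent maximum likelihood classification.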
2. Supervised classifier
Supervised classifiers are the most commonly used classifiers in thematic classification of remote sensing data. Unlike an unsupervised classifier, a supervised classifier requires a certain amount of training data with which to estimate its key parameters; the trained classifier then assigns each pixel to a category. The supervised classification process generally includes four steps (Richards, 1997): defining the classification categories, selecting training data, training the classifier, and finally classifying the pixels. Each step strongly influences the uncertainty of the final classification.
Supervised classifiers are divided into parametric and nonparametric classifiers. A parametric classifier requires the data to follow a particular probability distribution, whereas a nonparametric classifier makes no assumption about the distribution of the data.
Classifiers commonly used for remote sensing data include the maximum likelihood classifier, the minimum distance classifier, the Mahalanobis distance classifier, the K-nearest neighbor (K-NN) classifier, and the parallelepiped classifier. Chapter 3 introduces the maximum likelihood, minimum distance, and Mahalanobis distance classifiers in detail; the K-NN and parallelepiped classifiers are briefly introduced here.
The K-NN classifier is a nonparametric classifier. Its decision rule assigns a pixel to the category of the training feature vectors closest to the pixel's own feature vector in feature space (Schowengerdt, 1997). When K = 1 the classifier is called the 1-NN classifier, and the category of the single nearest training sample is taken as the category of the pixel. When K > 1, the pixel is assigned either to the category most frequent among its K nearest training samples, or to the category with the greatest total weight, where each of the K neighbors is weighted by the reciprocal of its Euclidean distance from the pixel. Hardin (1994) discusses the K-NN classifier in depth.
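As a minimal sketch of this decision rule (an added illustration, assuming NumPy; both the unweighted vote and the inverse-distance weighting described above are shown):

```python
import numpy as np

def knn_classify(pixel, train_X, train_y, k=3, weighted=False):
    """Assign `pixel` to a class based on its k nearest training vectors."""
    d = np.linalg.norm(train_X - pixel, axis=1)
    nearest = np.argsort(d)[:k]              # indices of the k closest samples
    if not weighted:
        # Majority vote among the k nearest training samples.
        classes, votes = np.unique(train_y[nearest], return_counts=True)
        return classes[votes.argmax()]
    # Weight each neighbour by the reciprocal of its Euclidean distance.
    w = 1.0 / np.maximum(d[nearest], 1e-12)
    classes = np.unique(train_y[nearest])
    totals = [w[train_y[nearest] == c].sum() for c in classes]
    return classes[int(np.argmax(totals))]
```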
The parallelepiped classifier is a simple nonparametric algorithm. It determines the brightness range of each class from the upper and lower bounds of the training data histogram in each band. For each class, the per-band upper and lower limits together form a multidimensional box, or parallelepiped, so m categories yield m parallelepipeds. When the brightness values of a pixel to be classified fall within the parallelepiped of a particular class, the pixel is assigned to that class. The method can be illustrated with the two-band classification problem in Figure 5-1, where each ellipse represents the distribution of brightness values of a class estimated from the training data and each rectangle represents that class's brightness range: a pixel is assigned to whichever class's brightness range contains it.
Figure 5-1 Schematic Diagram of Parallelepiped Classification
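The box-based decision rule just described can be sketched as follows (an added illustration, assuming NumPy; per-class band limits are taken directly as the min/max of the training data):

```python
import numpy as np

def train_boxes(train_X, train_y):
    """For each class, the per-band min and max of the training pixels
    define the class's parallelepiped (box) in feature space."""
    boxes = {}
    for c in np.unique(train_y):
        Xc = train_X[train_y == c]
        boxes[c] = (Xc.min(axis=0), Xc.max(axis=0))
    return boxes

def box_classify(pixel, boxes, unknown=-1):
    """Return the first class whose box contains the pixel; pixels
    outside every box are labelled `unknown`."""
    for c, (lo, hi) in boxes.items():
        if np.all(pixel >= lo) and np.all(pixel <= hi):
            return c
    return unknown
```

Note that a pixel falling in the overlap of two boxes is simply given to the first class tested, which reflects the overlap ambiguity noted later in this section.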
3. Evaluation of statistical classifier
The performance of statistical classifiers on remote sensing data varies; it depends not only on the classification algorithm but also on the statistical distribution of the data, the selection of training samples, and other factors.
Unsupervised clustering requires no statistical model of the data to be classified, but because it uses no prior knowledge its accuracy is generally low. More often, cluster analysis is treated as an exploratory step before supervised classification, used to understand the distribution and statistical characteristics of the categories and to provide prior knowledge for the category definition, training data selection, and final classification stages of supervised classification. In practical applications, supervised methods are generally used to classify remote sensing data.
Maximum likelihood classification is the most widely used method in remote sensing data classification. It is a parametric method, and it is considered the most accurate classifier when the training samples are sufficient, the prior probability distribution of the classes is at least partly known, and the data are close to normally distributed. When training data are scarce, however, bias in the estimates of the mean and covariance parameters seriously degrades accuracy. Swain and Davis (1978) argue that in maximum likelihood classification of N-dimensional spectral space, each class should have at least 10×N training samples, and preferably 100×N. Moreover, in many cases the statistical distribution of remote sensing data does not satisfy the normality assumption, and the prior probabilities of the classes are difficult to determine.
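A minimal Gaussian maximum likelihood sketch (an added illustration, assuming NumPy; class means and covariances are estimated from training pixels, and equal priors are used unless supplied):

```python
import numpy as np

def train_ml(train_X, train_y):
    """Estimate per-class mean vector and covariance matrix from training pixels."""
    params = {}
    for c in np.unique(train_y):
        Xc = train_X[train_y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def classify_ml(pixel, params, priors=None):
    """Gaussian maximum likelihood rule: pick the class maximizing
    ln p(c) - 0.5 ln|S_c| - 0.5 (x - m_c)' S_c^{-1} (x - m_c)."""
    best, best_g = None, -np.inf
    for c, (mu, cov) in params.items():
        prior = 1.0 if priors is None else priors[c]
        diff = pixel - mu
        g = (np.log(prior) - 0.5 * np.log(np.linalg.det(cov))
             - 0.5 * diff @ np.linalg.inv(cov) @ diff)
        if g > best_g:
            best, best_g = c, g
    return best
```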
The minimum distance classifier can be viewed as a maximum likelihood classifier that ignores the covariance matrix. When training samples are few, the mean can generally be estimated more accurately than the covariance matrix, so with limited training data it is reasonable to estimate only the class means; the maximum likelihood algorithm then degenerates into the minimum distance algorithm. Because covariance is ignored, the class probability distributions are treated as symmetric, with the spectral variance of every class assumed equal. Clearly, when there are enough training samples to estimate the covariance matrices accurately, maximum likelihood classification is more accurate than minimum distance classification; with few training data, however, minimum distance classification may be the more accurate of the two (Richards, 1993). The minimum distance algorithm also places no requirement on the probability distribution of the data.
The Mahalanobis distance classifier can be regarded as a maximum likelihood classifier in which the covariance matrices of all classes are assumed equal. Under this assumption it loses the between-class covariance differences used by the maximum likelihood method, but compared with the minimum distance method it retains a degree of directional sensitivity through the covariance matrix (Richards, 1993). The Mahalanobis distance classifier can therefore be regarded as sitting between the maximum likelihood and minimum distance classifiers. Like maximum likelihood classification, it requires the data to be normally distributed.
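The relationship between the two rules can be seen side by side in this added sketch (assuming NumPy; the minimum distance rule uses Euclidean distance to the class means, while the Mahalanobis rule uses a single pooled covariance matrix):

```python
import numpy as np

def min_distance_classify(pixel, means):
    """Assign the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda c: np.linalg.norm(pixel - means[c]))

def mahalanobis_classify(pixel, means, pooled_cov):
    """Assign the class whose mean is nearest in Mahalanobis distance,
    computed with one covariance matrix shared by all classes."""
    inv = np.linalg.inv(pooled_cov)
    def d2(c):
        diff = pixel - means[c]
        return diff @ inv @ diff
    return min(means, key=d2)
```

With an identity pooled covariance the two rules coincide; an elongated or tilted covariance is what gives the Mahalanobis rule its directional sensitivity.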
One of the main problems of the K-NN classifier is that it needs a large training set to guarantee convergence of the algorithm (Devijver and Kittler, 1982). Another is that errors in training sample selection strongly affect the classification result (Cortijo and Blanca, 1997), and its computational cost grows as the neighborhood searched for nearest neighbors expands. On the other hand, because the K-NN classifier takes the spatial relationships within a pixel's neighborhood into account, its results show less "salt-and-pepper" noise than those of other spectral classifiers.
The advantages of the parallelepiped classifier are its simplicity, its speed, and its freedom from any probability distribution assumptions. Its disadvantages are, first, that pixels falling outside the brightness ranges of all classes can only be labeled as unknown, and second, that pixels falling where the brightness ranges of several classes overlap are difficult to assign (see Figure 5-1).
The characteristics of various statistical classification methods can be summarized as Table 5- 1.
II. Neural network classifiers
The biggest advantage of neural networks in remote sensing data classification is that they can treat multi-source input data equally, even when those inputs have completely different statistical distributions. Their main drawback is that the weights of the connections between the large numbers of neurons in each layer are opaque, making them difficult for users to interpret or control (Austin, Harding and Kanellopoulos et al., 1997).
Neural network classification of remote sensing data is considered one of the most active research areas in the field (Wilkinson, 1996; Kimes, 1998). Neural network classifiers can likewise be divided into supervised and unsupervised classifiers, and because they place no requirement on the statistical distribution of the classified data, they are nonparametric.
The multilayer perceptron (MLP) is the neural network most commonly used for remote sensing data classification. Its structure is shown in Figure 5-2. The network comprises three layers: an input layer, a hidden layer, and an output layer. The input layer serves only as the interface between the input data and the network and performs no processing itself; the processing capability of the hidden and output layers resides in their nodes. The input is generally the feature vector of the data to be classified, typically the multispectral vector of a training pixel, with each node representing one spectral band; input nodes can also carry spatial context information for the pixel (such as texture) or multi-temporal spectral vectors (Paola and Schowengerdt, 1995).
Table 5- 1 Comparison of various statistical classifiers
Figure 5-2 Neural Network Structure of Multilayer Perceptron
Each hidden-layer and output-layer node applies an excitation (activation) function to the weighted sum of its inputs. Denoting the excitation function by f(S), the hidden-layer nodes compute

$$h_j = f\Big(\sum_i w_{ij}\, p_i\Big) \qquad (5\text{-}1)$$

where $p_i$ is the $i$-th input to the hidden layer, $h_j$ is the output of hidden node $j$, and $w_{ij}$ is the weight of the connection between neurons of adjacent layers.
For the output layer, the corresponding relationship is

$$o_k = f\Big(\sum_j w_{jk}\, h_j\Big) \qquad (5\text{-}2)$$

where $h_j$ is the input to the output layer and $o_k$ is the output of output node $k$.
The excitation function is generally taken to be the sigmoid

$$f(S) = \frac{1}{1 + e^{-S}} \qquad (5\text{-}3)$$
After the network structure is determined, the network must be trained so that it can predict outputs for new input data. The back-propagation training algorithm is the most commonly used. The algorithm feeds the training data into the network at the input layer, generates random initial connection weights for each node, computes the outputs according to formulas (5-1)-(5-3), compares the network output with the expected result (the category of the training data), and calculates the error. The back-propagation network uses this error to adjust the connection weights between nodes, generally via the delta rule (Rumelhart et al., 1986):
$$\Delta w_{jk}(n+1) = \eta\, \delta_k\, h_j + \alpha\, \Delta w_{jk}(n)$$

where $\eta$ is the learning rate, $\delta_k$ is the rate of change of the error at output node $k$, and $\alpha$ is the momentum parameter.
In this way, forward and backward propagation are iterated until the network error falls below a preset level, at which point training ends and the data to be classified can be fed to the trained network for classification.
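The training loop of equations (5-1) to (5-3) and the delta rule can be sketched compactly (an added illustration, assuming NumPy; the bias inputs, batch updates, and simple logical-OR task are choices of this sketch, not of the original text):

```python
import numpy as np

def sigmoid(s):
    # Excitation function f(S) = 1 / (1 + e^(-S)), as in (5-3).
    return 1.0 / (1.0 + np.exp(-s))

def train_mlp(X, T, n_hidden=4, eta=0.5, alpha=0.9, epochs=5000, seed=0):
    """One-hidden-layer perceptron trained by back-propagation
    with a momentum term, following the delta rule."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])              # append a bias input
    W1 = rng.normal(0.0, 0.5, (Xb.shape[1], n_hidden))     # input -> hidden weights
    W2 = rng.normal(0.0, 0.5, (n_hidden + 1, T.shape[1]))  # hidden -> output weights
    dW1 = np.zeros_like(W1)
    dW2 = np.zeros_like(W2)
    for _ in range(epochs):
        H = sigmoid(Xb @ W1)                           # hidden outputs, (5-1)
        Hb = np.hstack([H, np.ones((len(H), 1))])      # bias for the output layer
        O = sigmoid(Hb @ W2)                           # network outputs, (5-2)
        # Error terms propagated backwards from the output layer.
        d_out = (T - O) * O * (1 - O)
        d_hid = (d_out @ W2[:-1].T) * H * (1 - H)
        # Delta rule with momentum: new step = eta * gradient + alpha * old step.
        dW2 = eta * Hb.T @ d_out + alpha * dW2
        dW1 = eta * Xb.T @ d_hid + alpha * dW1
        W2 += dW2
        W1 += dW1
    return W1, W2

def mlp_predict(X, W1, W2):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    H = sigmoid(Xb @ W1)
    Hb = np.hstack([H, np.ones((len(H), 1))])
    return sigmoid(Hb @ W2)
```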
Besides the multilayer perceptron, other network models are also used for remote sensing data classification. The Kohonen self-organizing map, for example, is widely used for unsupervised clustering of remote sensing data (Yoshida et al., 1994; Schaale et al., 1995); other examples include adaptive resonance theory (ART) networks (Silva and Caetano, 1997), fuzzy ARTMAP (Fischer and Gopal, 1997), and radial basis function networks (Luo, 1997).
Many factors affect the accuracy of neural network classification of remote sensing data. Foody and Arora (1997) identify the structure of the network, the dimensionality of the remote sensing data, and the size of the training set as important factors.
The structure of the network, particularly the number of layers and the number of neurons in each layer, is the most critical issue in neural network design. It affects not only classification accuracy but also training time (Kavzoglu and Mather, 1999). For networks used in remote sensing classification, the numbers of input and output neurons are fixed by the feature dimensionality of the data and the total number of categories, so designing the structure chiefly means choosing the number of hidden layers and hidden neurons. In general, an overly complex network describes the training data well but generalizes poorly, the phenomenon of "over-fitting", whereas an overly simple network cannot learn the patterns in the training data, and accuracy again suffers.
The network structure is generally determined by experiment. Hirose et al. (1991) proposed a constructive method that starts from a small network and adds one hidden neuron each time training falls into a local optimum, retraining after each addition until training converges. This can leave the network overly complex, so one refinement removes the last-added neuron once the network is deemed convergent, repeating until the network no longer converges; the last convergent network is taken as optimal. The drawback of this method is that it is very time-consuming. Pruning is the opposite approach to determining the network structure: it starts from a large network and gradually removes neurons judged redundant (Sietsma and Dow, 1988). Starting large has the advantages of fast learning and insensitivity to initial conditions and learning parameters. Pruning is repeated until the network no longer converges, and the last convergent network is considered optimal (Castellano, Fanelli and Pelillo, 1997).
The number of training samples needed varies with the network structure, the number of categories, and other factors, but the basic requirement is that the training data adequately describe the representative categories. Foody et al. (1995) found that training set size significantly affects classification accuracy, though neural networks can manage with less training data than statistical classifiers.
The effect of the dimensionality of the classification variables on accuracy is a general problem in remote sensing classification. Many studies show that between-class separability and final accuracy at first increase with data dimensionality, but beyond a certain point accuracy declines as dimensionality continues to grow (Shahshahani and Landgrebe, 1994). This is the well-known Hughes phenomenon. The usual remedies are to remove highly correlated bands by feature selection or to remove redundant information by principal component analysis. Dimensionality also clearly affects neural network accuracy (Battiti, 1994), although the Hughes phenomenon is less severe than for traditional statistical classifiers (Foody and Arora, 1997).
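A minimal principal component sketch of that remedy (an added illustration, assuming NumPy; the band covariance matrix is eigendecomposed and the data are projected onto the leading components):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project band vectors (n_pixels x n_bands) onto the top principal
    components to decorrelate bands and reduce dimensionality."""
    Xc = X - X.mean(axis=0)                  # centre each band
    cov = np.cov(Xc, rowvar=False)           # band covariance matrix
    vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    return Xc @ vecs[:, order]               # scores on the leading components
```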
Drawing on long practical experience, Kanellopoulos (1997) suggests that an effective ANN model should attend to the following: an appropriate network structure, an optimized learning algorithm, preprocessing of the input data, avoidance of oscillation, and the use of hybrid classification methods. Hybrid models include combinations of different artificial neural network models, combinations of neural networks with traditional classifiers, and combinations of neural networks with knowledge-based processors.
III. Other classifiers
Besides the statistical and neural network classifiers above, many other classifiers are used for remote sensing image classification. One example is the fuzzy classifier, designed for situations in which ground categories grade into one another with no sharp boundaries; it determines each pixel's fuzzy membership in every class through a fuzzy inference mechanism. Common fuzzy approaches include fuzzy C-means clustering, supervised fuzzy classification (Wang, 1990), mixed-pixel models (Foody and Cox, 1994; Settle and Drake, 1993), and various artificial neural network methods (Kanellopoulos et al., 1992; Paola and Schowengerdt, 1995). Because the output of fuzzy classification is a fuzzy membership grade for each class, such methods are called "soft" classifiers, in contrast to traditional "hard" classifiers.
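As an added illustration of the "soft" output, the following sketches the standard fuzzy C-means membership formula, u_ic = 1 / Σ_j (d_ic / d_ij)^(2/(m-1)), for given class centres (assuming NumPy; a full FCM algorithm would alternate this step with a centre update):

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Fuzzy C-means membership of each pixel in each class centre:
    u[i, c] = 1 / sum_j (d_ic / d_ij) ** (2 / (m - 1))."""
    # d[i, c] = distance from pixel i to centre c.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)                   # avoid division by zero
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))
    return 1.0 / ratio.sum(axis=2)             # rows sum to 1 across classes
```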
Another family is the contextual classifiers, which consider the spectral and spatial characteristics of the image jointly. Ordinary spectral classifiers use only the spectral features of each pixel, but in remote sensing images neighboring pixels are generally spatially autocorrelated, and strongly autocorrelated pixels are more likely to belong to the same class. Using spectral and spatial features together can therefore improve accuracy and reduce the "salt-and-pepper" noise in the classification result, which is most pronounced when the spectral distributions of the classes overlap (Cortijo et al., 1995). Salt-and-pepper noise can be removed by post-classification filtering, or avoided by incorporating pixel-neighborhood information into the classification process itself.
Context information can be introduced in different ways. One is to add image texture measures to the classification features; another is image segmentation, including the method of Kettig and Landgrebe (1976), edge detection methods, and Markov random field methods. Rignot and Chellappa (1992) classified SAR images with a Markov random field method and obtained good results. Smits (1997) proposed an edge-preserving Markov random field method and applied it to SAR image classification. Crawford (1998) combined hierarchical classification with a Markov random field method to classify SAR images at higher accuracy. Cortijo (1997) classified remote sensing images with a nonparametric spectral classifier and then refined the initial classification with the ICM algorithm.
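The post-classification filtering mentioned above can be sketched as a simple majority filter (an added illustration, assuming NumPy; each pixel's label is replaced by the most frequent label in its 3×3 neighborhood):

```python
import numpy as np

def majority_filter(labels, size=3):
    """Replace each pixel's class by the most frequent class in its
    size x size neighbourhood (simple salt-and-pepper smoothing)."""
    r = size // 2
    padded = np.pad(labels, r, mode='edge')    # replicate edges at the border
    out = np.empty_like(labels)
    H, W = labels.shape
    for i in range(H):
        for j in range(W):
            win = padded[i:i + size, j:j + size].ravel()
            vals, counts = np.unique(win, return_counts=True)
            out[i, j] = vals[counts.argmax()]
    return out
```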