My understanding of image segmentation
Image segmentation labels an image at the pixel level by semantic class; instance segmentation goes further and distinguishes individual object instances among the segmented objects.
Threshold-based segmentation: one or more gray-level thresholds are computed from the gray-level statistics of the image, the gray value of each pixel is compared against these thresholds, and each pixel is assigned to the appropriate class according to the result.
The key step is defining the criterion function used to solve for the optimal gray threshold. Thresholding is especially suitable for images in which the target and the background occupy distinct gray-level ranges.
It is worth mentioning that this method can also be used for feature point detection.
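As a concrete illustration, here is a minimal thresholding sketch using Otsu's method, whose maximized between-class variance plays the role of the criterion function above; the OpenCV call is real, but the file name is a hypothetical placeholder.

```python
# Minimal Otsu thresholding sketch; "cells.png" is a hypothetical input.
import cv2

img = cv2.imread("cells.png", cv2.IMREAD_GRAYSCALE)

# Otsu's method searches for the threshold that maximizes the
# between-class variance of foreground vs. background gray levels.
t, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("optimal gray threshold:", t)  # pixels above t become foreground (255)
```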
Region-based methods find the regions directly. There are two basic forms: one is region growing, which starts from individual seed pixels and gradually merges neighbors to form the required segmentation regions; the other starts from the whole image and gradually splits it into the required segmentation regions.
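A minimal region-growing sketch in plain NumPy, assuming 4-connectivity and a fixed gray-level tolerance `tol` (both illustrative choices):

```python
# Minimal region-growing sketch: grow from a seed pixel, absorbing
# 4-connected neighbors whose gray value is within `tol` of the seed.
from collections import deque
import numpy as np

def region_grow(img, seed, tol=10):
    h, w = img.shape
    seed_val = int(img[seed])
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
                    and abs(int(img[ny, nx]) - seed_val) <= tol:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask
```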
Image segmentation algorithms based on edge detection try to solve the segmentation problem by detecting the edges between different regions; this is arguably one of the earliest and most studied approaches. The gray values of pixels on the boundary between regions usually change sharply. If the Fourier transform is used to map the image from the spatial domain to the frequency domain, edges correspond to the high-frequency components, which yields a very simple edge-detection algorithm.
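A minimal sketch of that frequency-domain idea: zero out the low frequencies and transform back, so mainly edges remain (the cut-off radius `r` is an arbitrary assumption):

```python
# FFT high-pass edge sketch: suppress low frequencies, keep edges.
import numpy as np

def highpass_edges(img, r=20):
    f = np.fft.fftshift(np.fft.fft2(img.astype(float)))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    # Zero a disk of radius r around the spectrum center (low frequencies).
    f[(yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= r * r] = 0
    return np.abs(np.fft.ifft2(np.fft.ifftshift(f)))
```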
Conventional convolution
Conventional convolution plus residual connections alleviates the vanishing-gradient problem and lets the network grow deeper (a minimal residual-block sketch follows this list).
Efficient neural network (ENet)
ResNet-38
Full resolution residual network (FRRN)
AdapNet
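As referenced above, a minimal residual-block sketch in PyTorch; the channel counts and layer layout are illustrative, not any specific published block:

```python
# Minimal residual block: the identity shortcut gives gradients a direct
# path, which is what lets very deep networks train.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # shortcut: output = F(x) + x
```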
Developed from object detection (R-CNN, Fast R-CNN).
Based on the Faster R-CNN structure, Mask R-CNN adds a mask-prediction branch, improves RoI Pooling, and proposes RoI Align.
One limitation: the scoring function scores only the candidate boxes for detection; it does not score the predicted segmentation masks.
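A quick sketch of the RoI Align operation using `torchvision.ops.roi_align`; the feature-map size, box coordinates, and `spatial_scale` below are illustrative assumptions:

```python
# RoI Align sketch: bilinear sampling instead of RoI Pooling's rounding
# avoids the quantization misalignment that Mask R-CNN's RoI Align fixes.
import torch
from torchvision.ops import roi_align

feat = torch.randn(1, 256, 50, 50)                   # backbone feature map
boxes = [torch.tensor([[10.0, 10.0, 30.0, 40.0]])]   # one RoI, (x1, y1, x2, y2)
pooled = roi_align(feat, boxes, output_size=(7, 7),
                   spatial_scale=1.0, sampling_ratio=2)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])
```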
(1) ReSeg model: an improvement on FCN
A drawback of FCN is that it does not model local or global context dependencies, which are very useful in semantic segmentation. In ReSeg, the authors therefore use RNNs to capture context information as part of the basis for segmentation.
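A minimal sketch of that idea: sweep bidirectional GRUs over the rows and then the columns of a feature map so every position aggregates horizontal and vertical context. The layer types and sizes are assumptions (ReSeg itself uses ReNet layers):

```python
# Row/column RNN sweep over a feature map, in the spirit of ReSeg's ReNet.
import torch
import torch.nn as nn

class RowColRNN(nn.Module):
    def __init__(self, in_ch, hidden):
        super().__init__()
        self.row_rnn = nn.GRU(in_ch, hidden, bidirectional=True, batch_first=True)
        self.col_rnn = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)

    def forward(self, x):                                # x: (B, C, H, W)
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_rnn(rows)                      # sweep left <-> right
        x = rows.reshape(b, h, w, -1).permute(0, 2, 1, 3).reshape(b * w, h, -1)
        cols, _ = self.col_rnn(x)                         # sweep up <-> down
        return cols.reshape(b, w, h, -1).permute(0, 3, 2, 1)  # (B, 2*hidden, H, W)
```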
During down-sampling, a convolutional neural network discards some detail in exchange for higher-level features. This process is irreversible, which can later cause problems such as low resolution and loss of detail. Up-sampling can restore the missing information only to some extent, and thereby helps recover a more accurate segmentation boundary.
In FCN, up-sampling is performed after the convolutional layers to obtain the segmentation map (a minimal FCN sketch follows the advantages and disadvantages below).
Advantages:
FCN classifies images at the pixel level, thereby solving image segmentation at the semantic level.
FCN accepts input images of any size and preserves the spatial information of the original input.
Disadvantages:
Because of up-sampling, the results are blurry and over-smoothed, and insensitive to fine details in the image;
Classifying each pixel separately does not fully consider the relationship between pixels and lacks spatial consistency.
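As referenced above, a minimal FCN-style sketch: a fully convolutional backbone, a 1x1 classifier producing per-pixel scores, and bilinear up-sampling back to the input size, so any input resolution is accepted. Depth and channel counts are illustrative:

```python
# Tiny FCN sketch: convolutional features -> 1x1 classifier -> upsample.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                          # 1/2 resolution
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                          # 1/4 resolution
        )
        self.classifier = nn.Conv2d(128, num_classes, 1)  # per-pixel scores

    def forward(self, x):
        h, w = x.shape[-2:]
        scores = self.classifier(self.features(x))
        # Up-sample the coarse score map back to the input resolution.
        return F.interpolate(scores, size=(h, w), mode="bilinear",
                             align_corners=False)
```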
(2) DeepLab model: recover the resolution reduced inside the deep convolutional network, so as to obtain more context information.
DeepLab combines a deep convolutional neural network (DCNN) with a probabilistic graphical model, applied to semantic segmentation with the goal of pixel-wise classification. Its advance lies in coupling DenseCRF (a probabilistic graphical model) with the DCNN: each pixel is treated as a CRF node, long-range dependencies are exploited, and CRF inference is used while directly optimizing the DCNN loss function.
In image segmentation, a standard FCN operation is to convolve first and then pool, which shrinks the image while enlarging the receptive field. However, information is lost when the image is first reduced in size (convolution and pooling) and then enlarged again (up-sampling), so there is room for improvement.
DeepLab proposes atrous (dilated) convolution to address this problem.
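A minimal sketch of atrous (dilated) convolution: with dilation 2, a 3x3 kernel covers a 5x5 window, enlarging the receptive field without pooling and without reducing resolution:

```python
# Dilated convolution keeps spatial size while widening the receptive field.
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)
atrous = nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2)
print(atrous(x).shape)  # torch.Size([1, 64, 32, 32]) -- same resolution
```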
(1) Traditional image segmentation
Cross entropy loss
Focal loss addresses the imbalance between easy and hard samples.
(2) Medical image segmentation
Dice loss (this loss has some history: it directly optimizes the performance metric, which touches on my other topic, non-convex optimization)
IoU (more often used as an evaluation metric)
On top of these basic losses, various improvements have been proposed.
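For reference, minimal sketches of focal loss, Dice loss, and IoU for the binary case; tensor shapes and values below are illustrative assumptions, not any library's canonical implementation:

```python
# Minimal binary segmentation losses and the IoU metric.
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    # Down-weights easy examples so hard pixels dominate the gradient.
    ce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = torch.exp(-ce)                       # probability of the true class
    return ((1 - p_t) ** gamma * ce).mean()

def dice_loss(logits, target, eps=1.0):
    p = torch.sigmoid(logits)
    inter = (p * target).sum()
    return 1 - (2 * inter + eps) / (p.sum() + target.sum() + eps)

def iou(pred_mask, target, eps=1e-6):          # evaluation metric, not a loss
    inter = (pred_mask & target).sum().float()
    union = (pred_mask | target).sum().float()
    return inter / (union + eps)

logits = torch.randn(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.5).float()
print(focal_loss(logits, target), dice_loss(logits, target))
print(iou(logits > 0, target.bool()))
```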
Adjacent pixels see heavily overlapping receptive fields, so their features are very similar. This "similarity" helps when both pixels lie inside the region to be segmented, but hurts when they sit right on the region boundary.
Context features are ubiquitous. Roughly, "context" means that no pixel in an image exists in isolation: each pixel necessarily bears some relationship to its surrounding pixels, and large numbers of interrelated pixels form the objects in the image. Context features therefore describe the relationships between a pixel and its surroundings.
1. Add an extra loss on the segmentation boundaries output by the network, or have the network model the boundary features and the interior-region features separately. The essential idea is to make the network do two tasks at once: segmentation and edge detection. In addition, raising the resolution of the input image and of the intermediate feature maps is simple and effective.
2. Use a loss weighted over the two-dimensional image space, or dynamic weighting and sampling, to handle the uneven pixel counts across semantic classes and the varying learning difficulty within the same image (see the sketch after this list).
3. Use semi-supervised or weakly supervised learning to reduce expensive labeling. Combine the features of multiple noisy samples or labels to construct virtual clean samples or labels, reducing label noise.
4. Use a reasonable context-modeling mechanism to help the network infer the semantics of occluded parts.
5. Build loss or feature-interaction modules between different images inside the network.
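As a sketch of point 2 above, per-class weights in the cross-entropy loss computed from inverse pixel frequency; the weighting scheme and tensor shapes are illustrative assumptions:

```python
# Class-weighted cross-entropy: rare classes receive larger weights.
import torch
import torch.nn.functional as F

def weighted_ce(logits, target, num_classes):
    # logits: (B, C, H, W); target: (B, H, W) with class indices
    counts = torch.bincount(target.flatten(), minlength=num_classes).float()
    weights = counts.sum() / (counts + 1.0)    # inverse-frequency weighting
    weights = weights / weights.sum() * num_classes
    return F.cross_entropy(logits, target, weight=weights)

logits = torch.randn(2, 5, 32, 32)
target = torch.randint(0, 5, (2, 32, 32))
print(weighted_ce(logits, target, 5))
```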