Typical algorithms of visual tracking

(1) Region-based tracking algorithm

The basic idea of the region-based tracking algorithm is to take the image patch in the target's initial region as the target template, match this template against all candidate positions in the next frame, and take the position with the highest matching score as the target location. The most commonly used matching criterion is the sum of squared differences (SSD).
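The SSD matching step described above can be sketched as a brute-force search; the function and variable names below are illustrative, and a real tracker would restrict the search to a window around the previous position:

```python
import numpy as np

def ssd_match(frame, template):
    """Slide `template` over `frame` and return the (row, col) of the
    top-left corner with the smallest sum of squared differences."""
    fh, fw = frame.shape
    th, tw = template.shape
    best, best_pos = np.inf, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = frame[r:r + th, c:c + tw]
            ssd = np.sum((patch.astype(float) - template.astype(float)) ** 2)
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos
```

Because SSD is a dissimilarity measure, the best match is the minimum rather than the maximum; this is the sense in which the "highest matching degree" position is chosen.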

Early region-based tracking algorithms used a fixed target template. For example, the Lucas-Kanade method uses the spatial gradient information of the grayscale image to find the best-matching region and thereby determine the target position. Later, many researchers proposed improvements to address the shortcomings of the region-based approach. The adaptive appearance model based on texture features proposed by Jepson et al. [18] handles target occlusion better and uses an online EM algorithm to update the target model during tracking. Comaniciu et al. [19] proposed a video target tracking algorithm based on kernel density estimation: the target is represented by a kernel-weighted histogram, the similarity between the target template and a candidate region is measured with the Bhattacharyya coefficient, and the target is quickly localized with the MeanShift algorithm.
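The MeanShift localization step in [19] can be illustrated with a simplified sketch. This version omits the kernel weighting of the full method and assumes a precomputed per-pixel likelihood image (e.g. a histogram back-projection); the window is shifted to the weighted centroid of the pixels under it until it stops moving:

```python
import numpy as np

def mean_shift(weights, window, n_iter=50):
    """Shift a rectangular window toward the mode of a weight image.
    `weights`: 2-D array of per-pixel target likelihoods;
    `window`: (row, col, height, width) of the initial search window."""
    r, c, h, w = window
    for _ in range(n_iter):
        roi = weights[r:r + h, c:c + w]
        total = roi.sum()
        if total == 0:
            break
        rows, cols = np.mgrid[0:h, 0:w]
        # offset of the weighted centroid from the window center
        dr = (rows * roi).sum() / total - (h - 1) / 2
        dc = (cols * roi).sum() / total - (w - 1) / 2
        nr = min(max(int(round(r + dr)), 0), weights.shape[0] - h)
        nc = min(max(int(round(c + dc)), 0), weights.shape[1] - w)
        if (nr, nc) == (r, c):
            break  # converged: centroid coincides with window center
        r, c = nr, nc
    return r, c
```

Because each step moves the window toward the local center of mass, the iteration climbs to a nearby mode of the likelihood image, which is what makes MeanShift localization fast.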

The region-based tracking algorithm uses the global information of the target, such as gray-level and texture features, so it is highly reliable, and slight deformation of the target does not affect the tracking result. However, when the target is severely occluded, tracking easily fails.

(2) Feature-based tracking method

Feature-based target tracking algorithms usually represent the target by some of its salient features and track it through feature matching across the image sequence. Because such an algorithm does not rely on the target as a whole, it can still complete the tracking task using the remaining visible features when the target is partially occluded; however, it cannot effectively handle complete occlusion or overlap.

Feature-based tracking methods usually involve two stages: feature extraction and feature matching.

A) Feature extraction

Feature extraction refers to extracting suitable descriptive features from the image region where the target is located. These features should not only distinguish the target from the background, but also be robust to target scaling, shape change, and occlusion. Commonly used target features include color features, gray-level features, texture features, contours, optical flow features, corner features, and so on. D. G. Lowe's SIFT (Scale-Invariant Feature Transform) algorithm [20] is an effective image feature method that is invariant to rotation, scaling, and brightness changes, and is also fairly stable under viewpoint change, affine transformation, and noise.
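A full SIFT implementation is beyond a short example (library implementations such as OpenCV's exist), but one of the listed feature types, corner features, can be illustrated compactly. The following is a minimal sketch of the well-known Harris corner response, computed from the local structure tensor of the image gradients; the crude 3x3 box filter stands in for Gaussian smoothing to keep the sketch dependency-free:

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is
    the smoothed structure tensor of the image gradients. R is large
    and positive at corners, negative along edges, near zero on flat
    regions."""
    img = img.astype(float)
    iy, ix = np.gradient(img)          # gradients along rows, columns
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy

    def box_blur(a):
        # 3x3 box filter via shifted sums (zero-padded borders)
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0

    sxx, syy, sxy = box_blur(ixx), box_blur(iyy), box_blur(ixy)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace ** 2
```

Thresholding this response and keeping local maxima yields corner keypoints of the kind a point-feature tracker can match between frames.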

B) Feature matching

Feature matching measures the similarity between a candidate region and the target region in some way, and determines the target position according to that similarity, thus realizing target tracking. In the field of computer vision, commonly used similarity measures include weighted distance, the Bhattacharyya coefficient, Euclidean distance, Hausdorff distance, and so on. Among them, the Bhattacharyya coefficient and Euclidean distance are the most common.
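The two most common measures named above are simple to state. A minimal sketch, assuming the features are histograms (for the Bhattacharyya coefficient) or plain feature vectors (for the Euclidean distance):

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two histograms, normalized to
    sum to 1. Returns 1.0 for identical distributions, 0.0 for
    distributions with no overlapping bins (higher = more similar)."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))

def euclidean(p, q):
    """Euclidean distance between two feature vectors (lower = more similar)."""
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))
```

Note the opposite orientations: a tracker maximizes the Bhattacharyya coefficient over candidates but minimizes the Euclidean distance.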

Tissainayagam et al. proposed a target tracking algorithm based on point features [21]. The algorithm first finds the corner points with the largest local curvature across multiple scale spaces as key points, and then tracks these key points with the proposed MHT-IMM algorithm. This tracker is suitable for targets with simple geometry, but performs poorly on complex targets from which stable corners are difficult to extract.

The target tracking algorithm based on edge features proposed by Zhu et al. [22] first divides the reference image into several sub-regions and takes the mean of the edge points in each sub-region as a target feature point; it then matches these feature points with an optical-flow-like method, thus realizing target tracking.

(3) Contour-based tracking method

The contour-based tracking method requires the position of the target contour to be specified in the first frame of the video; the contour is then evolved iteratively via differential equations until it converges to a local minimum of an energy function, which is usually determined by image characteristics and contour smoothness. Compared with region-based tracking, contour-based tracking has lower computational complexity and is robust to partial occlusion of the target. However, the method needs the target contour to be initialized at the start of tracking, so it is sensitive to the initial position, and its tracking accuracy is limited to the contour level.

The active contour model (Snake) proposed by Kass et al. [23] in 1987 controls the contour's motion through the interaction of an image force, an internal force, and an external constraint force. The internal force enforces the local smoothness of the contour, the image force pushes the curve toward image edges, and the external force, which can be specified by the user, moves the contour toward the desired local minimum.
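The three forces correspond to three terms in the Snake energy functional of Kass et al., which in its standard form reads:

```latex
E_{\text{snake}} = \int_0^1 \Big[ E_{\text{int}}\big(v(s)\big)
                 + E_{\text{image}}\big(v(s)\big)
                 + E_{\text{con}}\big(v(s)\big) \Big] \, ds ,
\qquad
E_{\text{int}} = \tfrac{1}{2}\Big( \alpha(s)\,\lvert v_s(s)\rvert^2
                 + \beta(s)\,\lvert v_{ss}(s)\rvert^2 \Big)
```

Here v(s) = (x(s), y(s)) parameterizes the contour, the first-derivative term weighted by alpha penalizes stretching (elasticity), and the second-derivative term weighted by beta penalizes bending (rigidity); minimizing the total energy drives the contour described in the paragraph above toward image edges while keeping it smooth.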

Paragios et al. [24] proposed a target detection and tracking algorithm that represents the target contour with the level set method. The method first obtains candidate edges by frame differencing, then extracts the target's moving edges with a probabilistic edge detection operator, and realizes tracking by evolving the target contour onto those moving edges.

(4) Model-based tracking method [25]

In practical applications, we often need to track specific targets that are known in advance. The model-based tracking method therefore first builds a 3D or 2D geometric model of the target offline from prior knowledge, and then realizes tracking by matching the model of a candidate region against the target model. During tracking, the size, pose, and motion parameters of the moving target are estimated from the image characteristics of the scene.

Shu Wang et al. proposed a superpixel-based tracking method [26], which builds the target's appearance template on superpixels and then locates the target by computing a confidence map over target and background. Throughout this process, the method uses segmentation and color clustering to keep the target template from drifting.

(5) Tracking algorithm based on detection

Detection-based tracking (tracking-by-detection) has become increasingly popular. Generally speaking, a detection-based tracking algorithm uses a learning method to build a detector for the specific target, training it with only the manually labeled sample information in the first frame. This reduces the tracking problem to a simple classification problem of separating target from background, so the approach is fast and effective. To adapt to changes in target appearance, such algorithms usually update themselves through online learning, i.e., the detector is updated according to its own tracking results.
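The train-on-first-frame, classify, self-update loop described above can be sketched with a plain online linear classifier. This is a deliberately minimal illustration: the class name, the logistic-SGD update, and the "best candidate becomes a new positive sample" rule are all simplifying assumptions, not any particular published tracker:

```python
import numpy as np

class OnlineDetectorTracker:
    """Tracking-by-detection sketch: a linear classifier is trained on
    target/background features from the first frame, scores candidate
    features in later frames, and updates itself with its own result."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def _update(self, x, label):
        # one logistic-regression SGD step; label is 1 (target) or 0 (background)
        p = 1.0 / (1.0 + np.exp(-(self.w @ x + self.b)))
        g = p - label
        self.w -= self.lr * g * x
        self.b -= self.lr * g

    def train_first_frame(self, target_feats, background_feats, epochs=50):
        """Train only on the manually labeled samples of the first frame."""
        for _ in range(epochs):
            for x in target_feats:
                self._update(np.asarray(x, float), 1.0)
            for x in background_feats:
                self._update(np.asarray(x, float), 0.0)

    def track(self, candidate_feats):
        """Score all candidates, pick the best, then update the detector
        online, treating its own result as a new positive sample."""
        scores = [self.w @ np.asarray(x, float) + self.b
                  for x in candidate_feats]
        best = int(np.argmax(scores))
        self._update(np.asarray(candidate_feats[best], float), 1.0)
        return best
```

The self-update in `track` is exactly the double-edged property of this family: it lets the detector follow appearance changes, but a wrong detection can contaminate the model (drift), which is why practical trackers constrain or verify their online updates.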