Lane Line Detection Algorithm - Ultra-Fast-Lane-Detection

Lane line detection algorithms usually fall into two types. The first is based on semantic or instance segmentation of visual features, such as LaneNet and SCNN. The second uses visual features to predict the points where the lane lines are located, which lets it handle the no-visual-clue problem; Ultra-Fast-Lane-Detection, the subject of this article, belongs to this type.

official github : /cfzd/Ultra-Fast-Lane-Detection

paper : Ultra Fast Structure-aware Deep Lane Detection

The following figure shows the structure of the whole model, which can be divided into three parts: the Backbone, the Auxiliary part, and the Group Classification part used to select lane line candidates. The part of the pipeline that takes part in final inference only downsamples, unlike segmentation models, which need multiple rounds of upsampling. As a result, the overall computational cost of the model is quite low, and according to the results given in the paper, the inference speed can be as high as 300 FPS.

The Backbone part uses a relatively small ResNet18 or ResNet34. In the official implementation, the feature passed to the classifier is the output of the last ResNet stage, which is downsampled by 32x (a 9x25 map for a 288x800 input). The paper also points out that classifying on global features gives a large receptive field, which helps detection, while the small backbone and lightweight head keep inference fast.

The Auxiliary part upsamples and concatenates features from three backbone stages and uses them for instance segmentation. Its purpose is to strengthen the visual features during training; it does not take part in inference, so it adds no inference cost.

The Group Classification part is shown below. The paper calls it a row-based selecting method based on global image features: for each predefined row (row anchor), candidate lane points are selected by classifying over global features. This builds a prior assumption about lane structure into the lane line detection task.

For a segmentation task, the final feature map has size HxWxC, and classification is performed along the C direction: the vector along C at each pixel location gives the class that pixel belongs to. In this method, the final output has size hx(w+1)xC. Here h is the number of rows sampled in the vertical direction (row anchors), h<H; w is the number of candidate lane locations in each row (grid cells), w<W; the extra cell (the +1) indicates that no lane is present in that row; and C is the number of lane lines. Classification is performed along the w direction: for each lane line and each predefined row anchor, the model computes the probability that the lane appears in each grid cell.
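To make the size comparison concrete, here is a minimal NumPy sketch rather than the official implementation. The sizes (H, W, h, w, C) are illustrative assumptions, random numbers stand in for the network output, and the convention that the last index along the w direction means "no lane" is also an assumption of this sketch.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
H, W = 288, 800        # input image height and width
h, w, C = 18, 100, 4   # row anchors, grid cells per row, lane lines

# Number of classification outputs for each formulation.
seg_outputs = H * W * (C + 1)   # per-pixel scores over C lanes plus background
row_outputs = h * (w + 1) * C   # per-row-anchor scores over w cells plus "no lane"

# Random scores stand in for a network output of shape (h, w + 1, C).
rng = np.random.default_rng(0)
scores = rng.normal(size=(h, w + 1, C))

# Classify along the w direction: one cell (or "no lane") per row anchor and lane.
cell_index = scores.argmax(axis=1)   # shape (h, C)
lane_present = cell_index < w        # index w is the extra "no lane" class here

# Map each chosen cell to the x coordinate of its center in the input image.
x_pixels = np.where(lane_present, (cell_index + 0.5) * (W / w), np.nan)

print(seg_outputs, row_outputs)   # 1152000 7272
```

With these assumed sizes, the row-based formulation produces roughly two orders of magnitude fewer outputs than per-pixel segmentation, which is where much of the speed advantage comes from.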

The loss function used in the paper has three parts: the multi-class classification loss L_cls, the segmentation loss L_seg, and the lane structural loss L_str. Of these, L_cls and L_seg are the standard losses used in classification and segmentation tasks.

The purpose of the structural loss is to constrain the shape of the predicted lane lines using prior knowledge of lane structure. It has two terms, L_str = L_sim + λ·L_shp, where L_sim is the similarity loss, L_shp is the shape loss, and λ is a weighting coefficient.

The starting point for the similarity loss is that lanes are continuous, so the predictions on two neighboring row anchors of the same lane should be as close as possible. The L1 norm is used to constrain the distance between them.
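The similarity loss can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the official code: it takes scores in the (h, w + 1, C) layout described above and uses a plain sum of absolute differences, whereas an actual training setup may average or smooth the term differently.

```python
import numpy as np

def similarity_loss(scores):
    """L1 distance between the score vectors of adjacent row anchors.

    scores: array of shape (h, w + 1, C).
    """
    diff = scores[:-1] - scores[1:]   # shape (h - 1, w + 1, C)
    return np.abs(diff).sum()

# Identical scores on every row anchor give zero loss.
same = np.tile(np.arange(5.0).reshape(1, 5, 1), (4, 1, 2))   # shape (4, 5, 2)

# Changing one score on one row anchor changes two adjacent differences.
changed = same.copy()
changed[2, 0, 0] += 1.0

print(similarity_loss(same), similarity_loss(changed))   # 0.0 2.0
```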

The shape loss starts from the observation that most lane lines are straight, and even curved lanes are mostly approximately straight. For the same lane line, the change in the predicted location between neighboring row anchors should therefore stay nearly constant, so the second-order difference of the locations is penalized. Ideally it should have a value of 0.

The Loc function is the expected location of the lane point on the j-th row anchor of the i-th lane. Prob is the probability that the k-th location on the j-th row anchor of lane i is a lane point. Since the background class is not counted, the value of k starts from 1.
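The expectation and the shape loss can be sketched as follows. This is a minimal NumPy illustration under assumptions: scores use the (h, w + 1, C) layout with the background as the last index along the w direction, Prob is taken as a softmax over the w lane cells, and one_hot_scores is a hypothetical helper that exists only to build test inputs.

```python
import numpy as np

def expected_location(scores):
    """Expected cell position (k = 1..w) per row anchor and lane, shape (h, C)."""
    lane_scores = scores[:, :-1, :]   # drop the background class
    lane_scores = lane_scores - lane_scores.max(axis=1, keepdims=True)  # stable softmax
    prob = np.exp(lane_scores)
    prob /= prob.sum(axis=1, keepdims=True)
    k = np.arange(1, prob.shape[1] + 1).reshape(1, -1, 1)
    return (k * prob).sum(axis=1)

def shape_loss(scores):
    """L1 norm of the second-order difference of expected locations."""
    loc = expected_location(scores)
    first = loc[:-1] - loc[1:]        # change between neighboring row anchors
    second = first[:-1] - first[1:]   # change of that change
    return np.abs(second).sum()

def one_hot_scores(cells, w, peak=50.0):
    """Hypothetical helper: one strongly preferred cell per row anchor, one lane."""
    scores = np.zeros((len(cells), w + 1, 1))
    for j, c in enumerate(cells):
        scores[j, c, 0] = peak
    return scores

straight = one_hot_scores([2, 3, 4, 5, 6], w=10)   # constant slope
bent = one_hot_scores([2, 3, 4, 4, 4], w=10)       # slope changes once

print(shape_loss(straight), shape_loss(bent))
```

The straight lane gives a loss of essentially 0, while the bent lane is penalized at the row anchor where its slope changes. Using the expectation instead of argmax keeps the location differentiable, so the loss can be used in training.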

The paper gives the metric results shown below; its evaluation hardware appears to be an NVIDIA GTX 1080Ti. The method greatly improves inference speed while keeping accuracy close to that of the other methods, which makes it well suited for real-time detection tasks.

To test its real-world inference performance, I ran it in an NVIDIA RTX 3070 + CUDA 11 + PyTorch 1.7 environment. The inference performance of Ultra-Fast-Lane-Detection with a ResNet18 backbone and an input size of (288, 800, 3) is shown below. With a batch size of 1, the inference speed is about 350 FPS, which is basically consistent with the results given in the paper.
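For reference, a throughput measurement of this kind can be sketched with a small timing harness. This is a generic sketch, not the script used for the numbers above: measure_fps and the dummy workload are hypothetical, and for a real GPU model the callable would need to wait for the GPU to finish (for example by calling torch.cuda.synchronize()) before returning, otherwise the timing only covers kernel launches.

```python
import time

def measure_fps(run_once, warmup=10, iters=100):
    """Call run_once repeatedly and return the average calls per second."""
    for _ in range(warmup):   # warm-up runs are excluded from the timing
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return iters / elapsed

# Dummy workload standing in for one forward pass with batch size 1.
fps = measure_fps(lambda: sum(range(1000)), warmup=2, iters=20)
print(f"{fps:.1f} FPS")
```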