Overview of multi-sensor fusion for autonomous driving

Autonomous driving is a key growth area at the intersection of high-tech industry development and intelligent mobility, and its underlying technologies have advanced rapidly in recent years. Within an autonomous driving system, the perception and localization modules play a vital role. A self-driving car must first know its own position in the real world and the obstacles around the vehicle, both dynamic and static. Dynamic obstacles include pedestrians, animals, vehicles, and other non-motorized traffic participants; static obstacles such as roadblocks, road signs, and guardrails can be annotated in high-definition maps, although this depends on how frequently the maps are updated. The perception module uses the various on-board sensors to observe the surrounding environment and streams the data back to the on-board computer in real time. The shape, speed, distance, category, and other attributes of obstacles are then estimated by the perception models and algorithms, so that the prediction and planning module can forecast obstacle trajectories and make the corresponding driving decisions. After obtaining road environment information through the on-board sensing system, a driverless car automatically plans its route and controls speed and steering to drive safely and reliably on the road. The key technologies of driverless cars include road environment perception, path planning, intelligent decision-making for vehicle motion, and adaptive motion control. At present, the immaturity of environment perception technology remains the main factor limiting the overall performance of driverless cars and the biggest obstacle to their mass production.

At present, the sensors used in the autonomous driving perception module mainly include cameras, millimeter-wave radar, ultrasonic radar, and lidar. Cameras offer high resolution, high frame rates, rich information, and low cost, and the strong capacity of deep learning to model complex data can greatly improve the classification ability of the perception system. Millimeter-wave radar responds quickly, is simple to operate, is insensitive to occlusion, and can provide reliable target position and velocity under a wide range of conditions. Lidar provides accurate three-dimensional perception, is insensitive to changes in illumination, and delivers rich information. However, image data cannot provide accurate spatial (depth) information, millimeter-wave radar has very low resolution, and lidar is expensive. At the same time, as the performance of each sensor improves, a single sensor produces more and more data, making it increasingly difficult to extract features without losing useful information. Efficiently processing and fusing multi-sensor data is therefore a very challenging task.

In recent years, deep learning has achieved remarkable results on camera data: the speed and accuracy of 2D object detection have improved dramatically, demonstrating that deep learning is an effective feature extraction method. With the development of convolutional neural network models, the speed and quality of feature extraction from camera data have improved greatly. By exploiting these robust, high-quality image features, vision-based driverless cars can also obtain good results on 3D perception tasks. Deep learning also performs well on lidar data; with the introduction of networks designed for sparse point clouds, learned point cloud features have gradually surpassed some traditional hand-crafted methods. However, applying deep learning to multi-sensor fusion still faces problems such as low fusion efficiency, mismatched data, and a tendency to overfit. When multi-sensor fusion is applied to obstacle detection in autonomous driving, there are also problems such as insufficient detection accuracy, missed and false detections, and inadequate real-time performance. As the level of autonomy increases, traditional multi-sensor object-level fusion can no longer meet the needs of decision-making, and the large amount of redundant perception information also makes decision-making more difficult. Moreover, the large differences in dimensionality and representation between the raw data of different sensors make effective multi-sensor fusion very difficult.

Multi-sensor data fusion involves both spatial alignment and time synchronization. The sensors are installed at different positions on the vehicle body, and each sensor defines its own coordinate system. To obtain a consistent description of the measured object, the different coordinate systems must be transformed into a unified one. The coordinate systems involved in spatially fusing point cloud data and image data include the world coordinate system, the lidar coordinate system, the camera coordinate system, the image (physical) coordinate system, and the pixel coordinate system. The main task of spatial fusion is to obtain the transformation matrices between the lidar coordinate system, the camera coordinate system, the image physical coordinate system, and the image pixel coordinate system. In addition, because different sensors operate at different frequencies, their data acquisition is not synchronized, so multi-sensor time alignment must be performed according to the relationship between their operating frequencies. Usually, the data of all sensors are aligned to the sensor with the longest scanning period.
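
To make the coordinate transformations above concrete, the following minimal sketch (Python with NumPy) projects a single lidar point into pixel coordinates; the extrinsic matrix T_lidar_to_cam, the intrinsic matrix K, and all numeric values are hypothetical placeholders for offline calibration results, not values from this article.

    import numpy as np

    # Assumed calibration results (hypothetical values for illustration):
    # T_lidar_to_cam: 4x4 extrinsic matrix from the lidar frame to the camera frame
    # K: 3x3 camera intrinsic matrix
    T_lidar_to_cam = np.array([[0., -1.,  0.,  0.1],
                               [0.,  0., -1., -0.2],
                               [1.,  0.,  0.,  0.3],
                               [0.,  0.,  0.,  1.0]])
    K = np.array([[700.,   0., 640.],
                  [  0., 700., 360.],
                  [  0.,   0.,   1.]])

    def project_lidar_point(p_lidar):
        """Project one lidar point (x, y, z) into pixel coordinates (u, v)."""
        p_hom = np.append(p_lidar, 1.0)      # homogeneous lidar coordinates
        p_cam = T_lidar_to_cam @ p_hom       # lidar frame -> camera frame
        if p_cam[2] <= 0:                    # behind the camera, not visible
            return None
        uvw = K @ p_cam[:3]                  # camera frame -> pixel frame
        return uvw[:2] / uvw[2]              # perspective division -> (u, v)

    print(project_lidar_point(np.array([10.0, 1.0, -0.5])))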

The information fusion performed by the autonomous driving perception module is also called data fusion, sensor information fusion, or multi-sensor fusion. It is the process of correlating, associating, and combining the data and information obtained from one or more sources in order to obtain accurate position and identity estimates, and it also involves self-correction during processing to refine the results. The richer information about objects and the environment obtained from multiple sensors is ultimately reflected in the fusion algorithm, so the core problem of a multi-sensor system is choosing an appropriate fusion algorithm.

Multi-sensor information fusion can be broadly divided into the detection level, the position level (target tracking level), and the attribute level (target recognition level), and each level has its own structural models. At the detection level, the structural models include parallel, decentralized, serial, and tree structures. At the position level there are centralized, distributed, hybrid, and multi-level structures, in which state estimation is carried out mainly through the cooperation of multiple sensors. At the attribute level there are three types of structural models, corresponding to decision-level, feature-level, and data-level attribute fusion.

Detection-level fusion combines detection decisions or signal-level outputs in a multi-sensor distributed detection system; by integrating the detection results of different sensors it forms a more accurate judgment about the same target and achieves detection performance that no single sensor could provide, and it is an important research topic in information fusion theory. Position-level fusion operates directly on sensor observations, measurement tracks, or sensor state estimates, and includes fusion in both time and space; it is fusion at the tracking level, belongs to the intermediate level, and is the most important form of fusion. A multi-sensor detection fusion system is mainly organized in either a centralized or a distributed mode. In centralized fusion, the raw data of every sensor are transmitted directly to the fusion center, which processes the data and produces the final decision. In distributed fusion, each sensor's data are preprocessed locally to obtain independent detection results, and all of these detection results are then transmitted to the fusion center, where hypothesis testing produces the final judgment. According to the attribute level at which fusion takes place, attribute-level fusion is divided into three categories: data level, feature level, and target (decision) level. The methods used include estimation-based, classification-based, inference-based, and artificial-intelligence-based approaches.
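
As a rough illustration of distributed detection fusion, the sketch below combines binary local decisions at a fusion center using a standard log-likelihood-ratio (Chair-Varshney style) test; the per-sensor detection and false-alarm probabilities are made-up numbers, and this is only one possible hypothesis-testing rule, not a scheme taken from this article.

    import math

    def fuse_decisions(decisions, p_detect, p_false_alarm, threshold=0.0):
        """Distributed detection fusion: combine binary local decisions
        with a log-likelihood-ratio test (Chair-Varshney style)."""
        llr = 0.0
        for u, pd, pf in zip(decisions, p_detect, p_false_alarm):
            if u == 1:
                llr += math.log(pd / pf)              # sensor says "target present"
            else:
                llr += math.log((1 - pd) / (1 - pf))  # sensor says "no target"
        return llr > threshold                        # global decision

    # Hypothetical example: camera and radar detect the target, lidar does not.
    print(fuse_decisions([1, 1, 0], [0.9, 0.8, 0.7], [0.05, 0.1, 0.1]))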

Data-level fusion fuses the raw data collected by the sensors. The raw data are not preprocessed before fusion; instead, they are fused first and then processed, with feature extraction and decision-making performed on the fused data, so it is the lowest level of fusion. Each sensor simply transmits its raw data to the fusion module, which processes the raw data from all sensors and feeds the fused result as input to the subsequent algorithm. Among traditional methods, Pietzsch et al. used low-level measurement vector fusion to combine data from different sensors for pre-collision applications. With the development of deep learning, learned fusion of registered data has also become possible. This fusion approach requires the fused sensor information to be registered with high accuracy. Its advantage is that it can provide detailed information that the other two fusion levels cannot, and by fusing raw data from different sources, the data can be classified at an early stage. However, the amount of sensor data to be processed is large, processing time is long, the required data bandwidth is high, real-time performance is poor, and the approach is sensitive to interference, which can make it complicated in practice. In addition, because raw data formats differ between sensor types, data-level fusion generally requires sensors of the same type, so adding a new sensor to the architecture requires major changes to the fusion module.
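
A schematic example of what data-level (early) fusion can look like in practice is given below: projected lidar returns are rasterized into an extra depth channel and stacked with the RGB image before any feature extraction. The project_fn argument stands for a projection helper such as the hypothetical project_lidar_point above; the whole sketch is an illustrative assumption, not a method described in this article.

    import numpy as np

    def early_fusion_rgbd(rgb_image, lidar_points, project_fn):
        """Data-level (early) fusion sketch: rasterize projected lidar depth
        into a sparse depth channel and stack it with the RGB image."""
        h, w, _ = rgb_image.shape
        depth = np.zeros((h, w), dtype=np.float32)
        for p in lidar_points:                  # p = (x, y, z) in the lidar frame
            uv = project_fn(p)                  # e.g. the projection sketch above
            if uv is None:
                continue
            u, v = int(round(uv[0])), int(round(uv[1]))
            if 0 <= u < w and 0 <= v < h:
                depth[v, u] = np.linalg.norm(p) # range as the depth value
        # Fused 4-channel tensor (H, W, 4) fed to the downstream detector.
        return np.dstack([rgb_image.astype(np.float32), depth])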

For this reason, some researchers introduced the idea of feature-level fusion. Instead of fusing the raw data of different sensors directly, feature-level fusion first extracts features from each sensor's data and then fuses the extracted features. It extracts feature vectors from the data collected by the sensors, processes this feature information, and finally obtains fused features for decision-making, so it is an intermediate level of fusion. Its advantages are better real-time performance and lower communication bandwidth requirements. Feature-level fusion provides more target feature information and increases the dimensionality of the feature space, although fusion performance can degrade because some useful information is lost during feature extraction. Techniques for feature-level fusion include template matching, clustering algorithms, neural networks, and support vector machines (SVM). Most deep-learning-based methods also use neural networks to extract features and then concatenate or weight the features extracted from different sensors, for example RoarNet, AVOD, MV3D, and F-PointNet. The main advantages of feature-level fusion are that it reduces the bandwidth required between the sensors and the fusion module and that it improves performance by exploiting the complementarity of the features. It retains much of the classification and preprocessing capability of low-level fusion, while allowing related data to be associated and integrated effectively into tracking algorithms.
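
The following sketch (PyTorch) shows the simplest form of feature-level fusion mentioned above, concatenating per-sensor feature vectors and feeding them to a small prediction head; the layer sizes, class count, and module name are illustrative assumptions, not the architecture of RoarNet, AVOD, MV3D, or F-PointNet.

    import torch
    import torch.nn as nn

    class FeatureLevelFusion(nn.Module):
        """Minimal feature-level fusion head: concatenate per-sensor feature
        vectors and predict class scores (all dimensions are illustrative)."""
        def __init__(self, img_feat_dim=256, pc_feat_dim=128, num_classes=4):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(img_feat_dim + pc_feat_dim, 256),
                nn.ReLU(),
                nn.Linear(256, num_classes),
            )

        def forward(self, img_feat, pc_feat):
            fused = torch.cat([img_feat, pc_feat], dim=-1)  # concatenation fusion
            return self.head(fused)

    # Hypothetical usage with random per-region features from two backbones.
    model = FeatureLevelFusion()
    scores = model(torch.randn(8, 256), torch.randn(8, 128))  # (batch, num_classes)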

Target-level (decision-level) fusion is the opposite of the low-level architecture. Each sensor runs its own object detection algorithm and produces a list of tracked targets, and the fusion module then fuses these target lists and their track sequences. Since every sensor performs its own preprocessing, feature extraction, and recognition or judgment, and the fusion stage only combines the resulting preliminary decisions, this is the highest level of fusion. Decision-level fusion can be performed across homogeneous or heterogeneous sensors. Its advantages and disadvantages are essentially the opposite of those of data-level fusion: its main advantages are modularity and encapsulation of sensor-specific details, low communication volume, a certain degree of robustness to interference, and low processing cost, and choosing an appropriate fusion algorithm can minimize the information lost at this level. Its main disadvantage is the high cost of preprocessing: the quality of the fused result depends heavily on the performance of each sensor's own processing stage. Commonly used methods include expert systems, fuzzy set theory, Bayesian inference, and Dempster-Shafer (D-S) evidence theory. At present, many detection methods based on target-level fusion are inefficient and cannot meet the detection latency requirements of autonomous vehicles, while feature-level and data-level fusion in turn require more attention to the respective data formats.
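
As an example of one decision-level technique named above, the sketch below implements the standard Dempster-Shafer combination rule for two mass functions; the camera and lidar masses are invented numbers used only to show how conflicting evidence is renormalized.

    from itertools import product

    def dempster_combine(m1, m2):
        """Dempster-Shafer combination of two mass functions.
        Masses are dicts mapping frozensets of hypotheses to belief mass."""
        combined, conflict = {}, 0.0
        for (a, ma), (b, mb) in product(m1.items(), m2.items()):
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb              # mass assigned to the empty set
        return {k: v / (1.0 - conflict) for k, v in combined.items()}

    # Hypothetical masses from a camera and a lidar classifier over {car, pedestrian}.
    camera = {frozenset({"car"}): 0.7, frozenset({"car", "pedestrian"}): 0.3}
    lidar  = {frozenset({"car"}): 0.6, frozenset({"pedestrian"}): 0.2,
              frozenset({"car", "pedestrian"}): 0.2}
    print(dempster_combine(camera, lidar))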

At present, most research on multi-sensor fusion focuses on image data and multi-beam lidar. A camera-based perception system lacks the spatial dimension and cannot accurately recover the 3D position of what it sees. The camera is also easily affected by factors such as lighting and detection distance: for distant targets it provides only very low-resolution information that even human eyes cannot distinguish, leading to missed or incorrect labels. It therefore cannot cope stably with vehicle detection in complex and changeable traffic environments, nor meet the stability requirements of driverless cars, so autonomous driving object detection needs additional sensors. Lidar offers long detection range, immunity to lighting conditions, and accurate target distance measurements, which can compensate for the camera's shortcomings. During recognition, one can check whether there are point cloud returns inside a detected bounding box and use this to correct the corresponding recognition confidence. Fusing lidar point clouds with image data therefore not only provides accurate depth information for the target but also reduces the probability of missed detections in the image, improving detection performance overall. With a multi-view encoding scheme, a more effective and compact representation of sparse 3D point clouds can also be obtained.
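
A minimal heuristic sketch of the confidence-correction idea described above is shown here: an image detection whose 2D box contains too few projected lidar returns is down-weighted. The project_fn helper (for example the hypothetical project_lidar_point above), the min_points threshold, and the 0.5 factor are all illustrative assumptions.

    def refine_confidence(box, score, lidar_points, project_fn, min_points=5):
        """Down-weight an image detection if few lidar returns fall inside its box.
        box = (u_min, v_min, u_max, v_max); thresholds are illustrative."""
        u_min, v_min, u_max, v_max = box
        hits = 0
        for p in lidar_points:
            uv = project_fn(p)
            if uv is not None and u_min <= uv[0] <= u_max and v_min <= uv[1] <= v_max:
                hits += 1
        return score if hits >= min_points else score * 0.5  # simple heuristic correction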

Because visual images are easy to acquire and there are many ways to process them, vision is currently the main means of obtaining information in autonomous vehicle research. Vision technology is mainly divided into monocular and binocular (stereo) vision. Monocular recognition mostly relies on vehicle features that distinguish the target from the background, such as texture and edges, but the information obtained this way is insufficient, lacks depth, and is easily disturbed by external conditions such as illumination and shadows. Binocular recognition can achieve good results, but its large computational load and complex algorithms make real-time vehicle recognition difficult to guarantee. Lidar can obtain range information about the scene and is not easily affected by external conditions such as illumination, but it can cause misjudgments because of insufficient appearance information. Because images provide good lateral texture cues and point clouds provide reliable vertical spatial structure, multi-sensor fusion can overcome the limited information and limited coverage of any single sensor. The development of autonomous driving and deep learning has greatly advanced multi-sensor fusion technology, which can be summarized as follows: multi-sensor information with different spatial and temporal dimensions is analyzed according to fusion criteria to obtain a consistent description and interpretation of the measured target, on which subsequent decision-making and estimation are based, so that the fused result is richer and more accurate than what any sensor could obtain alone. In the field of autonomous driving, traditional fusion algorithms such as the Kalman filter and D-S evidence theory still play a very important role, but with the rapid development of deep learning, end-to-end data fusion has become an indispensable approach.

Some existing fusion schemes are used only to help confirm the existence of a target, for example running visual detection on the approximate region that the lidar reports as containing a target. Others fuse within a unified framework: for example, under a Kalman-filter-based framework, different sensors are assigned different measurement covariances, and whichever sensor produces a measurement updates the target state in turn. These schemes do realize multi-sensor data fusion, but because the different sensors are treated equally, the approach is direct but inefficient, leaving considerable room for improvement. In 3D target estimation based on pure vision, the estimated distance is extremely unstable; correcting the visual estimate through multi-sensor fusion greatly improves the detection accuracy of autonomous driving. Fusing camera and lidar information only at the target level cannot meet the requirements of autonomous driving.
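
To illustrate the Kalman-filter-based scheme described above, the sketch below applies the standard Kalman measurement update, with each sensor supplying its own measurement covariance R so that, for example, lidar position measurements are trusted more than camera ones; the state layout and all numeric values are hypothetical.

    import numpy as np

    def kalman_update(x, P, z, H, R):
        """Standard Kalman measurement update; R encodes how much a sensor is trusted."""
        y = z - H @ x                              # innovation
        S = H @ P @ H.T + R                        # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
        x_new = x + K @ y
        P_new = (np.eye(len(x)) - K @ H) @ P
        return x_new, P_new

    # Hypothetical target state [x, y]; camera and lidar both measure position,
    # but the lidar is given a much smaller covariance (trusted more for range).
    x, P = np.zeros(2), np.eye(2) * 10.0
    H = np.eye(2)
    R_camera = np.diag([4.0, 4.0])
    R_lidar  = np.diag([0.1, 0.1])
    for z, R in [(np.array([10.2, 3.1]), R_camera), (np.array([10.0, 3.0]), R_lidar)]:
        x, P = kalman_update(x, P, z, H, R)        # each sensor updates in turn
    print(x)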

Reference:

Liao Yuepeng, Automatic Driving Target Detection Based on Multi-Sensors, /view/93e561a2fbd6195f312b3169a45177232f60e480.html;
Multi-View 3D Object Detection Network for Autonomous Driving, CVPR 2017.