3D object detection research

A brief survey written while working on 3D object detection at DAMO Academy.

3D object detection refers to detection that takes RGB images, RGB-D depth images, or LiDAR point clouds as input and outputs object categories together with 3D information such as length, width, height, and rotation angle in three-dimensional space.

In application scenarios such as drones, robotics, and augmented reality, ordinary 2D detection does not provide all the information needed to perceive the environment: it only gives the position of the target in the two-dimensional image and the corresponding category confidence. In the real three-dimensional world, however, objects have 3D shapes, and most applications need the length, width, height, and heading angle of the target. For example, in the autonomous driving scenario of Fig. 1 below, the target's 3D size and rotation angle must be estimated from the image, and this information, projected into the bird's-eye view, is crucial for subsequent path planning and control.

3DOP is a method that estimates 3D bounding boxes from stereo (binocular) camera images and can be seen as an extension of Fast R-CNN to the 3D domain. The original paper was published at NIPS 2015. It takes 4.0 s to process one image, since Fast R-CNN is less efficient than Faster R-CNN and regression-based methods, so it is far from real time.

3DOP takes a stereo image pair as input to estimate depth, and computes a point cloud by re-projecting pixel coordinates in the image plane back into 3D space. It then formulates candidate-region generation as an energy-minimization problem over a Markov Random Field (MRF), with carefully designed potential functions (e.g., object size priors, ground plane, and point cloud density).
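As a rough illustration, the proposal scoring in 3DOP can be written as a weighted sum of such potentials. The term names below follow my reading of the paper (point cloud density, free space, height prior, height contrast); the exact potential definitions and learned weights are in the original work:

```latex
% Schematic of 3DOP's energy over a candidate box y given input x;
% the potentials phi and weights w are defined in the paper.
E(\mathbf{x},\mathbf{y}) =
    \mathbf{w}^{\top}_{\mathrm{pcd}}\,\phi_{\mathrm{pcd}}(\mathbf{x},\mathbf{y})
  + \mathbf{w}^{\top}_{\mathrm{fs}}\,\phi_{\mathrm{fs}}(\mathbf{x},\mathbf{y})
  + \mathbf{w}^{\top}_{\mathrm{ht}}\,\phi_{\mathrm{ht}}(\mathbf{x},\mathbf{y})
  + \mathbf{w}^{\top}_{\mathrm{ht\text{-}contr}}\,\phi_{\mathrm{ht\text{-}contr}}(\mathbf{x},\mathbf{y})
```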

With the set of 3D candidate boxes obtained, 3DOP uses the Fast R-CNN [11] scheme to regress the final target location.

This paper mainly builds on FCOS, an anchor-free 2D object detector. The backbone is ResNet-101 with DCN, equipped with an FPN to detect targets at different scales; the network structure is shown in Fig. 1:
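As a rough sketch of this backbone arrangement (not the paper's code), the snippet below taps the C3-C5 stages of a torchvision ResNet-101 and builds a 256-channel FPN over them. The DCN layers are omitted, since torchvision's stock ResNet has no deformable convolutions; it assumes torchvision >= 0.11 for `create_feature_extractor`.

```python
# Minimal ResNet-101 + FPN sketch, assuming torchvision >= 0.11.
# Not the paper's implementation; DCN is omitted here.
from collections import OrderedDict

import torch
from torchvision.models import resnet101
from torchvision.models.feature_extraction import create_feature_extractor
from torchvision.ops import FeaturePyramidNetwork

# Tap the C3-C5 stages of ResNet-101 (output strides 8/16/32).
body = create_feature_extractor(
    resnet101(), return_nodes={"layer2": "c3", "layer3": "c4", "layer4": "c5"}
)
# Build a 256-channel FPN over those stages (channel widths of ResNet-101).
fpn = FeaturePyramidNetwork(in_channels_list=[512, 1024, 2048], out_channels=256)

x = torch.randn(1, 3, 800, 1024)
feats = fpn(OrderedDict(body(x)))   # multi-scale P3-P5 feature maps
for name, f in feats.items():
    print(name, tuple(f.shape))     # features fed to the detection heads
```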

Based on the 3D IoU, true positives (TP) and false positives (FP) can be defined: a detection counts as a TP if its 3D IoU with an as-yet-unmatched ground-truth box exceeds the threshold, and as an FP otherwise.
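The snippet below is a simplified greedy matching of my own (not KITTI's official evaluation code) that illustrates this assignment given a precomputed 3D IoU matrix; the function name and threshold are illustrative:

```python
# Illustrative TP/FP assignment from a 3D IoU matrix (simplified sketch).
import numpy as np

def tp_fp_labels(ious, iou_thr=0.7):
    """ious: (num_dets, num_gts) 3D IoU matrix, rows sorted by detection score."""
    num_dets, num_gts = ious.shape
    gt_matched = np.zeros(num_gts, dtype=bool)
    tp = np.zeros(num_dets, dtype=bool)
    for d in range(num_dets):          # visit highest-scoring detections first
        g = int(ious[d].argmax())      # best-overlapping ground truth
        if ious[d, g] >= iou_thr and not gt_matched[g]:
            tp[d] = True               # first sufficient match claims the GT
            gt_matched[g] = True       # each GT can be matched at most once
    return tp, ~tp                     # TP mask, FP mask

# Example: 3 score-sorted detections vs. 2 ground truths
ious = np.array([[0.80, 0.10],
                 [0.75, 0.05],        # duplicate hit on GT 0 -> counted as FP
                 [0.20, 0.65]])
print(tp_fp_labels(ious, iou_thr=0.6))
```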

By plotting the precision × recall curve (PRC), the area under the curve indicates the performance of a detector. In practice, however, the "zigzag" shape of the PRC makes its area hard to compute accurately. KITTI therefore uses the AP@SN (N-point interpolated AP) metric as an alternative, which sidesteps the direct area computation by sampling the maximum precision at N equally spaced recall levels (N = 11 originally, 40 in the updated protocol).
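A minimal sketch of this N-point interpolation follows (my own illustration, not the official KITTI code; details such as whether recall 0 is among the sampled points vary by protocol):

```python
# Illustrative N-point interpolated AP over a precision-recall curve.
import numpy as np

def interpolated_ap(recall, precision, n_points=40):
    """recall/precision: arrays tracing the PR curve, recall ascending."""
    ap = 0.0
    for r in np.linspace(0.0, 1.0, n_points):
        mask = recall >= r
        # Interpolated precision: best precision achievable at recall >= r
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / n_points

recall = np.array([0.1, 0.4, 0.7, 0.9])
precision = np.array([1.0, 0.8, 0.6, 0.5])
print(interpolated_ap(recall, precision))  # area approximated at 40 recall points
```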

nuScenes consists of multi-modal data collected from 1000 scenes, including RGB images from 6 cameras and points from 5 radars and 1 LiDAR. It is split into 700/150/150 scenes for training/validation/testing. There are overall 1.4M annotated 3D bounding boxes from 10 categories. In addition, nuScenes uses different metrics, a distance-based mAP and the nuScenes Detection Score (NDS), which help evaluate a method from another perspective.
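For reference, NDS combines the distance-based mAP with the five mean true-positive error metrics (translation, scale, orientation, velocity, and attribute), each clipped to [0, 1]; the standard formula is:

```latex
% nuScenes Detection Score: TP denotes the set of the five mean TP metrics
% (mATE, mASE, mAOE, mAVE, mAAE).
\mathrm{NDS} = \frac{1}{10}\left[\,5\,\mathrm{mAP}
  + \sum_{\mathrm{mTP}\in\mathbb{TP}}\bigl(1-\min(1,\mathrm{mTP})\bigr)\right]
```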