3D target detection research
3D target detection takes RGB images, RGB-D depth images, or LiDAR point clouds as input and outputs, for each object, its category together with its size (length, width, height) and rotation angle in three-dimensional space.
In application scenarios such as drones, robotics, and augmented reality, ordinary 2D detection does not provide all the information needed to perceive the environment: it only gives the target's position in the image plane and a class confidence. Objects in the real world have three-dimensional extent, and most of these applications need the target's length, width, height, and heading angle. For example, in the autonomous driving scenario of Fig. 1, the 3D size and rotation angle of each target must be recovered from the image, and this information, projected into the bird's-eye view, is crucial for subsequent path planning and control.
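As a rough sketch of what the bird's-eye-view projection of a 3D box looks like (the function name and the centre/size/yaw parameterization are illustrative assumptions, not from any particular paper):

```python
import numpy as np

def bev_corners(x, y, length, width, yaw):
    # Four corners of a 3D box projected onto the bird's-eye-view (ground) plane.
    # (x, y): box centre on the ground plane, yaw: rotation about the vertical axis.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s], [s, c]])
    local = np.array([[ length / 2,  width / 2],
                      [ length / 2, -width / 2],
                      [-length / 2, -width / 2],
                      [-length / 2,  width / 2]])
    return local @ rot.T + np.array([x, y])   # (4, 2) corner coordinates

print(bev_corners(10.0, 2.0, 4.5, 1.8, np.deg2rad(30)))
```

Downstream planning modules typically consume exactly this kind of rotated BEV rectangle rather than the axis-aligned 2D image box.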
3DOP is a method that produces 3D bounding boxes from a binocular (stereo) camera and can be seen as an extension of Fast R-CNN to the 3D domain. The original paper was published at NIPS 2015. It takes 4.0 s to process an image; since the Fast R-CNN pipeline is slower than Faster R-CNN and regression-based detectors, the method is far from real time.
It takes a stereo image pair as input, estimates depth, and computes a point cloud by re-projecting the pixel coordinates in the image plane back into 3D space. 3DOP formulates candidate-region generation as an energy-minimization problem over a Markov Random Field (MRF), with carefully designed potential functions (e.g., object size priors, the ground plane, and point cloud density).
With the set of 3D candidate boxes obtained, 3DOP uses the Fast R-CNN [11] scheme to regress the target location.
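The re-projection step described above follows the standard pinhole stereo geometry. A minimal sketch is given below; the function name and the assumption of a dense disparity map are illustrative, and the exact pipeline in the 3DOP paper may differ:

```python
import numpy as np

def disparity_to_points(disparity, fx, fy, cx, cy, baseline):
    # Re-project each pixel (u, v) with disparity d back into 3D camera space:
    #   Z = fx * baseline / d,  X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy
    v, u = np.indices(disparity.shape)
    valid = disparity > 0                    # skip pixels with no stereo match
    z = fx * baseline / disparity[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)       # (N, 3) point cloud
```

The resulting point cloud is what the MRF potentials (ground plane, point density, size priors) are evaluated on.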
The paper builds mainly on FCOS, an anchor-free 2D detector. The backbone is ResNet101 with deformable convolutions (DCN), equipped with an FPN to detect targets at different scales; the network structure is shown in Fig. 1.
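A minimal sketch of the multi-scale idea, not the paper's code: an FPN over backbone features with a shared dense head applied at every scale. The channel sizes, the 10-class/7-parameter output split, and the random tensors standing in for ResNet101+DCN feature maps are all assumptions for illustration:

```python
from collections import OrderedDict
import torch
from torch import nn
from torchvision.ops import FeaturePyramidNetwork

# FPN fuses C3-C5 backbone features into 256-channel maps at each scale.
fpn = FeaturePyramidNetwork(in_channels_list=[512, 1024, 2048], out_channels=256)

# Shared head applied to every pyramid level (FCOS-style dense prediction).
head = nn.Sequential(
    nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 10 + 7, 1),          # assumed: 10 classes + 7 box parameters
)

feats = OrderedDict(
    c3=torch.randn(1, 512, 80, 100),    # stand-ins for ResNet101+DCN outputs
    c4=torch.randn(1, 1024, 40, 50),
    c5=torch.randn(1, 2048, 20, 25),
)
outputs = {name: head(level) for name, level in fpn(feats).items()}
for name, out in outputs.items():
    print(name, out.shape)              # per-scale dense predictions
```

Sharing one head across pyramid levels is what lets a single set of weights handle targets of very different apparent sizes.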
Based on the 3D IoU, true positives (TP) and false positives (FP) can be defined: a prediction counts as a TP if its 3D IoU with an as-yet-unmatched ground-truth box reaches a class-specific threshold, and as an FP otherwise.
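A simplified sketch of this matching step, assuming the rotated-box 3D IoU matrix has already been computed (computing it is omitted here), with greedy assignment in descending score order:

```python
import numpy as np

def label_tp_fp(iou_matrix, scores, iou_thresh=0.7):
    # iou_matrix: (num_preds, num_gts) 3D IoU between predictions and ground truths.
    # A prediction is a TP if it overlaps an unmatched ground truth by >= iou_thresh.
    order = np.argsort(-scores)
    gt_matched = np.zeros(iou_matrix.shape[1], dtype=bool)
    tp = np.zeros(iou_matrix.shape[0], dtype=bool)
    for i in order:
        if iou_matrix.shape[1] == 0:
            break
        ious = iou_matrix[i].copy()
        ious[gt_matched] = -1.0            # each ground truth matches at most once
        j = int(np.argmax(ious))
        if ious[j] >= iou_thresh:
            gt_matched[j] = True
            tp[i] = True
    fp = ~tp
    return tp, fp
```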
By plotting the Precision × Recall curve (PRC), the area under the curve indicates the performance of a detector. In practice, however, the "zigzag" shape of the PRC makes its area awkward to compute exactly. KITTI therefore uses the interpolated AP@SN metric, which samples precision at N fixed recall positions and thereby sidesteps the direct area computation.
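A simplified sketch of N-point interpolated AP; note that KITTI's official AP|R40 samples 40 recall positions excluding recall 0, so the exact sampling grid below is an assumption:

```python
import numpy as np

def interpolated_ap(recalls, precisions, num_points=40):
    # Sample the precision envelope (max precision at or beyond each recall level)
    # at evenly spaced recall positions and average, removing the zigzag effect.
    ap = 0.0
    for r in np.linspace(0.0, 1.0, num_points):
        mask = recalls >= r
        p = precisions[mask].max() if mask.any() else 0.0
        ap += p / num_points
    return ap
```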
nuScenes consists of multi-modal data collected from 1000 scenes, including RGB images from 6 cameras and points from 5 radars and 1 LiDAR. It is split into 700/150/150 scenes for training/validation/testing, with 1.4M annotated 3D bounding boxes from 10 categories in total. In addition, nuScenes uses different metrics, a distance-based mAP and the nuScenes detection score (NDS), which evaluate a method from another perspective.
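As a rough sketch of how NDS combines distance-based mAP with the five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE); the helper name and the example numbers are illustrative, and the exact weighting should be checked against the nuScenes devkit:

```python
def nuscenes_nds(mAP, tp_errors):
    # NDS is commonly described as a weighted combination of mAP and the five
    # TP error metrics, each mapped to a [0, 1] score via 1 - min(1, error).
    tp_scores = [1.0 - min(1.0, e) for e in tp_errors]
    return (5.0 * mAP + sum(tp_scores)) / 10.0

# Example with made-up values: mAP and [mATE, mASE, mAOE, mAVE, mAAE]
print(nuscenes_nds(0.40, [0.55, 0.27, 0.48, 0.90, 0.20]))
```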