Object detection: an introduction to YOLO and SSD
Commonly used object detection frameworks fall into two categories. The first is the two-stage (two-shot) approach, which separates the detection of regions of interest from their classification; representative examples are R-CNN, Fast R-CNN and Faster R-CNN. The second is the one-stage (one-shot) approach, which uses a single network to detect and classify regions of interest at the same time; representative examples are YOLO (v1, v2, v3) and SSD.
Two-stage methods appeared earlier. Because they separate region detection from classification, their accuracy is relatively high but their real-time performance is relatively poor, which makes them unsuitable for applications such as autonomous driving and unmanned-vehicle perception. This article therefore mainly introduces the SSD and YOLO families of frameworks.
SSD was proposed in 2016 by W. Liu et al. in the paper "SSD: Single Shot MultiBox Detector". Although it appeared slightly later than YOLO (v1), published in the same year, it is both faster and more accurate.
SSD adds some extra structures to a base CNN (the authors use VGG-16, but other networks can be substituted), which gives the network the following characteristics:
Multi-scale feature map detection
The authors appended several feature layers after VGG-16. The size of these layers decreases progressively, which allows predictions to be made at different scales: the deeper and smaller the feature map, the larger the objects it can predict.
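To make the scale argument concrete, here is a minimal sketch. It assumes a 300 × 300 input and the six feature-map sizes of the paper's SSD300 configuration (38, 19, 10, 5, 3, 1); the function name `cell_extent` is purely illustrative. It only computes how many input pixels one cell of each map spans, which is a rough proxy for the object scale that map is suited to.

```python
def cell_extent(input_size, map_size):
    """Approximate side length, in input pixels, covered by one feature-map cell."""
    return input_size / map_size


INPUT_SIZE = 300                   # assumed SSD300 input resolution
MAP_SIZES = [38, 19, 10, 5, 3, 1]  # assumed SSD300 feature-map sizes

extents = [cell_extent(INPUT_SIZE, s) for s in MAP_SIZES]
for size, extent in zip(MAP_SIZES, extents):
    print(f"{size:>2} x {size:<2} map: one cell spans about {extent:.1f} px")
```

The extent grows as the map shrinks, which is why the small, deep maps are the ones responsible for large objects.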
Convolutional network prediction
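Unlike YOLO, which makes its final prediction with fully connected layers, SSD predicts with convolutions. For each $m \times n$ feature map with $p$ channels that is used for prediction, a small $3 \times 3 \times p$ kernel produces each output value, and $(c + 4)k$ such kernels are applied, where $k$ is the number of prior boxes placed in each cell and $c$ is the number of categories to predict.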
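As a rough sketch of what such a convolutional predictor costs, the snippet below counts the filters and weights of one prediction layer. The symbols follow the SSD paper's notation (p input channels, k prior boxes per cell, c categories); the function name and the example values (p = 512, k = 4, c = 21) are assumptions chosen only for illustration.

```python
def predictor_size(p, k, c, kernel=3):
    """Return (filters, weights) for a conv predictor built from kernel x kernel x p filters."""
    filters = (c + 4) * k  # c class scores + 4 offsets for each prior box
    weights = kernel * kernel * p * filters + filters  # kernel weights + one bias per filter
    return filters, weights


filters, weights = predictor_size(p=512, k=4, c=21)
print(filters, weights)
```

The count does not depend on the spatial size of the feature map, because the same filters slide over every cell; a fully connected layer has no such sharing.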
Setting prior boxes
For each cell of a feature map, we place a series of prior boxes (called default boxes in the paper). For every prior box of every cell, we then predict the box's position offsets and a confidence for each category. For example, for an $m \times n$ feature map with $k$ prior boxes per cell and $c$ categories to predict, the output size is $(c + 4)kmn$. (This shows up concretely in the training process.)
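The placement of prior boxes and the resulting output size can be sketched as follows. This is a simplified illustration, not the paper's exact procedure: boxes are written as (cx, cy, w, h) in coordinates normalised to [0, 1], every cell gets the same set of shapes, and the three shapes listed are hypothetical. The function names are also assumptions.

```python
def make_prior_boxes(m, n, shapes):
    """Place one prior box per (w, h) in `shapes` at the centre of every cell of an m x n map."""
    boxes = []
    for i in range(m):          # rows
        for j in range(n):      # columns
            cx = (j + 0.5) / n  # cell centre, normalised to [0, 1]
            cy = (i + 0.5) / m
            for w, h in shapes:
                boxes.append((cx, cy, w, h))
    return boxes


def output_size(m, n, k, c):
    """Number of predicted values: c class scores + 4 offsets for each prior box."""
    return (c + 4) * k * m * n


shapes = [(0.2, 0.2), (0.28, 0.14), (0.14, 0.28)]  # hypothetical prior-box shapes
priors = make_prior_boxes(5, 5, shapes)
print(len(priors), output_size(5, 5, len(shapes), c=21))
```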
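Here, if $(d^{cx}, d^{cy}, d^{w}, d^{h})$ denotes the center position, width and height of a prior box and $(b^{cx}, b^{cy}, b^{w}, b^{h})$ denotes those of the predicted box, the offsets $(l^{cx}, l^{cy}, l^{w}, l^{h})$ that the network actually predicts are:
$$l^{cx} = \frac{b^{cx} - d^{cx}}{d^{w}}, \quad l^{cy} = \frac{b^{cy} - d^{cy}}{d^{h}}, \quad l^{w} = \log\frac{b^{w}}{d^{w}}, \quad l^{h} = \log\frac{b^{h}}{d^{h}}$$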
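A minimal sketch of this offset encoding and its inverse is given below, assuming the form used in the SSD paper: center offsets are divided by the prior box's width and height, and width and height are encoded as log ratios. Boxes are (cx, cy, w, h) tuples; the function names `encode` and `decode` are illustrative. Implementations often also scale these values by variance terms, which is omitted here.

```python
import math


def encode(prior, box):
    """Offsets (l_cx, l_cy, l_w, l_h) of `box` relative to `prior`; both are (cx, cy, w, h)."""
    d_cx, d_cy, d_w, d_h = prior
    b_cx, b_cy, b_w, b_h = box
    return (
        (b_cx - d_cx) / d_w,
        (b_cy - d_cy) / d_h,
        math.log(b_w / d_w),
        math.log(b_h / d_h),
    )


def decode(prior, offsets):
    """Invert `encode`: recover the box (cx, cy, w, h) from predicted offsets."""
    d_cx, d_cy, d_w, d_h = prior
    l_cx, l_cy, l_w, l_h = offsets
    return (
        d_cx + l_cx * d_w,
        d_cy + l_cy * d_h,
        d_w * math.exp(l_w),
        d_h * math.exp(l_h),
    )


prior = (0.5, 0.5, 0.2, 0.4)
box = (0.55, 0.45, 0.3, 0.3)
offsets = encode(prior, box)
recovered = decode(prior, offsets)
print(offsets, recovered)
```

Decoding the encoded offsets returns the original box (up to floating-point error), which is what happens at inference time when the network's predicted offsets are turned back into boxes.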
The overall SSD framework is as follows. It starts with the first five convolutional stages of VGG-16, followed by a cascade of extra convolutional layers. Six of these layers are each passed through a $3 \times 3$ convolution (or, for the last layer, average pooling) to make predictions, giving an output of size $8732 \times (c + 4)$, where $c$ is the number of categories. The final result is then obtained through non-maximum suppression (NMS).
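The NMS step at the end can be sketched as a greedy procedure in plain Python. This is a generic illustration rather than the authors' implementation: boxes are given as corner coordinates (x1, y1, x2, y2), the function names are illustrative, and the IoU threshold of 0.45 is an assumed default. In practice the procedure is usually run separately for each class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first heavily and has a lower score, so it is suppressed, while the third box does not overlap either and is kept.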
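In the paper's SSD300 configuration, six feature maps are used for detection, with sizes $38 \times 38$, $19 \times 19$, $10 \times 10$, $5 \times 5$, $3 \times 3$ and $1 \times 1$. Each cell of these feature maps has 4, 6, 6, 6, 4 and 4 preset prior boxes respectively, so the network predicts $38 \cdot 38 \cdot 4 + 19 \cdot 19 \cdot 6 + 10 \cdot 10 \cdot 6 + 5 \cdot 5 \cdot 6 + 3 \cdot 3 \cdot 4 + 1 \cdot 1 \cdot 4 = 8732$ bounding boxes in total, and the output dimension (before non-maximum suppression) is $8732 \times (c + 4)$, where $c$ is the number of categories.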
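The total number of predicted boxes is simple arithmetic and can be checked with a short sketch. The feature-map sizes and boxes-per-cell counts below are assumed to be those of the paper's SSD300 configuration; the function name is illustrative.

```python
MAP_SIZES = [38, 19, 10, 5, 3, 1]    # assumed SSD300 feature-map side lengths
BOXES_PER_CELL = [4, 6, 6, 6, 4, 4]  # assumed prior boxes per cell for each map


def total_prior_boxes(map_sizes, boxes_per_cell):
    """Sum of (cells in map) x (prior boxes per cell) over all detection feature maps."""
    return sum(s * s * k for s, k in zip(map_sizes, boxes_per_cell))


total = total_prior_boxes(MAP_SIZES, BOXES_PER_CELL)
print(total)  # 8732
```

Most of these boxes (5776 of them) come from the largest, $38 \times 38$ feature map.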
to be continued