New neural network architecture - EfficientDet

Hello, Habr! I present to you an analysis of the paper "EfficientDet: Scalable and Efficient Object Detection" by Mingxing Tan, Ruoming Pang and Quoc V. Le.

In recent years, tremendous progress has been made towards more accurate object detection, but state-of-the-art detectors have also become increasingly expensive. Their large model sizes and high computational costs hinder deployment in many real-world applications, such as robotics and self-driving cars, where model size and latency are tightly constrained. Given these real-world resource limits, model efficiency is becoming increasingly important for object detection.
Many previous works have aimed at more efficient detector architectures, but they often gain efficiency at the expense of accuracy. A natural question arises: is it possible to build a scalable detection architecture that achieves both higher accuracy and better efficiency across a wide range of resource constraints? The creators of EfficientDet believe they have found the answer.

EfficientDet: Scalable and Efficient Object Detection


image

The table above shows that EfficientDet achieves much higher accuracy with far less computation than other detectors.

What is the EfficientDet architecture?


image
The overall EfficientDet architecture largely follows the one-stage detector paradigm. An EfficientNet pre-trained on ImageNet serves as the backbone. A weighted bi-directional feature pyramid network (BiFPN) is attached to it, followed by a class network and a box network that produce object class predictions and bounding-box predictions, respectively.
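The following is a minimal, shape-only Python sketch of the data flow. It assumes a square input and the five pyramid levels P3-P7 that BiFPN fuses, where level l is downsampled by 2^l. Several details are illustrative assumptions rather than details from the article:

- 9 anchors per location (3 scales x 3 aspect ratios, borrowed from RetinaNet).
- The 90-class COCO label space.
- The hypothetical function names.

No weights or convolutions are involved.

```python
# Shape-only walk-through of backbone features -> BiFPN -> class/box networks.
# ASSUMPTIONS (illustrative, not from the article): 9 anchors per location,
# 90 classes, square input; all function names are hypothetical.

LEVELS = range(3, 8)  # pyramid levels P3..P7


def feature_sizes(input_size):
    """Side length of each pyramid level: level l is downsampled by 2**l."""
    return {level: input_size // 2 ** level for level in LEVELS}


def head_output_shapes(input_size, num_anchors=9, num_classes=90):
    """Per-level (H, W, channels) outputs of the class and box networks."""
    shapes = {}
    for level, side in feature_sizes(input_size).items():
        shapes[level] = {
            "class": (side, side, num_anchors * num_classes),  # one score per anchor and class
            "box": (side, side, num_anchors * 4),  # 4 box offsets per anchor
        }
    return shapes


def total_anchors(input_size, num_anchors=9):
    """Number of anchors summed over all pyramid levels."""
    return sum(side * side * num_anchors for side in feature_sizes(input_size).values())


if __name__ == "__main__":
    for level, out in head_output_shapes(512).items():
        print(f"P{level}: class {out['class']}, box {out['box']}")
    print("anchors:", total_anchors(512))
```

For a 512x512 input, the levels are 64, 32, 16, 8 and 4 pixels on a side, which gives 49,104 anchors in total. Under these assumptions, the class network's output at P3 has 9 x 90 = 810 channels.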

A bit about BiFPN:


The idea of a bidirectional feature pyramid came from studying the performance and efficiency of existing feature-fusion networks: FPN, PANet and NAS-FPN. PANet achieves better accuracy than FPN and NAS-FPN, but at the cost of more parameters and computation. To improve model efficiency, the authors propose several optimizations for cross-scale connections:
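  • First, remove nodes that have only one input edge. The intuition is that a node with a single input performs no feature fusion, so it contributes little to a network whose purpose is to fuse different features. This yields a simplified PANet (panel (e) below);
  • Second, add an extra edge from the original input to the output node when they are at the same level, to fuse more features without adding much cost (panel (f) below);
  • Third, unlike PANet, which has only one top-down and one bottom-up path, treat each bidirectional (top-down and bottom-up) path as one feature-network layer, and repeat that layer several times to enable more high-level feature fusion.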
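The three rules can be sketched as a small graph builder in Python. It records only which features feed each fusion node; resizing, convolutions and fusion weights are left out. Node names such as `P4_td` (an intermediate top-down node) and `L1:P4_out` (a level-4 output of layer 1) are hypothetical labels chosen for this example.

```python
# Which inputs each fusion node of a BiFPN-style layer receives (topology only).
# Node naming is hypothetical; no tensors are involved.

LEVELS = (3, 4, 5, 6, 7)  # P3..P7


def bifpn_layer_edges(levels=LEVELS):
    """Map each fusion node of one layer to the list of nodes it fuses."""
    lo, hi = levels[0], levels[-1]
    graph = {}
    # Top-down path. Rule 1: a top-down node for the top level would have a
    # single input, so none is created.
    prev = f"P{hi}_in"
    for level in reversed(levels[1:-1]):  # P6, P5, P4
        graph[f"P{level}_td"] = [f"P{level}_in", prev]
        prev = f"P{level}_td"
    # The lowest level's top-down node doubles as its output
    # (a separate bottom-up node there would also have a single input).
    graph[f"P{lo}_out"] = [f"P{lo}_in", prev]
    # Bottom-up path. Rule 2: inner outputs also get the same-level original input.
    prev = f"P{lo}_out"
    for level in levels[1:-1]:  # P4, P5, P6
        graph[f"P{level}_out"] = [f"P{level}_in", f"P{level}_td", prev]
        prev = f"P{level}_out"
    graph[f"P{hi}_out"] = [f"P{hi}_in", prev]
    return graph


def stacked_bifpn(num_layers, levels=LEVELS):
    """Rule 3: repeat the same layer; layer k reads the outputs of layer k - 1."""

    def rename(name, k):
        if name.endswith("_in"):
            # Layer 0 reads backbone features; later layers read previous outputs.
            return name if k == 0 else f"L{k - 1}:{name[:-3]}_out"
        return f"L{k}:{name}"

    graph = {}
    for k in range(num_layers):
        for node, inputs in bifpn_layer_edges(levels).items():
            graph[f"L{k}:{node}"] = [rename(i, k) for i in inputs]
    return graph


if __name__ == "__main__":
    graph = stacked_bifpn(3)
    print(graph["L1:P4_out"])  # ['L0:P4_out', 'L1:P4_td', 'L1:P3_out']
```

With this construction, every fusion node in a layer has at least two inputs, and one layer contains eight fusion nodes: three intermediate top-down nodes and five outputs.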

image

Feature network design: (a) FPN introduces a top-down pathway to fuse multi-scale features from level 3 to 7 (P3-P7);
(b) PANet adds an additional bottom-up pathway on top of FPN;
(c) NAS-FPN uses neural architecture search to find an irregular feature network topology;
(d) adds expensive connections from all input features to output features;
(e) simplifies PANet by removing some nodes;
(f) our BiFPN, with a better accuracy-efficiency trade-off.
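The following NumPy sketch of one such bidirectional layer shows how features of different resolutions are brought to a common size before fusion. The top-down pass upsamples the coarser map, and the bottom-up pass downsamples the finer one. Nearest-neighbour upsampling, 2x2 max pooling and plain averaging are stand-ins chosen for this sketch. The real network learns fusion weights and applies a convolution after each fusion. All function names here are hypothetical.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def downsample2x(x):
    """2x2 max pooling of an (H, W, C) feature map; H and W must be even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def fuse(*features):
    """Stand-in fusion: unweighted average of same-shaped maps."""
    return sum(features) / len(features)


def bifpn_layer(p):
    """One bidirectional layer over p = {3: P3, ..., 7: P7} with equal channel counts."""
    # Top-down: merge each finer level with the upsampled coarser result.
    td = {7: p[7]}
    for level in (6, 5, 4, 3):
        td[level] = fuse(p[level], upsample2x(td[level + 1]))
    # Bottom-up: P3's output is the last top-down result; inner levels also
    # take their original input (the extra same-level edge).
    out = {3: td[3]}
    for level in (4, 5, 6):
        out[level] = fuse(p[level], td[level], downsample2x(out[level - 1]))
    out[7] = fuse(p[7], downsample2x(out[6]))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = {lvl: rng.standard_normal((512 // 2**lvl, 512 // 2**lvl, 64)) for lvl in range(3, 8)}
    for _ in range(3):  # repeat the same layer, as BiFPN does
        feats = bifpn_layer(feats)
    print({lvl: f.shape for lvl, f in feats.items()})
```

Each layer preserves the shape of every level, which is what allows the same layer to be stacked repeatedly.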

image
The table shows that, starting from RetinaNet (ResNet-50 + FPN), replacing the backbone with EfficientNet-B3 and then replacing the baseline FPN with BiFPN each improved accuracy.

image
To weight the inputs of each fusion node, EfficientDet replaces softmax with fast normalized fusion. It achieves accuracy similar to softmax-based fusion but runs 1.26-1.31 times faster on GPUs.
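The article does not give the formula. In the paper, each fusion node computes O = sum_i w_i * I_i / (eps + sum_j w_j). A ReLU keeps the learnable weights w_i non-negative, and eps = 0.0001 avoids division by zero. The softmax variant it replaces uses exp(w_i) / sum_j exp(w_j) as the normalized weights. The NumPy sketch below implements both for comparison. The function names are hypothetical, and the sketch ignores training and timing entirely.

```python
import numpy as np

EPS = 1e-4  # small epsilon from the paper, avoids division by zero


def softmax_fusion(weights, features):
    """O = sum_i softmax(w)_i * I_i."""
    w = np.exp(weights - np.max(weights))  # shift by the max for numerical stability
    w = w / w.sum()
    return sum(wi * f for wi, f in zip(w, features))


def fast_normalized_fusion(weights, features, eps=EPS):
    """O = sum_i w_i / (eps + sum_j w_j) * I_i, with w_i >= 0 enforced by ReLU."""
    w = np.maximum(weights, 0.0)  # ReLU keeps every weight non-negative
    w = w / (eps + w.sum())  # cheap normalisation: no exponentials
    return sum(wi * f for wi, f in zip(w, features))


if __name__ == "__main__":
    weights = np.array([1.0, 2.0, 1.0])
    features = [np.full((2, 2), v) for v in (1.0, 2.0, 3.0)]
    print(softmax_fusion(weights, features))
    print(fast_normalized_fusion(weights, features))
```

With these symmetric weights, both versions return a 2x2 map of (approximately) 2.0. The fast version avoids the softmax exponentials, which is the cost the paper sets out to remove.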

image
In image classification, accuracy has been improved by scaling network width, depth and input resolution jointly. EfficientDet applies the same idea to detection, scaling the backbone, BiFPN, class/box networks and input resolution together with a single compound coefficient.
The graph compares different scaling methods. All of them improve accuracy, but compound scaling achieves a better accuracy-efficiency trade-off.
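Compound scaling can be sketched with the heuristic rules the paper gives for a single coefficient phi:

- BiFPN width W = 64 * 1.35^phi
- BiFPN depth D = 3 + phi
- Box/class network depth 3 + floor(phi / 3)
- Input resolution R = 512 + 128 * phi

The backbone is scaled along with these (EfficientNet-B0 and up), which this sketch leaves out. The function below is a hypothetical helper that only evaluates these formulas. The width comes out as a float; a real configuration would round it to an integer channel count, so published model settings may not match these raw values exactly.

```python
def compound_scaling(phi):
    """Evaluate the EfficientDet compound-scaling heuristics for coefficient phi."""
    if phi < 0:
        raise ValueError("phi must be non-negative")
    resolution = 512 + 128 * phi
    # Levels P3-P7 are used, so the input side must be divisible by 2**7 = 128.
    assert resolution % 2**7 == 0
    return {
        "bifpn_width": 64 * 1.35 ** phi,  # W_bifpn (raw float, not rounded)
        "bifpn_depth": 3 + phi,  # D_bifpn: number of repeated BiFPN layers
        "head_depth": 3 + phi // 3,  # D_box = D_class
        "input_resolution": resolution,  # R_input
    }


if __name__ == "__main__":
    for phi in range(4):
        print(phi, compound_scaling(phi))
```

A single integer thus controls every dimension at once, instead of tuning width, depth and resolution independently.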

image
The figure compares model size and inference latency. Latency is measured with batch size 1 on the same machine, equipped with a Titan V GPU and a Xeon CPU. AN denotes AmoebaNet + NAS-FPN trained with auto-augmentation.
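The article reports latency at batch size 1 but does not show how it is measured. A generic timing harness in that spirit might look like the sketch below. `model_fn` and `example_input` are hypothetical placeholders for whatever model and framework are used. Warm-up runs are discarded, and the median is reported to damp outliers. On a GPU, most frameworks launch work asynchronously, so the call would also have to synchronize with the device before the timer stops; this sketch does not do that.

```python
import statistics
import time


def measure_latency(model_fn, example_input, warmup=10, runs=100):
    """Median wall-clock latency in seconds of model_fn on one input (batch size 1)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):  # warm-up calls are not timed (caches, lazy initialisation)
        model_fn(example_input)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        model_fn(example_input)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Toy stand-in for a detector: any callable taking one input works.
    latency = measure_latency(lambda x: sorted(x), list(range(10_000, 0, -1)))
    print(f"median latency: {latency * 1e3:.3f} ms")
```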

Conclusion:


Based on a systematic study of network architecture design choices for efficient object detection, the authors propose a weighted bidirectional feature network and a customized compound scaling method to improve both accuracy and efficiency. Building on these optimizations, they develop a new family of detectors, EfficientDet, which consistently achieves better accuracy and efficiency than the prior art across a wide range of resource constraints. In particular, EfficientDet-D7 achieves state-of-the-art accuracy with fewer parameters and FLOPs than the best existing detectors. EfficientDet is also up to 3.2 times faster on GPUs and 8.1 times faster on CPUs than previous detectors.

Source: Mingxing Tan, Ruoming Pang, Quoc V. Le (Google Research, Brain Team), "EfficientDet: Scalable and Efficient Object Detection"
arxiv.org/abs/1911.09070
