
Many industrial applications require scene number recognition. A common requirement for the recognition algorithm is a low rate of errors of the second kind, that is, cases in which an incorrect number is returned as recognized. Examples of such tasks:
- Number recognition on discount and bank cards (Figure 1).
- Car license plate number recognition (Figure 2).

Figure 1
Figure 2
, , :
(scene number recognition) : 0.03.
false positive (FP) β , . , "177", "777", .
The model is a CRNN (Convolutional Recurrent Neural Network) [1].
github.
Python3, PyTorch.
PSPNet[2]. , github PSPNet Pytorch.
CRNN,
medium [3], [4].
The CRNN is shown in Figure 3.

Figure 3: CRNN
. , : CNN [5], LSTM [6].
:
- CNN. . , , , , . , . , , 4;
- LSTM. LSTM (time step). LSTM . LSTM many to many, . , Bidirectional LSTM, ;
- . . β ;
- . n Yn: kn = max(Yn). , , . , , : «3200-544». "-" , . , «00» «44», .
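The decoding step in the last item can be illustrated with a minimal sketch. It assumes standard greedy CTC-style decoding as described in [4]: the best class is taken at each time step (reading kn = max(Yn) as an argmax), consecutive repeats are merged, and the blank symbol "-" is then dropped. The names `greedy_decode`, `collapse` and `BLANK` are hypothetical and do not come from the original code.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol, as in the «3200-544» example


def collapse(raw, blank=BLANK):
    """Merge consecutive repeated symbols, then remove blanks."""
    merged = (symbol for symbol, _ in groupby(raw))
    return "".join(symbol for symbol in merged if symbol != blank)


def greedy_decode(step_scores, alphabet):
    """Pick the highest-scoring class at every time step and collapse the result.

    step_scores: one list of class scores per time step (Y_n).
    alphabet:    string whose i-th character is the symbol of class i.
    """
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in step_scores]
    raw = "".join(alphabet[i] for i in best)
    return collapse(raw)
```

Under these assumptions, `collapse("3200-544")` gives `"32054"`: the runs «00» and «44» each collapse to a single digit. A genuinely doubled digit survives only when a blank separates the two occurrences, so `collapse("77-7")` gives `"77"`.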

4 β
: h, w β ; n β .
, , 5.

5β β β
, : .
.
CRNN , 6.

β 6 β . : , , . CRNN 1, CRNN 2 β
, , . - .
., "5" , . , , . , :
: s β , v β , x β .
. , :
: f β , x β , y β .
10 pf = 0.9.
:
pf =
: pf β , β i- , β j- .
10 , pf = 0.1, pf = 0.9 .
, ps = 0.97, : pk = 0.97*0.97 = 0.94.
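The product above can be checked with a short sketch. It assumes that the two networks make errors independently, which is what multiplying the probabilities implies, and that a number is accepted only when both networks return the same string. Both function names are hypothetical.

```python
from typing import Optional


def joint_success_probability(p_single: float, n_models: int = 2) -> float:
    """Probability that all models are correct, assuming independent errors."""
    return p_single ** n_models


def accept_if_agree(pred_a: str, pred_b: str) -> Optional[str]:
    """Return the number only when both models predict the same string."""
    return pred_a if pred_a == pred_b else None
```

With ps = 0.97, `joint_success_probability(0.97)` is about 0.9409, which rounds to the 0.94 quoted above. A disagreement such as "177" versus "777" makes `accept_if_agree` return `None`, so the number is rejected rather than reported.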
: .
, , . , S = (280, 64), S2 = (320, 64).
, . S = (280, 64), 1.

1 β .
: BS β ; AS β ; k, s, p β , , , : max_pooling
. , . PSPNet.
400 , β 100 , , , 5-10 % , , 5.

2 β . inter_bad β , inter_good β ; good_1, good_2 β , ; amount_cards β , percent_good_1, percent_good_2 β , ; percent_good β ; percent_bad β
, , 1, 0.8816, 0.1184. , - .
, 0.0177, 0.863813, 0.0954 0.0230. , .
, ,
:
, CRNN scene text recognition, .
CRNN, , .
In addition to this approach, I tried rejecting predictions whose probability was below a certain threshold. In that case, however, prediction accuracy fell to 0.3, which was unacceptable.
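A minimal sketch of such a threshold filter follows. The text does not say how the probability of a prediction was computed, so the sketch assumes it is the product of the highest per-step probabilities. The function names and the threshold value are hypothetical.

```python
import math
from typing import Optional, Sequence


def sequence_confidence(step_max_probs: Sequence[float]) -> float:
    """Confidence of a whole prediction: product of per-step maximum probabilities."""
    return math.prod(step_max_probs)


def reject_below(prediction: str, confidence: float, threshold: float) -> Optional[str]:
    """Keep the prediction only if its confidence reaches the threshold."""
    return prediction if confidence >= threshold else None
```

Because the confidence is a product over all time steps, it drops quickly for long sequences. A strict threshold therefore discards many correct predictions, which may explain part of the accuracy loss described above.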
List of sources
- [1] Original CRNN article
- [2] Pyramid Scene Parsing Network
- [3] Build a Handwritten Text Recognition System using TensorFlow
- [4] An Intuitive Explanation of Connectionist Temporal Classification
- [5] Convolutional neural network in Python
- [6] LSTM: Long Short-Term Memory networks