Image number recognition algorithm with a low probability of errors of the second kind


Many industrial tasks require scene number recognition. A common requirement for the recognition algorithm is a low rate of errors of the second kind, that is, cases in which an invalid number is returned as recognized. Examples of such tasks:


  1. Number recognition on discount and bank cards (Figure 1).
  2. Car number recognition (Figure 2).

Figure 1
Figure 2


, , :


  • ;
  • ( );
  • , , ..


(scene number recognition) : 0.03.


false positive (FP): , . , "177", "777", .



, CRNN (Convolutional Recurrent Neural Network)[1].


github.


Python3, PyTorch.


PSPNet[2]. , github PSPNet Pytorch.



CRNN,
medium [3], [4].


CRNN 3.


Figure 3 – CRNN


. , : CNN [5], LSTM [6].


:


  1. CNN. . , , , , . , . , , 4;
  2. LSTM. LSTM (time step). LSTM . LSTM many to many, . , Bidirectional LSTM, ;
  3. . . - ;
  4. . n Yn: kn = max(Yn). , , . , , : «3200-544». "-" , . , «00» «44», .
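
The decoding in step 4 appears to follow the usual greedy CTC scheme described in [4]: take the highest-scoring symbol at each time step, merge consecutive repeats, then drop the blank symbol "-". The sketch below is a minimal pure-Python illustration under that assumption. The function names, the alphabet, and the sample path are hypothetical and are not taken from the author's code.

```python
from itertools import groupby

BLANK = "-"                      # assumed blank symbol
ALPHABET = BLANK + "0123456789"  # hypothetical alphabet: blank plus ten digits


def best_path(step_scores, alphabet=ALPHABET):
    """Pick the highest-scoring symbol at every time step (argmax over Y_n)."""
    return "".join(
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in step_scores
    )


def collapse(path, blank=BLANK):
    """Merge consecutive repeated symbols, then remove the blanks."""
    merged = (symbol for symbol, _ in groupby(path))
    return "".join(symbol for symbol in merged if symbol != blank)


def one_hot(symbol, alphabet=ALPHABET):
    """Toy per-step scores: 1.0 for the given symbol, 0.0 for the rest."""
    return [1.0 if s == symbol else 0.0 for s in alphabet]


# Toy network output: one score vector per time step.
scores = [one_hot(symbol) for symbol in "33-2-0-00-5-44-4"]
decoded = collapse(best_path(scores))
```

Without the blank, genuinely doubled digits would be merged: `collapse("3200544")` gives `"32054"`, which is why a blank has to sit between repeated characters.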

Figure 4
Legend: h, w; n



, , 5.


Figure 5

, : .


.


CRNN , 6.


Figure 6 – CRNN 1, CRNN 2


, , . - .


.

, "5" , . , , . , :


x = s + v, v > x
where: s - , v - , x - .


. , :


y = f(x), y ∼ U
where: f - , x - , y - .


10 pf = 0.9.


:


p_f = \sum_{i=1,\, j=1}^{10} P(y = y_j \mid y_i = y_j)
where: p_f - , y_i - i-th , y_j - j-th .


10 , pf = 0.1, pf = 0.9 .
, ps = 0.97, : pk = 0.97*0.97 = 0.94.
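
The arithmetic above can be checked in a few lines of Python. This is only a sketch under two assumptions that the formulas suggest but that are assumptions here: on a pure-noise input each network's output is uniform over the 10 digits, and the two networks err independently. The variable names are illustrative.

```python
import math

N_DIGITS = 10
p_digit = 1 / N_DIGITS  # uniform output: every digit is equally likely

# Probability that two independent uniform outputs coincide on the same digit.
p_same = sum(p_digit * p_digit for _ in range(N_DIGITS))
# Probability that they differ, so the disagreement can be detected.
p_differ = 1 - p_same

p_s = 0.97       # accuracy of a single network, taken from the text
p_k = p_s * p_s  # both networks correct at once, assuming independence

print(round(p_same, 2), round(p_differ, 2), round(p_k, 2))
```

Under these assumptions the values come out as 0.1, 0.9 and 0.94, matching the numbers quoted in the text.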


: .
, , . , S = (280, 64), S2 = (320, 64).
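
A minimal sketch of how two networks with different input sizes could be combined, assuming (as the names inter_good and inter_bad used later in the text suggest) that a number is accepted only when both predictions coincide. `recognize_1` and `recognize_2` are hypothetical callables standing in for the two CRNNs; they are not part of any real library or of the author's code.

```python
from typing import Callable, Optional, Tuple

S = (280, 64)   # input size of the first network, from the text
S2 = (320, 64)  # input size of the second network, from the text

Recognizer = Callable[[object, Tuple[int, int]], str]


def recognize_with_agreement(
    image: object,
    recognize_1: Recognizer,
    recognize_2: Recognizer,
) -> Optional[str]:
    """Return the number only if both networks predict the same string."""
    first = recognize_1(image, S)
    second = recognize_2(image, S2)
    if first == second:
        return first
    return None  # the networks disagree: reject instead of risking a wrong number


# Toy stand-ins for trained models, used only to exercise the rule.
def fake_crnn_a(image, size):
    return "3200544"


def fake_crnn_b(image, size):
    return "3200544"


def fake_crnn_c(image, size):
    return "3200514"
```

The design trades recall for a lower false-positive rate: a rejected image can be sent for a retake or for manual entry, whereas a wrong number accepted silently cannot be caught later.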


, . S = (280, 64), 1.


Table 1
Legend: BS; AS; k, s, p; max_pooling



. , . PSPNet.


400 , β€” 100 , , , 5-10 % , , 5.



Table 2. Columns: inter_bad, inter_good; good_1, good_2; amount_cards; percent_good_1, percent_good_2; percent_good; percent_bad

, , 1, 0.8816, 0.1184. , - .


, 0.0177, 0.863813, 0.0954 0.0230. , .






, β€”

, ,




:


  • . , . , , ;
  • . , ;
  • . .


, CRNN scene text recognition, .
CRNN, , .


Besides this approach, I also tried rejecting predictions whose probability was below a certain threshold; however, the prediction accuracy then fell to 0.3, which was unacceptable.
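
For comparison, the threshold-based rejection mentioned here could look roughly like the sketch below. It assumes that the confidence of a prediction is the product of its per-character probabilities; the actual confidence measure and threshold used in the experiment are not stated, so `sequence_confidence`, `reject_below`, and the value 0.9 are all hypothetical.

```python
import math


def sequence_confidence(char_probs):
    """Assumed confidence: product of the per-character probabilities."""
    return math.prod(char_probs)


def reject_below(text, char_probs, threshold=0.9):
    """Keep the prediction only if its confidence reaches the threshold."""
    if sequence_confidence(char_probs) >= threshold:
        return text
    return None  # too uncertain: treat the number as not recognized


kept = reject_below("177", [0.99, 0.99, 0.99])
dropped = reject_below("777", [0.60, 0.99, 0.99])
```

A single threshold of this kind discards correct but low-confidence predictions together with the wrong ones, which is consistent with the drop in accuracy reported above.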


References


  1. Original CRNN article;
  2. Pyramid Scene Parsing Network;
  3. Build a Handwritten Text Recognition System using TensorFlow;
  4. An Intuitive Explanation of Connectionist Temporal Classification;
  5. Convolutional neural network in Python;
  6. LSTM: long short-term memory networks.
