How many programmers and copybooks do you need to recognize a handwritten passport?

Do you think handwritten passports are common in our country? When we began designing a passport recognition system at Smart Engines, it seemed enough to teach the system to recognize typewritten documents well. At that time, handwritten passports that could not be recognized automatically did not seem an important problem: there were enough unsolved problems without them. A year ago, while analyzing the quality of Smart IDReader, we realized we had reached the point where handwritten passports made up a significant class of errors. Following the scientific approach, we studied the problem and set about solving it. Today's story is about how we built recognition of handwritten passports of the Russian Federation, solving the last problem on the way to fully automated passport data entry.


Stated in general form, the task of recognizing handwritten text sounds fundamental, large-scale, and unsolvable. So the first important step is to narrow the problem statement correctly. We will recognize the handwritten text on the main spread of the Russian civil passport. Such passports are filled in with neat, calligraphic (at least in the passport clerk's opinion) handwriting. On the one hand, this makes the task easier: we don't have to recognize “doctor's scribbles” and other barely legible texts. On the other hand, we have to deal with all the variability of calligraphic styles of Cyrillic letters. Still, this is a serious challenge rather than an insurmountable problem.

We divided the task of recognizing handwriting in the passport into three subtasks:

  1. Detecting whether the passport is handwritten.
  2. Segmentation of a handwritten line into characters.
  3. Character recognition and post-processing.

Later in the article we describe the solution to each subtask in more detail. But first, let us discuss one very important problem that always comes first in recognition: datasets. Without datasets, proper recognition is impossible: even if you can train neural networks on synthesized data, you still need real data to measure the accuracy of the trained system. As it turned out, there were no suitable handwriting datasets available online. So our list of subtasks gained a step zero: “Preparing the dataset.” We approached this creatively: we handed out ruled notebooks and asked all our programmers to play the role of calligraphy masters and copy out some prepared texts in beautiful handwriting. The texts were poems by A.S. Pushkin.

Here our first disappointment awaited us. Harsh as it may sound, it turned out that our programmers had completely forgotten how to write. And you can't say they did not try. No, they had simply forgotten how to form letters by hand. Here is an example of what came out:



You will agree that this is not at all what we needed. The letters dance, the sizes are inconsistent... We had to find copybooks on the Internet and, just like in first grade, sit everyone down to practice penmanship in the most literal sense! We still remember that time with a smile: the entire team, without exception, from third-year students to distinguished doctors of science, sitting at desks and carefully tracing out letters.



Two days later, with our hands back in practice, we were ready for a second attempt at collecting “raw” data for the handwritten dataset. The letters became more even and the words more readable. Some people even managed to add a few calligraphic flourishes. See the new samples for yourself:









In the end we collected about 1000 such sheets with different texts and handwriting, carefully digitized them, and cut them into lines and characters. That's it, congratulations: the handwriting dataset is ready. Back to the algorithms.

Handwritten passport detection


Handwriting detection is an important element of an industrial document recognition system. This functionality belongs to the category of “document understanding” and is in great demand among business customers. We trained a convolutional neural network as a binary classifier that analyzes images of individual text lines of the passport. Each passport field is checked for handwriting with this network, and then the resulting scores are weighted to make an overall decision on whether the passport as a whole is handwritten.
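To illustrate the last step, here is a minimal sketch of how per-field scores could be combined into a decision for the whole passport. The article does not say how the weighting is actually done, so the weighted average, the field names, the weights, and the 0.5 threshold below are all assumptions made for the example.

```python
from typing import Mapping


def handwriting_score(field_scores: Mapping[str, float],
                      field_weights: Mapping[str, float]) -> float:
    """Weighted average of per-field handwriting scores, each in [0, 1].

    Fields that have no entry in field_weights get a weight of 1.0.
    The weighted average itself is an assumed aggregation rule.
    """
    total_weight = sum(field_weights.get(name, 1.0) for name in field_scores)
    if total_weight <= 0:
        raise ValueError("no fields with positive weight to aggregate")
    weighted_sum = sum(score * field_weights.get(name, 1.0)
                       for name, score in field_scores.items())
    return weighted_sum / total_weight


def passport_is_handwritten(field_scores: Mapping[str, float],
                            field_weights: Mapping[str, float],
                            threshold: float = 0.5) -> bool:
    """Decide for the whole passport; the threshold is hypothetical."""
    return handwriting_score(field_scores, field_weights) >= threshold


# Hypothetical network outputs for three fields of one passport.
scores = {"surname": 0.9, "name": 0.8, "issuing_authority": 0.2}
weights = {"surname": 2.0, "name": 2.0, "issuing_authority": 1.0}
print(passport_is_handwritten(scores, weights))
```

Giving the name fields a higher weight than the issuing authority is only one plausible choice; in practice the weights would be tuned on labeled passports.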

Handwritten line segmentation into characters


Segmenting handwriting is fundamentally different from segmenting typewritten text. To get a first feel for the problem, just try writing the word “chinchilla” (“шиншилла”) in Russian cursive and look at those “neat rows of hooks”. For the segmentation of handwritten texts we again used neural networks. We trained a dedicated network that returns, for each point along the input text image, an estimate of whether a “cut” between letters is present there. Then, using dynamic programming, the cuts between letters are constructed from these estimates.
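The dynamic programming step can be sketched roughly as follows. This is not the authors' algorithm, which the article does not describe, but one common formulation under stated assumptions: the network yields one cut score per image column, every character must be between `min_width` and `max_width` columns wide, and a cut is rewarded by its score minus a hypothetical `bias`, so that weak cuts cost more than they gain.

```python
from typing import List, Sequence


def segment_line(cut_scores: Sequence[float],
                 min_width: int,
                 max_width: int,
                 bias: float = 0.5) -> List[int]:
    """Pick cut positions maximizing the sum of (score - bias) over cuts.

    cut_scores[x] is the assumed network estimate that a cut lies at
    column x. Boundaries 0 and len(cut_scores) are always included.
    Returns the sorted list of boundaries.
    """
    if min_width < 1 or max_width < min_width:
        raise ValueError("need 1 <= min_width <= max_width")
    n = len(cut_scores)
    neg_inf = float("-inf")
    best = [neg_inf] * (n + 1)   # best[x]: best total reward with a cut at x
    prev = [-1] * (n + 1)        # prev[x]: previous cut on that best path
    best[0] = 0.0
    for x in range(1, n + 1):
        # The right edge of the line is a forced boundary with no reward.
        gain = 0.0 if x == n else cut_scores[x] - bias
        for p in range(max(0, x - max_width), x - min_width + 1):
            if best[p] == neg_inf:
                continue
            candidate = best[p] + gain
            if candidate > best[x]:
                best[x] = candidate
                prev[x] = p
    if best[n] == neg_inf:
        raise ValueError("no segmentation satisfies the width limits")
    # Walk back from the right edge to recover the chosen cuts.
    cuts = [n]
    while cuts[-1] > 0:
        cuts.append(prev[cuts[-1]])
    cuts.reverse()
    return cuts


# Toy scores for a 10-column line with two clear gaps between letters.
scores = [0.0, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1]
print(segment_line(scores, min_width=2, max_width=5))  # [0, 3, 7, 10]
```

The width limits keep the search from splitting one letter into slivers or merging several letters into one segment, which is exactly where cursive “hooks” make purely local decisions unreliable.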




Placing cuts between letters is far from the only pain in segmentation. Each character must also be bounded correctly in the vertical direction. And here the search for baselines, which is often used when recognizing printed text, is not applicable at all: the height of handwritten letters varies wildly.

Character recognition and post-processing


The main difficulty in recognizing handwritten characters is that different characters are often written in exactly the same way. Look at the example above: which surname is written, “Petrov” or “Netrov”? When a person reads handwritten text, they never read it character by character, but always within the given context. The recognition system should behave the same way here. So a neural network that recognizes handwritten characters should be “tolerant” of different letters that look alike (mathematically speaking, it should return equal confidence values for such characters), and the subsequent algorithms that analyze and process the recognition results (the so-called “postprocessors”) should take the specifics of the recognized field into account.
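As an illustration of this idea, here is a minimal sketch of one possible postprocessor for a surname field. It assumes the network returns, for each character position, a mapping from candidate letters to confidences, and that a surname dictionary is available; the dictionary lookup, the function name, and the `floor` value are assumptions for the example, not the method used in Smart IDReader.

```python
import math
from typing import Dict, Iterable, List, Optional


def best_dictionary_match(char_confidences: List[Dict[str, float]],
                          dictionary: Iterable[str],
                          floor: float = 1e-6) -> Optional[str]:
    """Return the dictionary word with the highest total log-confidence.

    char_confidences[i] maps candidate letters at position i to their
    confidences. Letters the network did not propose get the small
    `floor` confidence instead of zero, so the logarithm stays defined.
    Words whose length differs from the number of positions are skipped.
    """
    best_word: Optional[str] = None
    best_score = float("-inf")
    for word in dictionary:
        if len(word) != len(char_confidences):
            continue
        score = sum(math.log(max(conf.get(letter, 0.0), floor))
                    for letter, conf in zip(word, char_confidences))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# The first letter is ambiguous: cursive "П" and "Н" look alike,
# so the network returns the same confidence for both.
recognized = [
    {"П": 0.5, "Н": 0.5},
    {"Е": 0.9}, {"Т": 0.9}, {"Р": 0.9}, {"О": 0.9}, {"В": 0.9},
]
surnames = ["ПЕТРОВ", "ИВАНОВ", "ПЕТРОВА"]  # hypothetical dictionary
print(best_dictionary_match(recognized, surnames))  # ПЕТРОВ
```

Because the network stays undecided between the two look-alike letters, the field context (here, the fact that “ПЕТРОВ” is a known surname and “НЕТРОВ” is not) settles the question.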

What did we achieve in the end?


So, within a year we learned to recognize handwritten passports, leaving behind a fundamental problem that had seemed unsolvable several years ago! What's next? As usual: work on quality and new frontiers.

P.S. I almost forgot to answer the question posed in the title. We have 62 programmers in the company. We bought 150 notebooks and printed 2,000 copybook sheets.

