QR code localization is an important task that is undeservedly neglected

We are sure that today there is not a single Habr reader who is unfamiliar with QR codes. These two-dimensional barcodes are everywhere, and there are plenty of tools that let you add QR code recognition to your project with some degree of effectiveness. The catch is that this effectiveness depends directly on the quality of the tool used to recognize the QR codes. And here comes the classic trade-off: you can solve the problem (very) well and (very) expensively, or for free and somehow. Is it possible to modify the free option so that it still solves the problem well? If you are interested, welcome under the cut.

Recognizing a QR code in a photograph is a well-posed machine vision task. First, the object under study was originally designed specifically for "convenient" recognition. Second, the task itself splits into several independent, well-understood subtasks: localizing the QR code, orienting it, and decoding it. It turns out that good libraries that solve the last two problems, orientation and decoding, have long been available in the public domain. There is one problem: for high-quality decoding, such libraries expect a good binary image of the barcode itself as input. At the same time, little attention is paid to the task of localizing the barcode in the image.

In our experience, the more accurately you localize the object to be recognized, the easier it is to choose the right preprocessing tools and, ultimately, to recognize it. So if you want to improve the quality of QR code recognition in your project, start by modernizing your QR code localization methods. Indeed, even if you later need to binarize the image, it is much more efficient (both computationally and in terms of quality) to binarize the region containing the barcode than the entire original image.
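As a quick illustration of this point, here is a minimal sketch with OpenCV; the bounding box (x, y, w, h) is assumed to come from whatever detector you use, and the threshold parameters are arbitrary:

```python
import cv2

# Assume (x, y, w, h) is the bounding box returned by some QR code detector.
def binarize_region(gray, x, y, w, h):
    roi = gray[y:y + h, x:x + w]
    # Adaptive thresholding of the small region is cheap and adapts to local lighting.
    return cv2.adaptiveThreshold(roi, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, 31, 10)
```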

In this article, we will tell you how to easily improve the quality of localization of QR codes using classical image processing methods, as well as provide numerical characteristics of the effectiveness of the proposed algorithm.

We will describe an original method for localizing QR codes in images that is based on a modified Viola and Jones method.

Background information on the topic of the article


In this section we describe the key features of QR codes that are used to construct the localization method, and briefly recall the original version of the Viola and Jones method.

QR code


The QR code (short for Quick Response Code) is a two-dimensional barcode that was developed in Japan in the mid-1990s for the automotive industry. Due to its fast readability and larger capacity compared to linear barcodes, the QR code system has become popular worldwide in various areas of life.

Unlike standard linear barcodes, which are usually scanned by dedicated hardware, a QR code is most often read by a camera. The structure of the QR code is fully described in the ISO/IEC 18004 standard. To make robust recognition possible, the QR code contains reference elements that form its function patterns: three large squares in the corners of the barcode (called finder patterns) and smaller synchronizing squares spread across the barcode (called alignment patterns). These elements make it possible to normalize the size and orientation of the image.


Fig. QR code structure



Although all QR codes look similar, different instances of QR codes can have a different layout of internal elements depending on the amount of encoded data. In addition, so-called designer QR codes are very popular: in them, part of the redundant information that guarantees reliable recognition of the barcode is replaced with third-party graphic elements (logos, emblems, inscriptions, etc.). All these features of QR codes must be taken into account when constructing methods for their localization and recognition.



Fig. Different valid QR code options



Viola and Jones Method


Everyone and their dog has already written about the Viola and Jones method on Habr. Even we have done so several times in our blog (for example, here, here or here). Still, we consider it necessary to very briefly, literally in two paragraphs, explain what it is.

The Viola and Jones object detection method was originally developed for real-time face detection in images. The method reduces detection to a binary classification problem at every image position: for each rectangular image region, taken at all possible shifts and scales, the hypothesis that it contains the desired object is checked using a pre-trained classifier.
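To make the scheme concrete, here is a minimal sliding-window sketch in Python; it is purely illustrative, classify_window stands in for any pre-trained classifier, and the window, step and scale parameters are arbitrary:

```python
def detect(image, classify_window, window=24, scale_step=1.25, shift=2):
    """Run a window over all shifts and scales; return regions accepted by the classifier."""
    detections = []
    h, w = image.shape[:2]
    size = window
    while size <= min(h, w):
        step = max(1, int(shift * size / window))
        for y in range(0, h - size + 1, step):
            for x in range(0, w - size + 1, step):
                if classify_window(image[y:y + size, x:x + size]):
                    detections.append((x, y, size, size))
        size = int(size * scale_step)
    return detections
```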

As its feature space, the Viola and Jones method uses rectangular Haar features, whose values are computed as the difference between the sums of pixel brightnesses over adjacent rectangular image regions. To compute Haar feature values efficiently, the integral image is used, also known in the literature as the summed-area table. A binary "weak" classifier h(x): X → {−1, +1} is usually represented as a decision tree with a single branch (a decision stump):

$$h(x) = \begin{cases} +1, & \text{if } p \cdot f(x) < p \cdot \theta, \\ -1, & \text{otherwise,} \end{cases}$$

where f(x) is the value of the Haar feature, θ is the feature threshold and p is the parity of the classifier. A "strong" classifier is then built with the AdaBoost machine learning method as a linear combination of such "weak" classifiers. The high speed of the Viola and Jones method is ensured by a cascade of "strong" classifiers, which allows "empty" (object-free) image regions to be rejected with a small number of computations.
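For illustration, here is a minimal sketch of the building blocks just described: an integral image, a two-rectangle Haar feature computed from it, and a decision-stump weak classifier. The helper names and the specific feature layout are our own, not taken from the original paper:

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with a zero row/column prepended for easy indexing."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(gray, axis=0), axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels inside the rectangle (x, y, w, h) in O(1)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_horizontal(ii, x, y, w, h):
    """Difference between the left and right halves of a rectangle."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

def weak_classifier(feature_value, theta, parity):
    """Decision stump: +1 if parity * f(x) < parity * theta, else -1."""
    return 1 if parity * feature_value < parity * theta else -1
```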

QR code detection algorithm


When constructing the QR code localization method, we relied on the following features of the task. First, the method must be fast enough to be used in recognition systems operating in real time. Second, the method must be robust to the distortions of the barcode that typically occur in images. Third, the method must take into account the full variability of QR codes.

As mentioned above, we chose the Viola and Jones method as the foundation. It has proven itself in various rigid-object detection tasks while providing the required performance. However, the method cannot be used in its original form for the following reasons:

  • the classical Viola and Jones method uses a family of Haar features that "emphasize" the textural properties of the object, whereas in our case, although a QR code consists of black and white modules, their layout differs greatly from barcode to barcode;
  • the classical Viola and Jones method is designed to detect objects of a fixed appearance in a fixed orientation, which is also not the case in our task.

To make the Viola and Jones method applicable to this problem, we use an original family of edge-based features and a high-level classifier in the form of a decision tree. The first modification lets the detector focus on the edge structure of the object rather than on its texture. The second modification makes it possible to build a single classifier capable of detecting variable objects. Below we describe each modification in a bit more detail.

Gradient Haar features


To build an effective QR code detector, we used a special family of gradient features [1]. These are rectangular Haar features computed on top of a map of oriented edges, which significantly improves their generalization ability.

The map of oriented edges is an image of the gradient magnitude that additionally takes into account the dominant gradient direction at each point (x, y), obtained by quantizing the edge angle into the horizontal, vertical, +45° and −45° directions. To build the QR code detector, we used two types of oriented edge maps: a map of straight (axis-aligned) edges and a map of diagonal edges.

Let f(x, y) be the original image. Then the approximate derivatives in the horizontal and vertical directions can be computed with the Sobel operator:

$$g_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * f, \qquad g_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * f$$

In addition, g_x and g_y give the gradient direction at each point of the image:

$$\varphi(x, y) = \arctan\!\left(\frac{g_y(x, y)}{g_x(x, y)}\right)$$

The map of straight edges contains mainly horizontal and vertical edges; it keeps the gradient magnitude only at points whose quantized direction is horizontal or vertical:

$$E_{\mathrm{str}}(x, y) = \begin{cases} \sqrt{g_x^2 + g_y^2}, & \text{if } \varphi(x, y) \text{ is quantized to the horizontal or vertical direction,} \\ 0, & \text{otherwise.} \end{cases}$$

The map of diagonal edges contains mainly edges along the diagonals and is computed in the same way, keeping the gradient magnitude only at points where the direction is quantized to +45° or −45°:

$$E_{\mathrm{diag}}(x, y) = \begin{cases} \sqrt{g_x^2 + g_y^2}, & \text{if } \varphi(x, y) \text{ is quantized to the } +45^\circ \text{ or } -45^\circ \text{ direction,} \\ 0, & \text{otherwise.} \end{cases}$$

Rectangular Haar features are then computed on top of the constructed oriented edge map (straight or diagonal). Unlike classical Haar features, such edge features generalize well to objects containing a large number of edges.
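Here is a minimal sketch of how such oriented edge maps can be built with OpenCV and NumPy; the quantization thresholds and function names are our own illustration of the definitions above:

```python
import cv2
import numpy as np

def oriented_edge_maps(gray):
    """Return (straight_map, diagonal_map) built from Sobel gradients."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    angle = np.rad2deg(np.arctan2(gy, gx))            # (-180, 180]
    # Quantize the direction into the 0deg/90deg bin vs the +-45deg bin.
    folded = np.mod(angle, 90.0)                      # [0, 90)
    near_axis = (folded < 22.5) | (folded >= 67.5)    # close to horizontal or vertical
    straight_map = np.where(near_axis, magnitude, 0.0)
    diagonal_map = np.where(~near_axis, magnitude, 0.0)
    return straight_map, diagonal_map
```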



Fig. Illustration of oriented edge maps: (a) the original QR code image, (b) its map of straight edges, (c) a rotated QR code image, (d) the map of diagonal edges of the rotated QR code



A decision tree of strong classifiers


The tree of strong classifiers [2] is a kind of binary decision tree: each tree node is a strong classifier, whose right branch receives the subwindows that presumably contain the object and whose left branch receives those that were rejected. The final answer is given only in the leaves. The classical cascade classifier described in the original work of Viola and Jones is, in fact, a tree classifier with a single "positive" output (leaf) and many "negative" outputs.

In [2] it is shown that any path from the root to a leaf of the tree classifier can be represented as a cascade in which some of the strong classifiers appear with an inverted answer. Thanks to this, a training algorithm for the tree classifier can be built that reuses the training procedure of the classical cascade classifier to train individual paths.

Compared to classical cascade classifiers, the tree classifier makes it possible to train detectors with better recall on variable objects.
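A minimal sketch of how such a tree is evaluated on a single subwindow is shown below; the node structure is our own illustration, and a real detector would additionally aggregate answers over all positions and scales:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    """A node is either a strong classifier with two children or a leaf with a verdict."""
    strong: Optional[Callable[[object], bool]] = None   # returns True if "object-like"
    right: Optional["Node"] = None                      # followed on a positive answer
    left: Optional["Node"] = None                       # followed on a negative answer
    leaf_verdict: Optional[bool] = None                 # set only in leaves

def classify(tree: Node, window) -> bool:
    node = tree
    while node.leaf_verdict is None:
        node = node.right if node.strong(window) else node.left
    return node.leaf_verdict
```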

Experimental results


To evaluate the effectiveness of the proposed barcode localization method, we prepared a set of 264 barcode images. The image size was about 1 MPix. Each image contains exactly one QR code in an arbitrary orientation, and the barcode occupies at least 10% of the total image area. The figure below shows examples of images from the assembled set.



Fig. Examples of images from the assembled barcode image set



The prepared set of images was divided into a training set and a test set. The size of the training sample was 88 images, the size of the test sample was 176 images.

The training set was used to prepare both positive and negative examples. Since the initial number of positive examples was small, we used data augmentation [3]: in particular, we rotated each barcode around its center in 15° increments. After augmentation, the number of positive examples reached 2088.
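A minimal sketch of this rotation-based augmentation with OpenCV follows; border handling and crop margins are simplified for illustration:

```python
import cv2

def rotations_about_center(image, cx, cy, step_deg=15):
    """Yield copies of the image rotated about (cx, cy) in step_deg increments."""
    h, w = image.shape[:2]
    for angle in range(0, 360, step_deg):
        m = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
        yield cv2.warpAffine(image, m, (w, h),
                             flags=cv2.INTER_LINEAR,
                             borderMode=cv2.BORDER_REPLICATE)
```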

Using the same positive and negative examples, we trained three QR code detectors: a classical cascade classifier with standard Haar features, a classical cascade classifier with edge features, and a tree classifier with edge features. The first cascade classifier consisted of 12 levels and contained 58 features in total. The second cascade classifier consisted of 8 levels and contained 39 features in total. The trained tree classifier consisted of 39 nodes, contained 110 features in total, and the longest path from the root to a leaf was 9. Below is a diagram of the trained tree classifier.



Fig. Scheme of trained tree classifier



To assess the quality of the constructed QR code detectors, we used the barcode decoding module from the OpenCV open-source computer vision library. On the prepared test set (which, as mentioned above, consists of 176 images), we ran the decoding module both without any special preprocessing and after preliminary QR code localization with the trained detectors. The barcode decoding results are given below:
| No. | Experiment | Decoded image count | Decoding quality |
|-----|------------|---------------------|------------------|
| 1 | Only OpenCV | 104 | 59.09% |
| 2 | VJ (Grayscale Features, Cascade Classifier) + OpenCV | 105 | 59.66% |
| 3 | VJ (Edge Features, Cascade Classifier) + OpenCV | 123 | 69.89% |
| 4 | VJ (Edge Features, Tree Classifier) + OpenCV | 136 | 77.27% |

The table shows that preliminary localization of the QR code with the described method significantly improves the quality of barcode decoding (the number of decoding errors decreased by 44%). The results also show that the original Viola and Jones method (classical Haar features with a cascade classifier) is not effective for QR code localization.
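For reference, here is a minimal sketch of the second stage of such a pipeline: decoding a pre-localized region. Here detect_qr_region is a placeholder for the trained detector, and cv2.QRCodeDetector stands in for the OpenCV decoding module mentioned above:

```python
import cv2

def decode_with_localization(image, detect_qr_region):
    """detect_qr_region(image) -> (x, y, w, h) is assumed to be the trained detector."""
    x, y, w, h = detect_qr_region(image)
    # A small margin helps the decoder see the quiet zone around the barcode.
    pad = int(0.1 * max(w, h))
    roi = image[max(0, y - pad):y + h + pad, max(0, x - pad):x + w + pad]
    text, points, _ = cv2.QRCodeDetector().detectAndDecode(roi)
    return text  # empty string if decoding failed
```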

Now let's see how accurately each classifier localizes the barcode. The figure below shows, from left to right, the results of detecting the same barcode with the classical cascade classifier with standard Haar features, the classical cascade classifier with edge features, and the tree classifier with edge features. It can be seen that the tree classifier, by accounting for the variability of QR codes, provides the best localization accuracy.



Fig. Illustration of the work of trained detectors on the same image



Conclusion


Today, QR codes are used in many areas of life: in the advertising industry for encoding URLs, in the public sector as part of electronic services, and so on. Despite the extremely wide spread of such barcodes, the existing open-source libraries focus on the decoding process rather than on the localization problem. To be honest, though, the true purpose of this article was not so much to describe an effective QR code localization method as to show you, dear reader, how scientific thinking, systems analysis and a good command of classical digital image processing tools can bring free libraries up to a truly industrial level. Thank you for your attention.

References
[1] A.A. Kotov, S.A. Usilin, S.A. Gladilin, and D.P. Nikolaev, “Construction of robust features for detection and classification of objects without characteristic brightness contrasts,” Journal of information technologies and computing systems, 1, 53-60, (2014).
[2] A. Minkina, D. Nikolaev, S. Usilin, and V. Kozyrev, “Generalization of the Viola-Jones method as a decision tree of strong classifiers for real-time object recognition in video stream,” in Seventh International Conference on Machine Vision (ICMV 2014), 9445, International Society for Optics and Photonics, (2015), doi:10.1117/12.2180941.
[3] D. P. Matalov, S. A. Usilin, and V. V. Arlazarov, "Modification of the Viola-Jones approach for the detection of the government seal stamp of the Russian Federation," in Eleventh International Conference on Machine Vision (ICMV 2018), 11041, International Society for Optics and Photonics, (2019), doi:10.1117/12.2522793.
