How I learned to stop worrying and love machine vision

Hello, Habr! My name is Artyom Nagumanov, and I have over 15 years of experience in software development and in managing projects, teams, and IT departments. I have always been interested in artificial intelligence and machine vision. While developing software, I kept returning to the same thought: why not add at least a bit of intelligence to an enterprise application, so that the user can be partially or completely removed from some process that at first glance seems impossible to formalize?

[image: recognition of water jets from an irrigation machine]

To do this, each time I had to go the whole way from scratch: study or recall the appropriate libraries, set up a Linux virtual machine to try out free tools for creating and training neural networks, and find the network architectures that are currently the crown of human creation in the world of artificial intelligence. The process is quite laborious and not much fun. One day, while digging into the architecture of yet another neural network, I realized it was time to put an end to this and build a universal tool that would take over the whole routine, leaving me just a few big glossy buttons to press to get a result. The attention-grabbing picture above shows an example of recognizing water jets from an irrigation machine, but first things first.

Building a recognition service


I had the following idea: create a website where the user can upload any images, click the "Recognize" button, and get the result in JSON format. The catch is that there is no universal algorithm or neural network that can find any object in the world. So I decided to let users train their own neural network models to recognize the objects they need, using nothing but the site and sample images of what has to be recognized. Before starting work, I looked at what already exists in this area. It turned out that many giants of the IT industry work in this direction: Yandex, Mail.ru, Amazon. The main drawback is that all of them charge money for their services. That was enough for me to start my own development. I already had some background and knew how to detect and classify objects in images reasonably well, so all that was left was to put everything together behind a convenient interface.
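To make the "upload, recognize, get JSON" idea concrete, here is a minimal sketch of what a client call could look like. The endpoint URL, field names, and response layout are my assumptions for illustration, not the service's actual API.

```python
import requests

# Hypothetical endpoint and field names -- for illustration only.
SERVICE_URL = "https://example-recognition-service/api/recognize"

with open("sample.jpg", "rb") as f:
    response = requests.post(
        SERVICE_URL,
        files={"image": f},
        data={"model": "irrigation-jets"},  # a user-trained model (assumed name)
    )

# An assumed response shape: a list of detected objects with class labels,
# confidence scores and bounding boxes, e.g.
# {
#   "objects": [
#     {"class": "water_jet", "score": 0.97, "box": [120, 80, 340, 260]}
#   ]
# }
print(response.json())
```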

In the vast majority of my projects I use Microsoft technologies, which significantly influenced the technology stack of this one:

  • ASP.NET (C#)
  • Web API
  • JavaScript (jQuery)
  • MS SQL Server
  • Python
  • TensorFlow

First of all, I decided to create a REST service that would be the brain of the entire system: it accepts commands through the API and does all the heavy lifting. For the neural network architecture I chose Mask R-CNN; this network can both find objects in an image and classify them.
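To give a rough idea of what such a "brain" does under the hood, here is a minimal inference sketch using the popular open-source Matterport implementation of Mask R-CNN on TensorFlow. I am not claiming this is the exact code behind the service; the class count, file names, and paths are placeholders.

```python
import mrcnn.model as modellib
from mrcnn.config import Config
import skimage.io

class InferenceConfig(Config):
    NAME = "demo"
    NUM_CLASSES = 1 + 1      # background + one object class (placeholder)
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()

# Build the model in inference mode and load previously trained weights.
model = modellib.MaskRCNN(mode="inference", config=config, model_dir="logs")
model.load_weights("mask_rcnn_demo.h5", by_name=True)

image = skimage.io.imread("sample.jpg")
results = model.detect([image], verbose=0)

r = results[0]
# r["rois"]   -- bounding boxes, r["class_ids"] -- classes,
# r["scores"] -- confidences,    r["masks"]     -- per-object masks
print(r["rois"], r["class_ids"], r["scores"])
```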

Everything went more or less smoothly until I got to training my own models. The training process turned out to be very expensive and demanding on the hardware. Of course, I knew this beforehand, but earlier I could afford to wait a day or two for a model to train on the CPU. Since the whole idea was a convenient and quick tool, I could not put up with such long training times. So the question arose of setting up a dedicated server with a suitable graphics card supporting CUDA ( https://ru.wikipedia.org/wiki/CUDA), which would significantly reduce training time. To keep the project budget down, I chose an inexpensive NVIDIA GeForce GTX 1050 Ti. In parallel with the service, I made a Windows Forms client for testing.
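Before relying on the GPU, it is worth verifying that TensorFlow actually sees the CUDA device; a quick check along these lines (TensorFlow 2.x syntax; older versions use tf.test.is_gpu_available()) shows whether training will really run on the card or silently fall back to the CPU.

```python
import tensorflow as tf

# List the GPUs visible to TensorFlow; an empty list means
# training will fall back to the (much slower) CPU.
gpus = tf.config.list_physical_devices("GPU")
print("CUDA GPUs available:", gpus)

if not gpus:
    raise SystemExit("No GPU found -- check the CUDA/cuDNN installation")
```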

In the second stage, I made a website, which is essentially a client for the service. I did not fuss over the design and did not even take a ready-made template; I wrote everything myself using only what Visual Studio provides out of the box. Creating and testing the site took another three months.

I was incredibly happy when I uploaded the image, selected a ready-made model for recognition, clicked the “Recognize” button and got the first result.

[image: the first recognition result]

A real machine vision task


While I was working on the recognition system, a real task came my way: determine the position of the front blade of a special vehicle that clears snow from the road. I was not limited in the choice of approach, and at first I wanted to install special sensors that measure the blade angle, but I quickly abandoned the idea. It turned out that such sensors need constant recalibration, because the blade is periodically removed and replaced with another one; besides, after talking with colleagues who install them, I learned that the sensors sit outside the vehicle and are easy to sabotage. The next idea was to analyze the blade's operating controls, but in real life drivers go as far as removing the blade entirely while the controls still signal that it is lowered; in short, again not a reliable option. So I was left with no choice but to bring in the full power of the recognition service. I put a camera in the cab of the vehicle, pointed at the front blade, collected several hundred sample images, labeled them with the VGG tool, which I built into the site, and started training. The process took about two hours, and by the 30th epoch I got a quite acceptable result: recognition accuracy was around 95%.

[image: recognition of the snowplow blade]
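The "VGG tool" mentioned above is the VGG Image Annotator (VIA), which exports labels as a single JSON file. A minimal sketch of reading such an export into per-image polygons for training could look like the code below; the file name is a placeholder, and the exact export layout varies slightly between VIA versions.

```python
import json

# Load a VIA (VGG Image Annotator) JSON export -- path is a placeholder.
with open("via_annotations.json", encoding="utf-8") as f:
    annotations = json.load(f)

samples = []
for item in annotations.values():
    regions = item.get("regions", [])
    if isinstance(regions, dict):        # VIA 1.x exports regions as a dict
        regions = regions.values()

    polygons = []
    for region in regions:
        shape = region["shape_attributes"]
        if shape.get("name") == "polygon":
            # Polygon vertices of the labeled object (e.g. the plow blade).
            polygons.append(list(zip(shape["all_points_x"],
                                     shape["all_points_y"])))

    samples.append({"filename": item["filename"], "polygons": polygons})

print(f"Loaded {len(samples)} labeled images")
```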

For the system to work fully on a vehicle, I had to extend it: add the ability to upload photos via FTP, not just through the API, because the camera could take a photo and send it to a specified FTP server. I also added the ability to save results to the database and then, through the API, retrieve results for a given time period, analyze them, and write them to an accounting system built on 1C, which, by the way, is a huge fleet management system that monitors thousands of vehicles at once and tracks many parameters such as mileage, machine hours, fuel in the tank, trips, idling, and so on, but that is a completely different story. The first version of the blade monitoring is ready, but it will be battle-tested only when the snow falls.
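To illustrate the second half of that pipeline, here is a rough sketch of how an external system (such as the 1C integration) might pull results for a time window over the API. The endpoint, parameter names, and result fields are purely my assumptions.

```python
from datetime import datetime, timedelta
import requests

SERVICE_URL = "https://example-recognition-service/api/results"  # hypothetical

# Ask for everything recognized in the last hour.
now = datetime.utcnow()
params = {
    "model": "snow-plow-blade",                      # assumed model name
    "from": (now - timedelta(hours=1)).isoformat(),
    "to": now.isoformat(),
}

results = requests.get(SERVICE_URL, params=params).json()

for item in results:
    # Assumed shape, e.g. {"time": "...", "vehicle": "...",
    #                      "blade": "lowered", "score": 0.95}
    print(item)
```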

How easy is it?


The main idea of the recognition service is how simple it is to create and train your own models. To check whether anyone can really do it, I asked my assistant to create a model for recognizing water jets from an irrigation machine. The only input I gave her was a video of the irrigation machine at work, shot from the cab, and the address of the recognition service. I gave her the task in the evening, and in the morning, logging into the system, I found a ready-made model that actually worked. Below is a screenshot of the window in which the model is trained.

[image: the model training window]

As training progresses, the model's weights are saved at every epoch, so we can test any epoch without interrupting the training process.
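If the training side follows the usual Keras/Mask R-CNN convention of writing one weights file per epoch, testing an intermediate epoch is just a matter of loading that file. A sketch under that assumption (the directory layout and file naming follow the Matterport convention and are not confirmed by the article):

```python
import mrcnn.model as modellib
from mrcnn.config import Config

class InferenceConfig(Config):
    NAME = "demo"
    NUM_CLASSES = 1 + 1      # background + one object class (placeholder)
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

# Each epoch produces its own weights file; the path below is an assumption,
# matching the Matterport pattern mask_rcnn_<name>_<epoch>.h5.
EPOCH_TO_TEST = 30
weights_path = f"logs/demo/mask_rcnn_demo_{EPOCH_TO_TEST:04d}.h5"

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(),
                          model_dir="logs")
model.load_weights(weights_path, by_name=True)
# Now run model.detect() on validation images to judge this particular epoch.
```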

Development and use


An example of how everything works live can be seen in the video clip.

I thought for a long time: if anyone can create their own model, why not make it possible to share that model with other users of the system? So I added a section where a successful model can be published with one click of the mouse and used by any participant. The service is completely free.

Now I use the service myself to automate processes in enterprises where machine vision can help. In the future, I plan to add other recognition approaches, not just neural networks. I will be very glad if someone else can solve their problems with this service.
