How to explain your point of view to a robot

Have you ever wondered where robots are actually needed today? Since childhood it seemed to me that robots lived somewhere in modern factories, somewhere far away from us. Or in science fiction.
Not anymore. Today's robots automate all kinds of routine processes. They can be installed on farms as well as in auto repair shops.


If the price of such automation used to be enormous, now it is falling, and more complex technological operations are becoming affordable. A robot arm is essentially a universal manipulator that does not need to be engineered from scratch for every task, which lowers the cost of deployment and speeds it up (although a robot arm can be more expensive than a piece of conveyor hardware that performs a similar operation).

But the robot arm is only half of the story. The other half is teaching it to think. And until recently, the situation here was dismal: there were no universal approaches that any engineer could configure. You had to hire programmers / developers / mathematicians to formulate the problem and try to build a solution. Naturally, such a situation could not last long. On top of that, Computer Vision with deep learning matured. So now the first signs of automation are appearing not only for strictly repetitive processes. That is what we will talk about today.


Pick-it



The company offers a solution for picking a wide variety of objects with various robot arms. The solution consists of a 3D camera and dedicated software for teaching the system which objects to pick and for the picking itself.

(search for cylindrical objects)

There are pre-trained shapes that are common in industry: boxes (parallelepipeds) and cylinders.
The workflow is roughly the following:

  1. the client shows the objects to be picked to the 3D camera from several sides (or uploads a CAD file of the part)
  2. specifies the directions from which the object may be grasped (there can be more than one)
  3. configures the integration between the robot and the Pick-it software to execute the picking task and sets up the required actions.

This does not sound too complicated, but it does require a certain qualification on the client's side.
The main downside is that as soon as the external conditions change (part layout / lighting / shape), the system may stop working, and it is far from always obvious what exactly went wrong and how to retrain it. There is no stable, repeatable process.

Computer Vision Technology:


It is impossible to say exactly which technology stack the company uses. But judging by when the company was founded, the information about its technology available online, and other indirect signs, it relies on the pre-deep-learning stack of techniques for working with 3D scenes: for example, searching for the 3D transformation that best aligns point clouds (ICP and RANSAC methods). Sometimes keypoints are used, sometimes clever ways of merging point clouds, or a combination of methods with some heuristics.

(Robust 3D point cloud registration based on bidirectional Maximum Correntropy Criterion, Xuetao Zhang, Libo Jian, Meifeng Xu)

The key to success here is having your own good 3D scanner, whose quality determines the reliability of all these methods. It is also important that the shape of the sample objects does not deviate too much from the shape of the objects that actually need to be picked.
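
To make this family of methods more concrete, here is a minimal sketch of template-to-scene alignment using the open-source Open3D library: RANSAC-based coarse registration over FPFH features, followed by ICP refinement. This only illustrates the general approach, not Pick-it's actual pipeline; the file names, voxel size and thresholds are placeholder assumptions.

```python
import open3d as o3d

def preprocess(cloud, voxel):
    # Downsample and compute normals + FPFH features used for coarse matching.
    down = cloud.voxel_down_sample(voxel)
    down.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
    return down, fpfh

def find_part(template, scene, voxel=0.005):
    src, src_fpfh = preprocess(template, voxel)
    dst, dst_fpfh = preprocess(scene, voxel)
    # Coarse alignment: RANSAC over feature correspondences.
    coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src, dst, src_fpfh, dst_fpfh, True, voxel * 1.5,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(False), 3,
        [o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(voxel * 1.5)],
        o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
    # Fine alignment: point-to-plane ICP starting from the RANSAC estimate.
    fine = o3d.pipelines.registration.registration_icp(
        src, dst, voxel * 0.4, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return fine.transformation  # 4x4 pose of the taught part in the scene

template = o3d.io.read_point_cloud("taught_part.ply")   # cloud of the part shown to the camera
scene = o3d.io.read_point_cloud("current_scene.ply")    # fresh snapshot from the 3D camera
print(find_part(template, scene))
```

The returned 4x4 matrix is exactly what the robot needs: once it knows where the part sits, it can apply one of the taught grasp directions to it.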
Major robot manufacturers also offer similar systems: ABB | Kuka | Fanuc, and so does Cognex. But Pick-it covers a wider variety of applications.

The current standard approach for variable objects


With the arrival of deep learning in computer vision, for certain types of objects it has become easier to train a convolutional network that, in addition to detecting the object, also estimates the required parameters.
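
As a toy illustration of the idea (not the architecture any specific vendor uses), here is a PyTorch sketch of a network with a shared backbone and several heads: one says whether the object is present, one regresses its bounding box, and one regresses an extra parameter such as ripeness or a grasp angle. Real systems use full detectors, but the multi-head structure is the same.

```python
import torch
import torch.nn as nn

class PickNet(nn.Module):
    """Toy single-object network: detection plus an extra regressed parameter."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.objectness = nn.Linear(32, 1)   # is the object present at all
        self.box = nn.Linear(32, 4)          # x, y, w, h of the object
        self.extra = nn.Linear(32, 1)        # the "extra parameter": ripeness, angle, ...

    def forward(self, img):
        features = self.backbone(img)
        return (torch.sigmoid(self.objectness(features)),
                self.box(features),
                torch.sigmoid(self.extra(features)))

net = PickNet()
score, box, extra = net(torch.randn(1, 3, 128, 128))  # random image, just to check shapes
```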

The widest field for such methods is agriculture, from plant inspection to fruit picking. A classic example of sorts is picking cherry tomatoes. Here are a few examples of companies harvesting crops:

Picking tomatoes. Size / distance / color are estimated

If you look closely, it does not actually pick all that well.

Often, growing the crop in the right way is already 95% of the robotic solution.

This horror, with its 89% accuracy, even got an article on Habré.
Most of these startups use a detector such as SSD or YOLO with a subsequent (or simultaneous) estimation of ripeness parameters. The 3D position of the fruit for grasping is estimated with depth or stereo cameras.
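
Turning a 2D detection into a point the gripper can move to is mostly geometry. Below is a minimal sketch, assuming a depth image aligned with the color image and a pinhole camera model; the intrinsics are placeholders, and the resulting point still has to be transformed from camera to robot coordinates (hand-eye calibration).

```python
import numpy as np

# Camera intrinsics; placeholder values, take them from your camera's calibration.
FX, FY = 615.0, 615.0   # focal lengths in pixels
CX, CY = 320.0, 240.0   # principal point

def grasp_point_3d(bbox, depth_image):
    """2D detection (x_min, y_min, x_max, y_max) + depth image -> 3D point in the camera frame."""
    x_min, y_min, x_max, y_max = bbox
    u = (x_min + x_max) / 2.0                         # center of the detection
    v = (y_min + y_max) / 2.0
    patch = depth_image[int(y_min):int(y_max), int(x_min):int(x_max)]
    z = np.median(patch[patch > 0])                   # robust depth, ignore missing pixels
    # Pinhole model: back-project the pixel into camera coordinates (meters).
    return np.array([(u - CX) * z / FX, (v - CY) * z / FY, z])

# Synthetic example: a fruit detected around pixel (300, 220), half a meter away.
depth = np.full((480, 640), 0.5)
print(grasp_point_3d((280, 200, 320, 240), depth))
```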

Accordingly, the manufacturer (and partly the integrator of the solution) faces the following tasks: ensuring recognition quality, replenishing the training set under real conditions, periodic retraining, and writing the algorithm that ties together the CV part, the 3D estimation part, and the grasping part.
In our experience, solving such a problem takes a couple of months every time.

Another approach


What if you want a learning system based on deep learning, but do not want to stop at a single application? And you want to train it without complex configuration software for each task on the client side?
It would be great to simply show the robot what to do and let it figure out the rest. But how do you show a robot?
Google (a link to one of the projects) and OpenAI (could not find a link to their other project) run projects in which a robot tries to follow human hands and repeat their actions. But the accuracy is far from what real applications require, and state-of-the-art mathematics at this level is hard to scale.


Is there another way?
At some point, while we were solving the problem of tracking VR controllers in 3D space, another piece of the puzzle fell into place for us. Virtual reality has been around for a long time. You can show the robot with a VR controller how to grab an object. Not in a simulator, as OpenAI does, but in reality: you simply bind the manipulator to the controller and show it the grasp direction.

It turns out to be intuitive. After a couple of minutes, a person gets the hang of grasping objects and performing various operations with them while controlling the robot in the real world.
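
The article does not spell out the exact implementation, but the core control loop is simple: read the controller pose, map it into the robot's coordinate frame, send it to the arm, and log everything as a demonstration. Below is a minimal sketch of that loop; the stub functions stand in for whatever VR SDK and robot driver are actually used and are purely hypothetical.

```python
import numpy as np

# Fixed transform from the VR tracking frame to the robot base frame,
# found once during calibration (identity here as a placeholder).
T_ROBOT_FROM_VR = np.eye(4)

def controller_pose():
    """Stub: read the 4x4 pose of the VR controller from the headset SDK."""
    return np.eye(4)

def controller_trigger():
    """Stub: read the controller trigger value in [0, 1]."""
    return 0.0

def send_to_robot(target_pose, gripper_closed):
    """Stub: robot driver call that moves the end effector to target_pose."""
    print(np.round(target_pose[:3, 3], 3), "gripper closed:", gripper_closed)

def teleop_step(log):
    """One control cycle: mirror the controller with the end effector and record it."""
    target = T_ROBOT_FROM_VR @ controller_pose()   # controller pose in robot coordinates
    closed = controller_trigger() > 0.5            # trigger doubles as the gripper
    send_to_robot(target, closed)
    log.append((target, closed))                   # demonstrations for later training

demonstrations = []
for _ in range(3):                                 # in reality this runs at 60-90 Hz
    teleop_step(demonstrations)
```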

It is always important to understand whether what you want from the robot is feasible at all. And here everything is simple: if a person can show the robot in VR how to solve a task, then the robot can be trained to do it. Everything that can be shown is within reach of the current level of machine learning, and it is guaranteed to be physically executable by an existing robot arm. This also removes the main drawback of modern ML: you no longer need huge databases of examples to train on.

What is the advantage of this approach? For one thing, you do not need to spell out low-level logic. Why detect a glass and then describe from which side and how to grab it, with exact grasp locations? You can simply show it (a toy learning sketch follows the list):
  • the glass stands upright on the table - grab it by the wall
  • the glass lies on its side - grab it by the side
  • the glass is upside down - grab it by the bottom
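
The article does not disclose the learning method, but the simplest way to use such demonstrations is behavior cloning: regress the demonstrated grasp directly from what the camera sees. Below is a toy sketch with synthetic data; the observation vector and the four grasp parameters are placeholder assumptions, in a real system they would be an image or point-cloud embedding and a full grasp pose.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 32, 4      # placeholder sizes: scene descriptor -> (x, y, z, angle)

# A small policy network mapping an observation to a grasp.
policy = nn.Sequential(
    nn.Linear(OBS_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, ACT_DIM))

# Synthetic "demonstrations": pairs of (what the camera saw, where the human grasped in VR).
observations = torch.randn(200, OBS_DIM)
demonstrated_grasps = torch.randn(200, ACT_DIM)

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
for epoch in range(50):
    loss = nn.functional.mse_loss(policy(observations), demonstrated_grasps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At runtime: observe the scene, predict the grasp, hand it to the arm controller.
print(policy(torch.randn(1, OBS_DIM)))
```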

And voila, after an hour we get the result:

Or a harder task: we want to pick fruit but also need to hold the branch - logic that is hard to program explicitly. But it is simply learned:

Or a very simple example: grab a cucumber and cut it (of course, only the grasping was trained):


Smart robots today are a bit like personal computers in the 1980s. There are various hypotheses about where it will all lead. The rental price of a robot is already comparable to the average salary of a worker, which means that robotization of more and more areas of labor is inevitable. Nobody knows how all this will be managed in five years, but judging by how robot prices are falling and the number of installations is growing, things are only picking up speed.

Price:

Volumes:
