How not to remain an aspiring specialist forever if you are a Data Scientist


The Habr community has held another interview in our educational project: live broadcasts in which people from IT answer your questions in a live format.

Our project is an attempt to create a complete set of guides and recommendations for a successful developer's life: how to build a career, land a dream offer, attract investment for a startup, avoid burning out on boring projects, grow your own business, and eventually buy a house by the sea.

At the beginning of the week, Boris Yangel, a Yandex ML engineer who helped create Alice's "brains" and now works on self-driving cars, answered our questions.

Borya talked about how to become a strong data scientist, how skydiving helps him in his work, why ML conferences are useless, and responded to a recent post by an angry father about how Alice recommended a video about murder to a child.



My name is Boris Yangel, I work at Yandex. I am an ML engineer by profession, and recently I have been managing ML projects. At the moment I am working on the Yandex self-driving car project, developing part of the ML stack. Previously, I worked on Alice and was responsible for developing the module that can loosely be called her "brains": after speech is recognized, this module determines what the user meant and decides on the answer. Before that, I worked in the Yandex neural network technology group, and earlier at Microsoft Research, in Chris Bishop's group, on the Infer.NET project, a library for Bayesian inference. Earlier still, I was responsible at Yandex for ranking in image search.

Is it realistic for someone with a humanities background to get into machine learning?


The question is a bit strange, so I'll rephrase it: what minimal technical skills are required (depending on what you want to do), and can a person with a humanities education acquire them?

If "entering" means, say, training a neural network not just to tell dogs from cats but to do what you personally need, then there is a path accessible to many people. There is a lot of code on the Internet for solving common problems; you can easily take such code, feed it your data, and get a result. This is the simplest option, and the skills it requires, including programming, are minimal.

You only need the ability to understand finished code and edit it. If the code is well structured, that's easy.

If "entering" means building a neural network yourself to solve a slightly less trivial task, things get more complicated and more skills are needed.

To assemble neural networks yourself, you need at least some understanding of mathematics: the basics of linear algebra; what matrices, vectors, and tensors are and what you can do with them; what a derivative and gradient descent are. I can't say that only an expert will figure it out, but you do need knowledge, including what parts neural networks are made of and how they are usually combined to get a result.

There are now fairly easy-to-use frameworks for connecting the elements of neural networks - for example, TensorFlow with the Keras add-on (it is very simple and needs only minimal Python knowledge). But Keras may not be enough for non-trivial things, and then you will have to work with "bare" TensorFlow, which requires more skill, especially if you create your own operations inside TensorFlow. The further you want to go, the more skills you will need. Moreover, the problems begin exactly when something goes wrong: to find out why the system does not work the way you need, you require a noticeably higher level of skill - you have to fully understand what is happening "under the hood".

What kind of books on data science and machine learning in Python do a beginner need? How to practice this knowledge?


I am not sure I can correctly answer this question. When I was a beginner, there were far fewer good books than now, and finding the right information in a convenient form was more difficult.

Now there are many books on deep learning - for example, Goodfellow's book, which covers most of the basics you need to know about neural networks, along with the necessary mathematics. There are also books with a practical approach: they not only introduce the mathematical methods but immediately show how to do something specific in Python.

There is a book on deep learning with Keras by its author, François Chollet. There are many books, though I can't say which are best. It seems to me you can safely pick books by well-known authors.

If the goal is to form a backbone of knowledge, you will need more fundamental books - for example, Chris Bishop's Pattern Recognition and Machine Learning. I recommend reading it and doing the exercises. You don't have to read it in full, but the main chapters - for example, on probability theory - will help you see how all of machine learning forms a single framework.

In addition, it is important to learn to think in models. We do not just apply methods to get a result; we model the data. You need to adopt this way of thinking, and Chris Bishop's online book Model-Based Machine Learning, which is partially available for free, will help with that. Each chapter is an example of a task for which you need to build a model; over the course of the chapter you try to do this, gradually complicating the model until you get the result. This helps instill the way of thinking that data science requires.

As for practice - I have already said how important it is to know what happens "under the hood". The best way is to try to build something yourself. Write gradient descent by hand instead of using a ready-made framework, or write a layer and add it to a framework. Try to come up with a relatively non-trivial task with an interesting structure, solve it, and along the way identify what knowledge you lack. Consistently complicate the solution to improve quality. Pick a task whose solution will interest you personally.
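Writing gradient descent by hand, as suggested above, can be done in a few lines. Here is a minimal sketch in plain NumPy (all names and numbers are illustrative): fitting a line to noisy data by minimizing mean squared error, with the gradients computed manually instead of by a framework.

```python
import numpy as np

# Synthetic data: y = 3.0 * x + 0.5 plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=200)

w, b = 0.0, 0.0   # parameters to fit
lr = 0.1          # learning rate

for _ in range(500):
    pred = w * x + b
    err = pred - y
    # Hand-derived gradients of mean squared error w.r.t. w and b
    grad_w = 2.0 * np.mean(err * x)
    grad_b = 2.0 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should recover values close to 3.0 and 0.5
```

Once this works, a natural next step in the same spirit is replacing the linear model with a small multilayer network and deriving its gradients by hand as well.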

TensorFlow.js is developing rapidly now. I am learning machine learning and want to use this library. What are the prospects on the frontend?


TensorFlow.js in the frontend can be used as an entry point to machine learning, although I don't quite see why. If it's because JavaScript is the only language you know, that's the wrong motivation: Python is very easy to learn.

TensorFlow.js has its own niche: machine learning in which inference runs directly in the browser, which makes interactive deep learning tools possible. You can give a person an interactive tool for working with algorithms and models, building visualizations, and thereby improving their understanding of the subject. There are probably other promising areas of deep learning where interactivity is needed - for example, tools for creativity, where you can work with images in real time or synthesize music.

How to become a cool specialist in any field of artificial intelligence (for example, NLP), as quickly as possible?


As for the second part, speed always depends on the existing knowledge base.

As for the first part - it seems to me the question is posed incorrectly. NLP used to have many different techniques, and you needed to know a lot to solve its problems, but then deep learning specialists arrived. They came up with BERT and its incremental improvements, and now you don't need to know anything besides BERT to solve NLP tasks. At the same time, to understand BERT you don't need to understand NLP - you need to know how models are applied to sequences of tokens. Become a specialist in machine learning, and its various application areas will then be open to you with little extra effort.

How to become a cool machine learning specialist?


First of all, you need to build in your head a good conceptual framework of what is happening in machine learning. As long as you perceive it as a set of disparate facts, algorithms, and heuristics, you will not go far.

Roughly speaking, you must understand the following: all we do is search some set of functions for the one that is best in some sense. You need to understand in what senses a function can be "best", in which sets you can search for which functions, why we prefer some sets to others, why searching is more effective in some than in others, and what tricks exist for searching different sets. You need to understand that these functions are data models (at least the ones that interest us).

Data models are built from a small set of standard techniques that is roughly the same for deep learning and probabilistic programming; you need to understand how these techniques combine and in which cases. Then you will find that you understand how tasks are solved in different subject areas.

Take Kalman filters, for example - modeling the dynamics of systems over time - and the books written about them. If you only know the Kalman filter as a recipe and it doesn't quite suit you, you won't be able to modify it to do something similar but "not quite a Kalman filter" for your task.

But if you understand that it is just a probabilistic model built on fairly simple principles (wherever something is unknown, add a normal distribution; everything modeled directly is linear dynamics), then you can build what you need without even knowing about Kalman filters.

If you achieve this way of thinking, you will find that most articles - even from top conferences - are uninteresting. They usually describe incremental improvements made with standard techniques you could apply yourself - and this will be obvious to you - without any ability to scale beyond the dataset used. In any case, you will not miss the genuinely good articles that present truly new techniques: everyone will talk about them, and you will learn of them quickly. It turns out there are few articles you really need.

Tell us about the stack you are working with. What libraries and frameworks should a novice machine learning specialist study?


I work mainly with TensorFlow and Keras. PyTorch keeps gaining popularity - my colleagues praise it.

When Keras is sufficient - that is, when its high-level abstractions are enough and there is no need to go deeper - it is better to use Keras; it saves time. Of course, you need to understand how Keras works and how to go beyond it when necessary.

If something is missing in Keras itself, you can always add a piece to TensorFlow - the architecture allows this.

How is Yandex autopilot created in stages? What kind of specialists are hired for this, how is the data science / machine learning workflow built?


First, I'll briefly describe how the self-driving stack works - for details, see Anton Slesarev's video report; it's easy to find. The stack has many components. Perception is seeing what is happening around the car at the moment.

Localization is an understanding of where the car is located, using information from sensors and pre-built maps.

Prediction is forecasting what will happen in the next few seconds - that is, how other road users will behave - using knowledge of how the world looks now and how it looked in the past; this is the part I work on.

Planning is what comes after perception and prediction: choosing a safe sequence of actions that will lead to solving the task.

Control is converting this sequence into instructions for the car (steering wheel, gas, brake).

Many elements of this stack now need ML; without it you won't get state-of-the-art solutions. There is a lot of ML engineering work: you have to make all this work, and work fast, because latency is critical in such systems. You need to learn to train models, to understand which metrics let you tell whether things got better and which do not, and to figure out how to collect data more efficiently. In addition, there is a huge, often underestimated infrastructure component: very powerful infrastructure is needed to develop all these components together.

Self-driving cars collect a huge amount of data about everything that happens to them; you need to be able to work with this data quickly and answer questions like "what would have happened in situation X if the code had contained change Y". This requires non-trivial engineering solutions and good engineers.

The data science / machine learning workflow is, in my view, the same as anywhere else. Every team should have a metric that it is optimizing at the moment.

For most people, a typical day is spent searching for what to do to improve this metric. And the metric must be aligned with your goal - of course, it is hard to come up with the right one immediately; the metric will gradually evolve.

Let's say you are building a pedestrian detector. You find the pedestrians around the car and measure, say, average precision. You optimize the metric and find that it grows with your changes while things actually get worse. You realize the metric is bad. You conclude that you don't need to find all pedestrians: those far ahead, or 50 meters behind, do not affect us in any way. The metric needs refining. You switch to only the pedestrians nearby. Then you realize this is also wrong: you are really only interested in the ones ahead.

This is how a metric evolves. At each moment a certain metric is fixed, and you improve it. This relieves cognitive load: you just think about how to improve one number, while part of the team constantly works on the optimal choice of the number to be improved.
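The metric evolution in the pedestrian example can be sketched in a few lines of plain Python. The data and the 40-meter cutoff below are made up for illustration; the point is that the same detections score differently as the definition of "pedestrians that matter" is refined.

```python
def precision_recall(detections, ground_truth):
    """Precision and recall of a set of detected IDs against a ground-truth set."""
    tp = len(detections & ground_truth)
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# (id, distance ahead of the car in meters); negative means behind
all_pedestrians = {("p1", 5.0), ("p2", 12.0), ("p3", 80.0), ("p4", -50.0)}

# Metric v1: every pedestrian counts, even one 50 m behind the car
gt_v1 = {pid for pid, _ in all_pedestrians}

# Metric v2 (refined): only pedestrians ahead and within 40 m matter
gt_v2 = {pid for pid, dist in all_pedestrians if 0 <= dist <= 40}

detected = {"p1", "p2"}  # the detector found only the two nearby pedestrians

print(precision_recall(detected, gt_v1))  # recall penalized by irrelevant misses
print(precision_recall(detected, gt_v2))  # perfect score under the refined metric
```

Under v1 the detector looks mediocre for missing pedestrians that cannot affect the car; under v2 the same output scores perfectly, which is exactly why the metric, not the model, had to change.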

I am immersed in the topic of "strong AI". I have two questions: why can't we teach AI the way we teach our children, and in what area will the first strong AI be created, if at all?


I understand the first question as follows: children are taught from simple to complex. Initially they live in a simplified model of the world, where Santa Claus exists, but gradually their world grows more complicated and they learn to solve harder problems. It seems logical to teach AI on a similar principle - there is even such a proposal, from Tomas Mikolov (now at Facebook AI Research), to build a training scheme for strong AI this way.

In addition, machine learning has the field of curriculum learning - training models on a "from simple to complex" principle. The problem is that right now it only works within a single task. Take the same dog-finding task: first the network is taught to distinguish dogs from cats in images where they look nothing alike, and then on more and more similar ones. The assumption of this incremental approach is that the network will build simple concepts and then, on top of them, more complex ones. It does not work when different concepts are involved.

If you start teaching the system something new after teaching it something else, it forgets the concepts it had learned. This is the problem of catastrophic forgetting, and nobody has solved it yet. Gradient descent changes all the weights at once, and this destroys old concepts. We need to figure out how to build new concepts without destroying old ones.

Related to this are the research fields of one-shot and few-shot learning: learning concepts on one task and using them to solve another with a small number of examples. There have been no fundamental breakthroughs here yet, but they are needed before we can have any real notion of strong AI.

I see no reason why a strong AI could not appear in the future. In our modern view, the human brain is a machine that performs computations, albeit on different principles.

There are no fundamental obstacles to creating a strong AI, but I can't estimate how much time remains until that moment - we do not know what further steps will be required. If you extrapolate from the speed at which the "blank spots" of the past were filled in, you might name a figure like "10 to 50 years" - but that is a shot in the dark. You can appeal to Moore's law and calculate when processors will have enough transistor density to match the computing power of the brain - also several decades, and also a shot in the dark.

I don't think strong AI - if it is invented - will come from business. More likely it will be created by someone with significant resources doing basic research in reinforcement learning: of all the areas of machine learning, it is the closest to what we want from strong AI. If DeepMind or OpenAI still exist in a few decades, maybe they will do it - or whoever takes their place.

What architecture is best used to classify (not predict, but classify) time series? LSTM or something else?


In recent years the trend has been that almost everywhere LSTM was useful, attention works better. The NLP revolution happened exactly this way: recurrent networks were replaced with attention, and things got better. For time series I would advise trying attention too. Everything depends on the task, but in general this is the most effective way to analyze sequences and aggregate data over them.
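The core mechanic behind "attention instead of LSTM" for classification is attention pooling: score every time step, softmax the scores into weights, and take a weighted average of the sequence. Here is a minimal NumPy sketch of that idea (in a real model the scoring vector would be learned by a framework; here it is random, purely for illustration).

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
T, d = 50, 8                      # sequence length, feature dimension
x = rng.normal(size=(T, d))       # one encoded time series, shape (T, d)

w_score = rng.normal(size=d)      # scoring vector (learned in a real model)
scores = x @ w_score              # one relevance score per time step, shape (T,)
alpha = softmax(scores)           # attention weights, non-negative, sum to 1

pooled = alpha @ x                # weighted average over time steps, shape (d,)
# `pooled` would then feed a classifier head, e.g. logits = W @ pooled + b

print(pooled.shape, alpha.sum())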

I do machine learning not only at work but also as an expensive hobby. I build a network and it fits into a 3 GB card; make it a little more complex and it no longer does. Are there any alternatives besides the CPU?


A lack of money for hardware capable of producing results in modern deep learning research is a problem for enthusiasts, and even for universities.

Google has the Colab initiative: essentially Jupyter notebooks running on Google's hardware, where you can get the power of a top-end video card for 12 or 24 hours, and also run things on their TPUs. These are not consumer video cards; they have more memory - around 13 GB, I believe. That frees your hands a lot. But, in general, an individual user cannot afford truly large-scale things.

Some companies are trying to create chips specially adapted for deep learning that will perform calculations for neural networks much faster and cheaper than GPUs - maybe consumer solutions with such chips will appear in the coming years.

Why do you need to predict the behavior of other road users when developing a self-driving car?


On the road it is necessary. When making a decision, you must take into account, among other things, the car's inertia: it cannot change direction instantly (and even if it could, the passenger would feel terrible). We have to plan actions so that nobody else is in the place we want to be in a few seconds - and for that we must predict the positions and intentions of the other road users. The car's trajectory should run as far from them as possible; this is necessary for a safe ride.

How is the steering wheel turned in a self-driving car?


I don't work on control myself. I can say that cars differ: to some you can simply send commands to turn the steering wheel. As far as I know, the Prius can do that.

What do you use - Scrum, Kanban, chaos?


Organized chaos.

I don't see the need to rigidly structure the workflow, especially a research one: it is hard to say how long a particular task will take. We have too much uncertainty, and it is not obvious to me why to introduce additional structure.

We try to communicate a lot and to log all experimental results; we have special systems that store experiment data regardless of scale - what code was used, which branch it was built from, what data it was launched with - for the sake of full reproducibility. We log all conclusions, discuss them among ourselves, share information, and try to make everything as open and transparent as possible.

Is there any experience in using ML in industry - metallurgy, mining, enrichment?


I know that ML is actively used in these areas, but I have no personal experience with them.

A heartbreaking article recently came out about Alice recommending a video about murder to a child. Why does this happen - is it difficult to filter content?


The task of filtering content is, in principle, solvable, and with high accuracy. I don't know exactly what happened in that situation, but I can speculate.

Suppose the system has partner content, and there is an API through which partners must label this content with tags or by other means. Initially the system works on trust in partners - perhaps with only occasional content checks. Over time that stops being enough, and you bolt on a simple filtering system that searches for stop words in titles and tags, with moderators reviewing at least part of the content.

Every system will inevitably have a point of failure: sometimes people make mistakes, sometimes partners do not fulfill their obligations. All you can do is keep iterating and improving the system, and improvements are usually reactive: as long as something works well enough, it usually isn't improved until it becomes necessary.

Maybe when strong AI appears, we can ask it to filter all the content with 100% accuracy.

Do you attend international conferences on neural information processing systems and machine learning? What are your impressions of Russian conferences in this area?


I can't speak about Russian ones. I sometimes go to international conferences, but I understand less and less why.

"Scientific tourism" is, of course, important and interesting, but the conferences themselves, it seems to me, have stopped fulfilling their function. They accept a huge number of papers, and because of that it is impossible to give every author a proper presentation. At ICML, for example, only the best papers got long talks; all the rest got spotlight talks of under five minutes.

Moreover, a huge share of the work is incremental, with dubious reproducibility, and offers listeners no benefit. If a conference has a really cool paper, you will most likely already know it - preprints are posted early.

I think the conference format should be reinvented - or at least the bar for what gets accepted should be raised considerably.

What was your motivation to return to Russia?


I left Russia because I was interested in living in new places and learning from new people. It seemed to me that for personal growth you need to get to where people know more than you. And that is what happened: at Microsoft Research I understood a lot about how methodical you need to be, how deeply and well you need to understand what you are doing, and how aware you need to be of what is happening. But at some point I got bored, even though the tasks were interesting.

I was living in Cambridge then - a small city where little happens, whose social circle cannot be compared with Moscow's. I thought: now I can live in Moscow, apply what I've learned, and then maybe go somewhere else. I went to work at Yandex - and I seem to be applying what I learned pretty well.

It seems to me that DeepMind and OpenAI are doing interesting things now; I could learn a lot there.

I heard that the self-driving team prefers TensorFlow over PyTorch for training and inference. What is the reason for this?


Perhaps historical reasons. I can't say that TensorFlow is better or worse than PyTorch.

What size should the dataset be? Are 50-60 thousand training examples enough, or are millions required?


It depends on the model and the task. The dataset should be large enough to fit the model's parameters and prevent overfitting. If you have a trivial linear model, the dataset can be small. If it is a complex neural network, 60 thousand is not enough.

Training complex neural networks on non-trivial tasks from scratch almost always requires tens or hundreds of millions of examples. The principle of "more data, more quality" has not gone away.

By the way, back to the question of becoming an NLP expert. State-of-the-art deep learning today always means working with big data. The data must be pre-processed and then efficiently streamed to the compute nodes that do the training.

You could say deep learning is a bit of monkey work: to succeed, you have to try many things without being sure any particular one will pan out.

Maybe you can develop an intuition for which options are more likely to work, although I have never met a person with truly accurate intuition. In general, the most successful team will be the one that can run the most experiments per unit of time.

Most of my work is eliminating bottlenecks in the training process, trying to accelerate it to the theoretically possible speed. This requires engineering skills: the ability to squeeze performance out of code and hardware.

What distinguishes an expert from an ordinary data scientist is that the expert is usually also a good engineer: able to work quickly, write code, deal with distributed storage and data processing systems, and understand computer and network architecture well enough to track down bottlenecks. It is very important to develop these skills.

With big data dominant, the most successful people in the industry are those who can make training run fast on these volumes of data. If deep learning worked on small volumes, I would say knowledge of it alone is enough, but today that is not the case.

Learn to program well, learn the standard computer science algorithms, broaden your horizons. Cryptography, by the way, is also useful.

Do you use AutoML analogs for tuning architecture and parameters, or more manual experiments and intuition?


Right now, more the latter. Automatic tuning is present at the level of grid sweeps or Bayesian optimization; we haven't done anything more sophisticated from AutoML yet. It requires a lot of computing resources - if they are limited, it is better to rely on intuition. But if you realize you have descended into random guessing, it is better to hand the job over to an automated process.
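The "grid sweep" level of automation mentioned above can be sketched in a few lines: enumerate hyperparameter combinations and keep the best validation score. The scoring function below is a stand-in for a real training run (its toy optimum at lr=0.01, batch_size=64 is invented for illustration).

```python
from itertools import product

def validation_score(lr, batch_size):
    # Stand-in for "train a model, evaluate on a validation set".
    # This toy function is best at lr=0.01, batch_size=64 by construction.
    return -abs(lr - 0.01) * 100 - abs(batch_size - 64) / 64

grid = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}

best_params, best_score = None, float("-inf")
for lr, bs in product(grid["lr"], grid["batch_size"]):
    score = validation_score(lr, bs)
    if score > best_score:
        best_params, best_score = {"lr": lr, "batch_size": bs}, score

print(best_params)  # the grid point closest to the toy optimum
```

Bayesian optimization replaces the exhaustive loop with a model of the score surface that proposes the next point to try - the interface (propose parameters, observe a score) stays the same.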

What does Alice do, what does Google Assistant not do? What is the size of the team in this direction for Google and Yandex?


I can't talk about Yandex. Google seems to have hundreds or a thousand people. As for Alice's advantages, I'm not sure - I haven't recently followed either Alice's product development or the Assistant's features.

The question of quality is, I think, the wrong one. Even if Alice were worse, would that mean she has no right to exist? Products are created and compete with each other, and because of that they all win, evolve, and rise.

I don't understand the mentality of "this new product is just a copy of Google's". In business, products are created exactly this way: you take someone else's idea as a basis and implement it - sometimes as-is - but that is not the end of the road, it is the beginning. Then you progressively change the idea so it becomes better than the competitor's. That is the whole story of progress!

How does sport help achieve more in other areas?


In sport, especially competitive sport, I like the unambiguity. If you lost, you lost. You can't blame it on circumstances: you were not good enough, you did something wrong.

Competitive sport develops directness, honesty, and the ability to admit your mistakes. That helps in other areas: it is always better to admit you need to get better at X, Y, and Z than to look for excuses. Besides the health benefits, of course.

How does parachuting help you get out of your comfort zone?


Imagine you want to jump out of a plane as a group and build a formation together. The plane enters its jump run, the door opens, you all line up inside and wait for the signal to jump together.

At that moment there should be no doubts about anything. Even a split-second delay holds everyone up. You must bring yourself into a state in which you jump without hesitation. The world disappears; only the jump remains. If something goes wrong with the parachute, you will have little time to act - there, too, there must be no doubt or fear; you must do exactly what you were taught, as quickly as possible.

Skydiving cultivates the ability to decide to do something - and then not doubt it. You can draw a parallel with complex projects. Sometimes it is unclear what to do in a project; when you start a task, you don't know exactly what to do, how to do it, or whether it can be done at all. At that moment it is easy to start doubting and thinking "what if I can't?" - and both time and mental effort are spent on that. If you were asked to solve a problem, it means people believed in you. You must give it your full effort. You need to bring yourself into the same mental state as in a jump: drop everything unnecessary and concentrate. This became much easier for me after I took up skydiving.

How much do you spend on parachuting?


A lot. It is a significant line item in my budget. I see it as an incentive for further career growth.

What club are you in?


Mainly in Pushchino; certain disciplines are well developed there - freefly, for example.

Where should one study to become an ML engineer?


My information is already outdated; I studied long ago, at Moscow State University, at the CMC faculty. I can't say it was a super ML education, but the teachers taught us well and introduced us to the world of ML. I think many people know Dmitry Petrovich Vetrov - I owe him a lot; if not for his lectures and the special course that drew me in, I probably would not have gone into ML. I don't know where he teaches now, but it is definitely worth going there. In addition, whatever faculty you study at, I recommend also attending the ShAD (Yandex School of Data Analysis) if you can. Not because it is Yandex, but because it is a really cool place: they will give you all the knowledge needed for good practical work in the industry, which a university may not provide. Many trained, talented people who know what to do come to us from the ShAD.

Once more on the question of whether a "non-techie" can get into ML. Technical skills are needed, but a humanities degree is not a blocker. To master the basics of programming and mathematics, you only need a head on your shoulders and a capacity for logical, structured thinking. Many people who for some reason chose a humanities specialty have such skills. Nothing is impossible; the main thing is to try. Don't dwell on "can I do it?" - just start doing it. That greatly increases the chances of success.

Is it possible to study at the School of Data Analysis and a university at the same time?


I did, although it was quite difficult. The workload at the ShAD is heavy. You can combine the ShAD with the final years of university, when the load is relatively light - it will be hard work, but it pays off.

Is motorsport experience used with the self-driving cars?


We have active motorsport athletes on staff, because we need to train the drivers and QA engineers who sit directly in the car. They must be able to recognize an emergency during tests and react; they all undergo defensive-driving training from the motorsport athletes. As for whether we use the kind of tire physics models that professional athletes use in motorsport simulations to develop tactics - as far as I know, no; we are not yet shaving hundredths of a second off lap times. The telemetry useful to athletes differs from what we need, and we measure more data.

What's next?


The next live broadcast will take place next Monday.
You can ask questions of Natalya Teplukhina - a Vue.js core team member, Google Developer Expert, and Senior Frontend Engineer at GitLab.

You can ask her a question in the comments to this post.

A transcript of an interview with Ilona, a Senior Software Engineer at Facebook, can be read here: "Insider insights from a Facebook employee: how to get an internship, receive an offer, and everything about working at the company".

