Yandex.Practicum - Data Analyst. Graduation

The first article is here.

My training at Yandex.Practicum has ended, I have received my certificate, and now I can sum up the experience.

After the first article, many readers had additional questions, so I want to answer them and show a little practice. We worked through quite a lot of cases, so a single article cannot cover everything.

First, I want to describe what happened in the training after I wrote the first article, and point out a few things separately.


The “Automation” course, on automating data analysis processes (scripts, dashboards, etc.), turned out to be the most difficult for me personally, and the quality of the training material had nothing to do with it.

The problems were purely technical failures of the “I clicked something and everything disappeared” kind :) (incompatible software versions, hardware problems), and they caused me to miss deadlines. It also turned out that I had practically no experience with the command line, so I had to learn it urgently ...

As part of this topic, we gained experience working on a virtual machine in Yandex.Cloud, which can be managed through a command-line interface (CLI), an API, or an SDK.


I was impressed by the course with the alluring title “Forecasts and Predictions” (machine learning). It turned out to be very important: an analyst must have an understanding of machine learning, even though it belongs more to Data Science. I’ll say right away that I liked the idea of putting analytical conclusions into practice immediately, since I like the full cycle of work, and the less the processes are separated, the better the result (there are, however, some difficulties in that).

The course consists of 3 large blocks:

  • machine learning tasks in business,
  • machine learning algorithms,
  • the process of solving machine learning problems.


The graduation project was run in Yandex.Tracker, a task and process management system, so that students could immerse themselves in a workflow like that of a real company.

Each student carried out their own project and sent reports to the Tracker; unexpected tasks arrived as well. It was an interesting experience, but it was hard to judge how realistic the deadlines were (how long such a project usually takes in a real company).

The last task was a peer review on Peergrade, an online platform for student feedback sessions. There we evaluated one of each other's project tasks.


I really liked the job placement program. You can be a good specialist and still have no idea how to present yourself properly. I assumed that with a portfolio of finished work in hand, an employer would look through it, we would talk, and the process would be shorter for everyone, but it turned out that nobody looks at the projects. In most cases everything starts with the HR department, so you need a decent resume and cover letter and have to know many other subtleties. So, unexpectedly for me, this program was extremely useful.


You will graduate as a ready-made specialist if you have experience in a field where you can not only apply the tools you have learned but also interpret the result and, ideally, implement it.

Yandex.Practicum gives you only the tools for analysis. You really can learn the tools from scratch (for example, right after school), but you are unlikely to be able to interpret the results; that takes specialized education or work experience in a particular field.

In our country, Practicum is a bit ahead of its time: it turned out that a great many vacancies require excellent Excel skills :). Apparently, employers are having difficulty moving to other data tools.

Let me remind you that our cohort was the first, and I understood that there would be technical difficulties and that the course developers would, to some extent, also be learning from us.

The main disadvantage for me was the "human factor". Later, analyzing my completed projects, I discovered several errors that the teachers should have pointed out to me. In general, it felt as if the teachers did not have enough time for checking. I attribute all this to the product being new, and the issue is entirely solvable. Moreover, the people running the course are trying very hard to make a super product: for example, the topic “Forecasts and Predictions” has been completely updated and has become much more understandable and complete. I am going through it again.

Different teachers also gave contradictory recommendations on applying certain methods, reflecting different points of view.

Learning Tools

(what it is better to be familiar with before classes start in order to save time, especially if you are working at the same time):

  • Python: it’s better to have an idea of the language before starting classes. There is an introductory course, but other introductory courses would not hurt either;
  • Jupyter Notebook: also better to read up on before classes;
  • SQL: it is required almost everywhere; everything you need to get started was definitely given, and now it is a matter of practice;
  • statistical analysis: I strongly recommend taking “Fundamentals of Statistics” by Anatoly Karpov on Stepik before you start;


  • machine learning with sklearn (pre-processing, model building, classification, choosing the best model); this is still a rather short course, and those who want to work in this area will need a more advanced one, for example from Yandex.
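
For readers who have not seen sklearn yet, the steps named in the machine learning item (pre-processing, model building, classification, choosing the best model) can be sketched roughly like this. This is only an illustration and not material from the course: the synthetic data, the two candidate models, and the function name compare_models are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(random_state=0):
    """Hypothetical helper: compare two classifiers and return the better one's name."""
    # Synthetic data stands in for a real business dataset (assumption).
    X, y = make_classification(
        n_samples=500, n_features=10, n_informative=5, random_state=random_state
    )
    # Hold out a test set that is not used while choosing the model.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # Candidate models; scaling (pre-processing) is part of the first pipeline.
    candidates = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }

    # Mean 5-fold cross-validated accuracy on the training part only.
    scores = {
        name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
        for name, model in candidates.items()
    }

    # Pick the best candidate, refit it on all training data, check it on the test set.
    best_name = max(scores, key=scores.get)
    best_model = candidates[best_name].fit(X_train, y_train)
    test_accuracy = best_model.score(X_test, y_test)
    return best_name, scores, test_accuracy


best_name, scores, test_accuracy = compare_models()
print(best_name, scores, test_accuracy)
```

The model is chosen by cross-validation on the training data, and the held-out test set is used only once at the end, so the final accuracy is not inflated by the selection step.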

Also, if you studied probability theory long ago or do not know it at all, watch at least the lessons from GetAClass, first on combinatorics and then on probability theory.

English goes without saying.

In the second part of the article, I will show how the knowledge gained applies in practice to exploratory analysis: an advertising campaign in Yandex.Direct, site visits, and detection of possible fraud, using data collected over 6.5 years.
