We are sharing Russia's largest body of online-learning data with projects in linguistics, personalization, instructional design, and ML

Before the New Year, Mikhail Sverdlov's team announced that it was ready to share anonymized data from Skyeng lessons with external researchers and startups. Soon after the holidays, we talked with Misha about what kind of data is involved, what the team does with it, and why the only way to get your dataset is to email him.



- If you are sharing data, why not just upload the dataset somewhere?
The largest English corpus in Russia, as far as I know, has about 10 thousand entries. By the end of January, our school had held over 9.1 million lessons worldwide - as far as I know, only Chinese schools have a larger set of one-on-one online lessons.

For every lesson we have held, we know what happened and how the teacher's and the student's actions changed, and we track the history of every exercise in it. That comes to about 120 metrics for teachers, plus about 300 parameters for children in two age groups (4-11 and 11-18 years) and for adults of different ages, cities, statuses (students, for example), and so on. And this is definitely not every parameter we could collect - it seems we could use 2-3 times as many. At this volume, the “here is a link to the dataset, play with it if you like” approach will hardly work.

- Who are you ready to share datasets with on request?
The first type of likely partner is scientists and organizations that do basic research, write articles, and so on. They usually need a data base for their research - we are ready to become one.

Now, for example, we are discussing joint neurophysiological studies with one of the largest universities in the country, as well as partners from Cambridge and Arizona.


To begin with, we want to take the current content and label it in a certain way - and the neurophysiologists will run a test on people who come to them and study while wearing such “earflaps”. We will learn how the materials work for the target audience and which psychological and neurological factors are at play, and then we can train a model on historical data to change the content and formats, making them as convenient as possible for the student.

In parallel, we are now analyzing these same metrics of focus and material retention on audio and video streams together with a startup.

- What benefits will each party receive?

1. From the start, we do everything for each other free of charge.

2. The results of the study belong to both parties - whether the experiment succeeded or failed, we can write joint articles, serve as a reference base, and so on.

3. If the result of the study is positive, the partner can commercialize it, and we can use it for our needs.


We are also ready to show the end results to our partner’s customers in the field of education. But we discuss non-competition right away - this is a basic condition. For example, we agreed up front with the neurophysiologists that if the work turns into a product, they will not sell it to our direct competitors for a year or two. Roughly speaking, you can sell it to the Chinese military immediately, and to another online English school only some time later. The lawyers have not yet polished the wording, but that is the gist.

- Ok, what if a commercial project comes to you?
We are also interested in companies that are introducing ML tools in education or even in other areas. These can be tools for personalizing learning paths, speech synthesis or analysis, motivation in the learning process, psychology, and so on.

We already work with some of them - for example, we are running a number of pilots on matching teachers and students.


So startups that work on speeding up the acquisition of new knowledge, on mechanics and methods for fast, lasting, and deep memorization, on recommendation systems, and so on - you are welcome too. Again, we will need to discuss non-competition.

- Ok, and how does it all look for a partner?
Write to me at data@skyeng.ru: tell us about your competencies and the topics you are interested in, and we will discuss them. Once we have agreed on everything, we sign an agreement and go make history in education.

On our side there will be several project managers who will take you on and help you get all the data exports on time, launch experiments, and so on. The partner gives us the algorithms and/or logic, we run the machine analysis and give the partner an already aggregated text file with a description. The raw data itself - images, video, audio - is not transferred to the partner.
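
A minimal sketch of how such an exchange could look, under stated assumptions: the record fields (`age_group`, `student_talk_sec`, `duration_sec`), the function names, and the tab-separated report layout are all hypothetical illustrations, not Skyeng's actual pipeline or file format. The partner supplies only the metric logic; the data owner runs it over the raw records and hands back an aggregated text report.

```python
import statistics
from typing import Callable, Dict, Iterable, List

# Hypothetical raw lesson record; real data (audio, video, images)
# would stay on the data owner's side and never reach the partner.
Lesson = Dict[str, object]


def partner_metric(lesson: Lesson) -> float:
    """Hypothetical partner-supplied logic: share of lesson time the student spoke."""
    return lesson["student_talk_sec"] / lesson["duration_sec"]


def aggregate_report(
    lessons: Iterable[Lesson],
    metric: Callable[[Lesson], float],
    group_key: str,
) -> str:
    """Run the partner's metric on raw lessons and return only aggregates as text."""
    groups: Dict[str, List[float]] = {}
    for lesson in lessons:
        groups.setdefault(str(lesson[group_key]), []).append(metric(lesson))

    # First line is the description; each following line is one group.
    lines = [f"# metric={metric.__name__} grouped_by={group_key}"]
    for key in sorted(groups):
        values = groups[key]
        lines.append(f"{key}\tn={len(values)}\tmean={statistics.mean(values):.3f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"age_group": "4-11", "student_talk_sec": 300, "duration_sec": 1200},
        {"age_group": "4-11", "student_talk_sec": 600, "duration_sec": 1200},
        {"age_group": "adult", "student_talk_sec": 900, "duration_sec": 1800},
    ]
    print(aggregate_report(sample, partner_metric, "age_group"))
```

The point of the design is that the partner only ever sees the returned string: group labels, counts, and means, never the per-lesson records.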

As for the rest... Just write, do not be shy - or ask questions in the comments, and I will try to answer as fully as I can.
