How a distributed team works under self-isolation: why we barely noticed the difference



Self-isolation has forced many people to work from home. For some it's easier, for others harder, and some would hardly notice the difference at all. But after a week of quarantine was announced (and then a month), the number of posts in my feed about life hacks, efficiency, and productivity grew noticeably.

My name is Mikhail Troshev, and I head the Yandex Search Interface Service. Our team has worked in a distributed mode for many years. Below I'll explain how that differs from, and resembles, remote work, how it is organized, why it doesn't break down, and how our experience can be useful to those who were abruptly forced to change the way they work.

Some of this will surely seem banal to you (Agile, Scrum, Kanban, DevOps: big discoveries!). But it's like morning exercise: everyone knows it's useful, yet out of laziness few do it regularly and at full effort. Well, we do. And it works.

Not remote, but distributed


What it looks like: 90 front-end developers gather every day in offices in Moscow, St. Petersburg, Kazan, Innopolis, Yekaterinburg, Simferopol and Minsk, so it's easy to see that we are separated not only by distance but also by time zones. And that's not all: the front-enders are distributed across product teams (virtual teams) of roughly three to seven people, plus back-enders, designers, testers and managers (I spoke in detail about our work structure in 2019 here). That is, the members of any one such team are almost all in different cities. Not quite remote work, but very close to it: although a few colleagues still sit nearby, the possibilities for micro-synchronization with the rest are severely limited compared to working in an open space.

So we have to rely on asynchronous communication with long reaction delays. The more a team is dispersed across offices, the more waiting overhead creeps into day-to-day interaction, and that overhead can bury any project. To save time:

- the life cycle of each task, from idea to production, is arranged as uniformly as possible: procedures, statuses, transitions between stages: everything that can be formalized is formalized (roughly 90% of tasks). At the same time we try to keep this bureaucracy simple and understandable, otherwise it stops being useful and starts getting in the way;

- the whole team knows the rules that govern the work process and tries to follow them strictly, and we try to automate the routine. As a result, everyone minds their own business and doesn't waste time on the same kinds of micro-decisions: programmers program, product managers come up with product features, designers design.

Nothing new, right? Yet this simple approach has helped us a lot under self-isolation: every employee can deliver maximum value on their own, because they know enough of the details to do the work.

But first things first.

Stage 0. Planning


Each team has a backlog, the top of which is estimated and prioritized:



The fundamental point is that every team member must understand: this task needs to be done because it is tied to an epic, the epic is tied to a goal, and the goal is important. Otherwise people may start doing things that aren't needed, or the manager will be forced to constantly supervise and put out fires. The latter, by the way, can be hard to notice in the office, but it is impossible to miss remotely: confusion piles up, work stalls, a dozen chats spring up in the messenger in attempts to synchronize, and in the end the whole team is chatting instead of writing code. It is critically important to organize planning so that every participant in the process understands what to do and in what order. Then the team lead can focus on the product without being distracted by a constant stream of small questions.
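To make the task-to-epic-to-goal chain concrete, here is a minimal TypeScript sketch of that traceability (the field names are invented for illustration; this is not our actual tracker schema):

```typescript
// Hypothetical tracker entities: names and fields are illustrative only.
interface Goal {
  id: string;
  description: string; // why this matters to the product
}

interface Epic {
  id: string;
  goalId: string; // every epic is tied to a goal
  title: string;
}

interface Task {
  id: string;
  epicId: string; // every task is tied to an epic
  title: string;
  estimate?: number; // filled in for the top of the backlog
}

// A task with no chain up to a goal is a red flag at planning time.
function isJustified(task: Task, epics: Map<string, Epic>, goals: Map<string, Goal>): boolean {
  const epic = epics.get(task.epicId);
  return epic !== undefined && goals.has(epic.goalId);
}
```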

Of course, it's impossible to detail the entire backlog down to the bottom, but a well-groomed top lets the team quickly pull new tasks into the sprint (almost all our teams live in two-week sprints) and keep a reserve for the next one. With this approach an experienced team can fill a sprint even without the manager, if he suddenly turns out to be unavailable.

To avoid stretching out work on a task, the team practices "useful bureaucracy": the manager writes a clear description, the performers set the correct statuses, and tasks "flow" from left to right across the sprint board:



Here the whole team sees the complete, current picture of the sprint. If someone falls ill, goes on duty or on vacation, the other participants pick up their tasks in whatever status they are in.

Still, there is no doing without daily syncs, for the cases where the formal process is not enough and the bureaucracy would only get in the way. Usually detailing tasks by status (open > in progress > in review > ready for test > testing > tested > ready for dev > dev > rc > closed) is enough, but in about 10% of cases you need to clarify something, say it out loud, explain it "on your fingers". By the way, I am convinced that all working meetings (including stand-ups) must be held over video, not just voice, because it forces you to pull yourself together and tune in to work.
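The "flow from left to right" can be sketched as a simple transition table. The statuses below are the ones from the article; the code itself is illustrative, not our tracker's implementation:

```typescript
// The status chain from the article. A task normally moves one step
// forward; it can also be rolled back to any earlier status when a
// problem is found (a bug, a design change, missing backend data).
const FLOW = [
  "open", "in progress", "in review", "ready for test",
  "testing", "tested", "ready for dev", "dev", "rc", "closed",
] as const;

type Status = (typeof FLOW)[number];

function canTransition(from: Status, to: Status): boolean {
  const i = FLOW.indexOf(from);
  const j = FLOW.indexOf(to);
  return j === i + 1 || j < i; // one step forward, or any rollback
}

console.log(canTransition("in review", "ready for test")); // true
console.log(canTransition("open", "testing"));             // false: no skipping
```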

It is very important that the whole team be present at the sync: everyone quickly refreshes the context, sorts out the tasks and gets to work, guided by the board, without wasting any more time on questions to each other. Of course, we also have working chats, but we try not to flood in them and use them to get a quick (ideally binary) answer: can the release be rolled out, where to find the instructions, who to discuss the data source API with.

Where we saved time: synchronization, micro-decisions

Stage 1. Development: open > in progress


"Open" means the task is waiting for a performer. Developers pick tasks up so that each one is finished as quickly as possible. For example, it may be Friday, and the developer goes on duty next week (this is known in advance, a month ahead). In that case it is better for him to take a small task he can manage in one day: we try not to hand over work in progress. If someone doesn't manage to finish, it is better to merge what is ready as is, and then finish off the remainder in a separate pull request.

Once you have deployed a local working copy of the project, don't forget to move the task from "open" to "in progress" so that no one else takes it. "Local", by the way, is the key word: you can work from anywhere, and the quality of your Internet connection will not be a blocker. Now that infrastructure and networks are overloaded, this matters a lot. Our local development server can use data dumps: zip archives with data for various requests, so you can work fully with no Internet at all. After the developer has finished the work and sent a pull request, automation kicks in.
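As a sketch of the idea (not our actual server: the file layout and names here are invented), a local dev server might answer requests from an unpacked dump directory when there is no network:

```typescript
// Minimal sketch: serve previously saved responses from a local dump.
// Assumes dumps/ contains JSON files keyed by a sanitized request URL,
// e.g. "/search?text=cats" -> dumps/_search_text_cats.json.
import * as http from "node:http";
import * as fs from "node:fs";
import * as path from "node:path";

const DUMP_DIR = "./dumps";

function dumpFile(url: string): string {
  return path.join(DUMP_DIR, url.replace(/[^a-z0-9]/gi, "_") + ".json");
}

http
  .createServer((req, res) => {
    const file = dumpFile(req.url ?? "/");
    if (fs.existsSync(file)) {
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(fs.readFileSync(file)); // offline: answer from the saved dump
    } else {
      res.writeHead(404);
      res.end("no dump recorded for this request");
    }
  })
  .listen(3000, () => console.log("dev server on :3000, serving from dumps/"));
```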

Where we saved time: developers assess their own strengths and capacity; no need to fight an unstable Internet connection

Stage 2. Automatic checks: in progress > in review


Before a task goes to review, it passes checks for code quality, performance, and absence of visual and functional bugs. One could write a lot here about the completeness and variety of our automatic checks, but, firstly, that would take several separate stories, and secondly, it goes far beyond the topic at hand. I will just give links to descriptions of the tools (with a sketch of how such checks might be wired together after the list):

- a standard set of static analysis tools: ESLint and Stylelint with a rich set of plugins;
- our own static checks: availability and quality of translations, validation of yaml files;
- standard unit testing tools: Mocha, Karma, PhantomJS, istanbul;
- our own functional and visual testing tool, Hermione;
- Pulse, our own tool for performance testing. It was mentioned here.
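As promised, here is a minimal sketch of how such a pre-review gate might be wired together. This is a hypothetical script, not our CI; it assumes the linters and test runner are already configured in the project:

```typescript
// Hypothetical pre-review gate: run every check, fail on the first error.
// The exact commands are placeholders; real projects configure them in CI.
import { execSync } from "node:child_process";

const checks = [
  "npx eslint .",             // static analysis of JS/TS
  "npx stylelint '**/*.css'", // static analysis of styles
  "npm test",                 // unit tests (Mocha/Karma)
  // functional/visual (Hermione) and performance (Pulse) checks would go here
];

for (const cmd of checks) {
  console.log(`running: ${cmd}`);
  execSync(cmd, { stdio: "inherit" }); // throws on non-zero exit, blocking review
}
console.log("all checks passed: the task can move to review");
```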

By the way, a task can be rolled back to this point if problems surface at any later stage: unfortunately, even the most careful planning does not protect against errors 100%, something can always go wrong. Whatever the reason (a bug surfaced during testing, the design changed, there wasn't enough data from the backend), whoever returns the task finds the right person, describes the essence of the problem in the task, attaches screenshots or even videos, and can additionally write to the chat so that everyone notices the task has stalled.

Where we saved time: automatic checks run without human involvement

Stage 3. Code review: in review > ready for test


Firstly, you want to invite to the pull request not just any free reviewer, but a person who understands what is happening in your code; secondly, someone from an adjacent team, so that the whole service has an idea of who is doing what. All of this is spelled out in our regulations. Even in a small office it can be difficult and time-consuming to organize this manually, to say nothing of a distributed setup (and self-isolation!). An automated code reviewer comes to the rescue: it also changes statuses on its own. I won't dwell on how it works in detail; it is better to listen to its developer's story. It also knows how to ping performers: the longer a task hangs in review, the more insistently it pings.
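A rough sketch of the assignment idea might look like this (our actual tool is internal; the heuristics and names below are invented for illustration):

```typescript
// Invented model: prefer a reviewer from an adjacent team who already
// knows the files being changed, then ping with growing insistence.
interface Reviewer { login: string; team: string; knownPaths: string[]; }

function pickReviewer(authorTeam: string, changedPaths: string[], pool: Reviewer[]): Reviewer | undefined {
  return pool.find(r =>
    r.team !== authorTeam && // adjacent team: knowledge spreads across the service
    changedPaths.some(p => r.knownPaths.some(known => p.startsWith(known)))
  );
}

// The longer the review hangs, the shorter the pause between pings.
function pingIntervalHours(hoursInReview: number): number {
  if (hoursInReview < 24) return 24;
  if (hoursInReview < 72) return 8;
  return 2;
}
```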



Where we saved time: finding the right reviewer, reminders to process the pull request

Stage 4. Testing: testing > tested


When a change has passed code review, autotests are run. Only after they all turn green is the task handed over to manual testing. Importantly, until a task is explicitly handed over to someone else, the performer remains responsible for it: it is he who should quickly drag his code through code review and testing to production. That is, we always know who is on the hook, and everyone constantly keeps an eye not only on their own tasks but on everything happening in the team. There is usually only one tester per team, and this is convenient for him too: he sees what is about to land on him and can use his time more efficiently.

The tester can:

- hand the task over to assessors for testing, in a post with all the details; based on the assessors' results, something may need exploratory testing or re-checking;
- test the task himself, on a live device or via a farm of remotely accessible devices, the Collective Farm.

Live devices sit in Hypercubes in every Yandex office. Any employee can take the device with the characteristics he needs, having first found the nearest point with a free device. Normally the system automatically reminds you after a while that it is time to return the device, but this function was disabled for the period of self-isolation. The on-duty team made sure that live devices went to those who critically needed them, and helped everyone else connect to the Collective Farm so that work processes wouldn't slow down.

The Collective Farm is a remote device-testing farm that anyone can use. Android, iOS: we check changes even on the oldest and most painful-to-support versions of these systems, so that our services are available to everyone. But we also try to get flagship devices into both the Collective Farm and the Hypercubes as soon as possible. Remember the "notch" that appeared on the iPhone X, and all the problems associated with it?
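For illustration, requesting a device from such a farm might look roughly like this (a hypothetical client; the Collective Farm's real API is internal and not described in this article):

```typescript
// Hypothetical farm client: names, fields and endpoint are all invented.
interface DeviceRequest { os: "android" | "ios"; version: string; }

async function reserveDevice(req: DeviceRequest): Promise<string> {
  const res = await fetch("https://device-farm.internal/reserve", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`no free device: ${res.status}`);
  const { sessionUrl } = await res.json(); // remote-control session for the device
  return sessionUrl;
}

// e.g. reproduce a layout bug on an old Android release
reserveDevice({ os: "android", version: "5.0" }).then(console.log);
```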

Where we saved time: planning, autotests, device distribution (available even under self-isolation)

Stage 5. Merging: ready for dev > dev


The task is tested; time to merge it into the common branch.
N.B. Some teams call the common branch master, some trunk, some dev. Ours is called trunk.

When there are many merges (we, for example, land more than 30 pull requests a day), two problems arise:

- minor: if you merge in arbitrary order and don't rebase, the git history becomes very tangled, and if some bug slips in, rolling back is very difficult;

- critical: integration testing. It is not always physically possible to wait until your changes finish testing together with the latest version of trunk, so the following can happen: two pull requests, each of which individually breaks nothing, break trunk after being merged. To prevent this we adhere to trunk-based development, meaning a release can be rolled out from any commit. And although we have not yet fully arrived at Continuous Deployment, our trunk is "green". Breaking it even once a week is categorically unacceptable to us.

For several years now we have used the Merge Queue tool to automate the queue, and we are constantly improving it. Changes are merged not by the developer but by a robot. It rebases each pull request onto the latest version of trunk and runs the full set of tests. This is a rather lengthy process, so it cannot be built around live people: a person simply won't sit waiting for the final result. The robot, though, works without sleep, rest or days off. Moreover, a task can be put in the queue not by the developer himself but by the tester right after testing ends, which once again saves time.
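The core loop of such a queue can be sketched roughly like this (illustrative TypeScript; the real Merge Queue is far more elaborate, and the git/CI helpers here are invented stubs):

```typescript
// Stub helpers standing in for real git/CI operations (invented).
async function rebaseOntoTrunk(pr: number): Promise<boolean> { console.log(`rebase #${pr}`); return true; }
async function runFullTestSuite(pr: number): Promise<boolean> { console.log(`test #${pr}`); return true; }
async function mergeToTrunk(pr: number): Promise<void> { console.log(`merge #${pr}`); }
async function returnToAuthor(pr: number, reason: string): Promise<void> { console.log(`PR #${pr} returned: ${reason}`); }

// One PR at a time: each is re-tested against the trunk that already
// includes everything merged before it, so two individually green PRs
// cannot silently break trunk together.
async function processQueue(queue: number[]): Promise<void> {
  for (const pr of queue) {
    if (!(await rebaseOntoTrunk(pr))) {
      await returnToAuthor(pr, "rebase conflict");
      continue;
    }
    if (!(await runFullTestSuite(pr))) {
      await returnToAuthor(pr, "tests failed on latest trunk");
      continue;
    }
    await mergeToTrunk(pr); // trunk stays green: releasable from any commit
  }
}

processQueue([101, 102, 103]);
```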



More details about Merge Queue.

Where we saved time: blocking breakages are prevented; no need to manually shepherd a pull request through merging

Stage 6. Release: rc > closed


Every working day at 5 a.m., a new release is automatically cut from the last commit in trunk: static and dynamic packages are built, deployed to prestable, and tested by assessors. Then the on-duty tester reviews the report and, if there are bugs, reports them to the on-duty developer. If all is well, he hands it over to the on-duty manager. If the manager gives the go-ahead, the on-duty developer rolls out the release.
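Sketched as a pipeline, the daily release cut might look like this (all step names are invented stubs; the real release machinery is not described in this article):

```typescript
// Hypothetical pipeline steps; every helper is an invented stub.
type Report = { bugs: string[] };
async function lastTrunkCommit(): Promise<string> { return "abc123"; }
async function buildPackages(commit: string): Promise<void> { console.log(`build ${commit}`); }
async function deployToPrestable(commit: string): Promise<void> { console.log(`prestable ${commit}`); }
async function runAssessorTests(commit: string): Promise<Report> { return { bugs: [] }; }
async function notify(who: string, r: Report): Promise<void> { console.log(`notify ${who}`, r); }

// Every working day at 5 a.m.: cut a release from the last trunk commit.
async function dailyRelease(): Promise<void> {
  const commit = await lastTrunkCommit();
  await buildPackages(commit);  // static and dynamic packages
  await deployToPrestable(commit);
  const report = await runAssessorTests(commit);
  // bugs go to the on-duty developer; otherwise the on-duty manager decides
  await notify(report.bugs.length ? "on-duty developer" : "on-duty manager", report);
}

dailyRelease();
```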

It is important to clarify that tasks from different teams end up in the same release, so it benefits everyone when a single on-duty developer is clearly allocated for releases, who spends the whole week rolling out releases and watching error monitoring. This lets the other developers on the project switch to new tasks as early as possible (in fact, immediately after sending to the Merge Queue).

Usually all release activity happens during business hours (even from home we ask everyone to keep a schedule; as a result everyone breaks it in their own direction, but we keep trying), but if something goes wrong in production, the on-duty shift will wake the on-duty developer so that he responds promptly.

Let me remind you that the main task is to reduce the time team members spend synchronizing with each other, and the amount of routine: everyone knows who is on duty, everyone knows what to do and how, everyone has instructions. When the manager gives permission to roll out the release, he doesn't explain the procedure to the developer: the developer already knows everything.

Where we saved time: synchronization, micro-decisions.
The cycle is closed.


Distributed work is a vaccination against bad, inefficient processes, the kind that hold projects back even in normal conditions, to say nothing of non-standard ones. Our team's experience confirms: if you go through your procedures and interactions with due meticulousness and honestly follow all the rules, however banal they may seem, the workflow becomes quite difficult to paralyze.

It somewhat resembles traffic management: when cars were few and slow, hardly anyone thought about traffic rules. Now there are many fast cars, and driving without rules has become impossible. The better (and, at the same time, the simpler) these rules are formulated, and the more rigorously road users follow them, the higher the throughput of the roads.

Thank you for reading to the end. See you in the comments!
