Release train. A Yandex talk

Release processes in different Yandex teams (and in any large IT company) are organized in a similar way but differ in many details. Mobile development has its own specifics: releases depend on the publication procedures of the App Store and Google Play. Android developer Dmitry Polyakov talked about the processes around him: how his team sends out a release train on schedule, how it launches unscheduled releases, how it attaches cars to a train that has already departed, and what it takes to stay on schedule.


- Hello everyone, I'm Dmitry Polyakov, an Android developer on the Beru mobile application.



There are now two teams working on Beru, Android and iOS development, with 13 people each. This lets us work on many interesting tasks in parallel and roll them out to users quickly. In this talk I will explain how we learned to work with git and live in one repository.

Next I'll tell you how our release cycle works. Managers come to us and say they want a feature rolled out as soon as possible. Probably every manager wants that, so we set up the release process so that we ship to the App Store and Google Play every week. I will also talk about the tools we use, which I recommend you try.

Git flow


Let's start with Git Flow. We took the classic Git Flow as a basis, but it has changed a lot to fit our conditions and our team. Have a look: some of it may suit you and some may not. Each team has its own approach to working with git.

How does it work for us? An epic is the root branch for a feature, for some large piece of functionality. To make this clearer, let's go straight to a product example.



A manager comes and says: developer, build us a wish list of favorite products in the application. The developer creates a new epic branch and calls it wishlist.



Next, he decomposes it into smaller tasks and creates them in the tracker: perhaps networking, rendering the UI, writing tests. For each such feature he creates a task in the tracker. As soon as he starts working on a task, he creates the corresponding feature branch, branching it from that same epic.

As soon as he finishes work on a feature, he merges it into the epic through a pull request. A pull request is a mechanism by which other developers on your team review your code. If they don't like something, your code may not reach the epic, which means it won't be released until you come to an agreement with them.

Each pull request gets two reviewers on our team. They are assigned randomly by an automated process that chooses among the developers who recently worked with the same files you changed in your pull request.
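The reviewer assignment described here can be sketched roughly as follows. This is only an illustration, not the team's actual tool: the function name `pick_reviewers`, the shape of the `recent_authors_by_file` history and the sample file names are all assumptions made for the example.

```python
import random


def pick_reviewers(changed_files, recent_authors_by_file, author, count=2, rng=None):
    """Pick `count` random reviewers among developers who recently touched the changed files.

    `recent_authors_by_file` is a hypothetical mapping: file path -> list of recent authors.
    """
    rng = rng or random.Random()
    candidates = set()
    for path in changed_files:
        candidates.update(recent_authors_by_file.get(path, []))
    # The author of the pull request cannot review their own code.
    candidates.discard(author)
    # Sort so that a seeded random generator gives reproducible results.
    pool = sorted(candidates)
    return rng.sample(pool, min(count, len(pool)))


# Hypothetical history of who recently changed which file.
history = {
    "cart/Cart.kt": ["anna", "boris"],
    "cart/CartApi.kt": ["boris", "vera", "dmitry"],
}
reviewers = pick_reviewers(["cart/Cart.kt", "cart/CartApi.kt"], history, author="dmitry")
print(reviewers)
```

If fewer than two suitable developers exist, the sketch simply returns as many as it finds; a real tool would probably fall back to the rest of the team.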



Thus several epics, each with its own set of features, live in the project in parallel. An epic can consist of just one feature. This can happen when we want to merge that feature into the epic through a pull request.



As soon as work on all the features within an epic is complete, the task first goes to the QA team, who functionally test the branch. When they find bugs, you fix them within the same epic. Once all the bugs are fixed, you merge the epic into develop. No additional code review is done at this point, because all the code was already reviewed when the features were merged into the epic.

Our develop is the branch that receives only tested code, because at the epic stage the code has been tested and has gone through pull requests.



This lets us safely create a new release branch at the start of the release cycle, without fearing that it will contain many untested bugs. So we create a new release branch from develop and it gets tested. As soon as the release is tested and we are ready to move on to publication, this branch is merged into master.

Sometimes we have a hotfix, and there is a separate branch type for it. This is a very short release that needs to be rolled out quickly, when we cannot wait three or four days for the next release cycle to begin. Usually it is something small.

For example, if a very critical bug got into production, we need to fix it urgently. So we stop the current release, make a hotfix for this bug and update our release. But a hotfix is not always about bugs. Sometimes there is a product need.



For example, as soon as the self-isolation regime was introduced in Moscow, we had a product need to let users receive their orders without contact. Now, when placing an order in our application, the user can select the "Leave at the door" option. The courier arrives and leaves the parcel at the door, handing it over without contact.

Along with this task, the hotfix included various widgets urging users to stay home and order goods by courier rather than go to pickup points. We rolled these tasks out through a hotfix to get them to users as soon as possible, because we consider them important.

When we have a hotfix, we branch it from master. It cannot be branched from develop, because by that moment develop may contain new epics that have not been released yet. We do not want to take these epics along in the hotfix, where they could accidentally affect or block it. After the hotfix is complete, we merge it into master and also into develop, since this code did not exist in develop yet.



Master is the code base corresponding to the latest version of the application, the one that is in the store now. It is clean, without bugs, because it has already passed functional and regression testing. It is our backup branch.

When a release is completed, we also merge it back into develop. During the release, bugs may be found that were missed in functional testing, and different epics may conflict with each other. So we merge the release into develop to have these fixes there too.



Work on an epic can take a long time, and to keep up with the code in develop, the developer sometimes merges develop into his epic so that there are fewer merge conflicts later.



Since develop was merged into the epic, the epic, for the same reasons, needs to be merged into your feature branches.
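The branch rules from this part of the talk can be summarized as a small table in code. This is a sketch for illustration only: the `FLOW` table and the `is_allowed_merge` helper are hypothetical names, and the assumption that an epic is branched from develop follows classic Git Flow rather than an explicit statement in the talk.

```python
# Where each branch type starts, where it is merged when finished,
# and which branch may be merged into it to keep it up to date.
FLOW = {
    "feature": {"base": "epic", "merge_into": ("epic",), "sync_from": ("epic",)},
    # Assumption: epics start from develop, as in classic Git Flow.
    "epic": {"base": "develop", "merge_into": ("develop",), "sync_from": ("develop",)},
    "release": {"base": "develop", "merge_into": ("master", "develop"), "sync_from": ()},
    "hotfix": {"base": "master", "merge_into": ("master", "develop"), "sync_from": ()},
}


def is_allowed_merge(source: str, target: str) -> bool:
    """Return True if merging `source` into `target` fits the flow described in the talk."""
    # A finished branch is merged into its target branches.
    finishes = target in FLOW.get(source, {}).get("merge_into", ())
    # A long-lived branch may pull fresh code from its parent.
    syncs = source in FLOW.get(target, {}).get("sync_from", ())
    return finishes or syncs


print(is_allowed_merge("hotfix", "develop"))   # the hotfix goes back into develop
print(is_allowed_merge("feature", "develop"))  # features reach develop only through an epic
```

A table like this is easy to check in a pre-merge hook, but whether the team enforces the rules automatically is not stated in the talk.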



The names of epic and feature branches contain the ticket number from the task tracker. This is useful, because from the ticket number we can find out, for any part of our application, which tickets changed it, who wrote the code and for what purpose.



Release and hotfix branches carry the application version number in their names. Some teams give a hotfix the same number as the release, arguing that a hotfix is something small and perhaps not very important for the user, so they do not increase the application version. We do not use this approach, because we receive various crash reports from production and want to know exactly whether a report came from the hotfix or from the release, in order to know where to look for the problem.



Master and develop are branches that live in our repository permanently. They were created once and stay. That is why their names are so short.
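A naming scheme like this is easy to parse automatically. The talk does not show the exact format, so the patterns below are an assumption made up for the example: `epic/APP-101-wishlist`, `feature/APP-102-network`, `release/2.41` and `hotfix/2.41.1`, with `APP` as a hypothetical tracker queue.

```python
import re

# Hypothetical formats: "<kind>/<TICKET>-<slug>" and "<kind>/<version>".
TICKET_BRANCH = re.compile(r"^(epic|feature)/([A-Z]+-\d+)(?:-(.+))?$")
VERSION_BRANCH = re.compile(r"^(release|hotfix)/(\d+(?:\.\d+)*)$")


def describe_branch(name: str) -> dict:
    """Extract the branch kind and its ticket or version from a branch name."""
    if name in ("master", "develop"):
        return {"kind": name}
    match = TICKET_BRANCH.match(name)
    if match:
        return {"kind": match.group(1), "ticket": match.group(2)}
    match = VERSION_BRANCH.match(name)
    if match:
        return {"kind": match.group(1), "version": match.group(2)}
    raise ValueError(f"branch name does not follow the convention: {name}")


print(describe_branch("epic/APP-101-wishlist"))
print(describe_branch("hotfix/2.41.1"))
```

Keeping a separate version for hotfix branches, as the team does, means a crash report can be matched to exactly one branch.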

That's how we live. It is comfortable and convenient for us now.

Release train


Let's move on to our release process. But before talking about releases and how we build them, I will describe the roles we have for supporting a release.

We have an on-duty developer who, for one working week, stops working on product tasks and fixes bugs from the current release. If there are no tasks for the current release, he works on the technical debt we have accumulated, taking tasks from the backlog and fixing them.

There is also an on-duty tester. For one working week he likewise stops testing product tasks and checks the current release. If there are no tasks for the current release, he tests what has been fixed as part of the technical debt.



The release starts on Friday. On this day we have a hard deadline. At 18:00 the on-duty tester presses the "Start Release" button. After that, anything merged into develop will not get into the current release, because pressing the button creates the release branch from the current develop, and nothing more is merged into it from there.

Another important process takes place on Friday, one more item concerning the current release, which I will discuss later.



We rest on weekends, so the second day of the release is Monday. The on-duty developer starts the day with an analysis of what has changed in the current release in terms of code. He takes the git diff between the current release branch and master and uses it to see which components were affected. For example, the checkout process or the cart may be affected while the work with parcel lockers is not.

From this he compiles a list of cases to be checked during testing. This speeds up our regression, since we do not have to check the entire application. When the list is ready, the testing team checks the application and the on-duty developer fixes the bugs. If there are many bugs, he can delegate some of them to other developers who recently worked with those parts of the code.
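The Monday analysis can be partly automated. The sketch below assumes the changed files are obtained with `git diff --name-only` between master and the release branch; the path prefixes, component names and test cases are invented for the example, and in the talk this analysis is done by the developer by hand.

```python
import subprocess

# Hypothetical mapping from source path prefixes to product components.
COMPONENT_BY_PREFIX = {
    "checkout/": "checkout",
    "cart/": "cart",
    "pickup/": "pickup points",
}

# Hypothetical regression cases for each component.
REGRESSION_CASES = {
    "checkout": ["place an order with courier delivery"],
    "cart": ["add an item to the cart", "remove an item from the cart"],
    "pickup points": ["choose a pickup point"],
}


def changed_files(base: str, release: str) -> list:
    """List files that differ between two branches, using `git diff --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, release],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def affected_components(paths) -> list:
    """Map changed paths to the components they belong to."""
    found = set()
    for path in paths:
        for prefix, component in COMPONENT_BY_PREFIX.items():
            if path.startswith(prefix):
                found.add(component)
    return sorted(found)


def regression_plan(paths) -> list:
    """Collect the regression cases for every affected component."""
    return [case for component in affected_components(paths)
            for case in REGRESSION_CASES.get(component, [])]


paths = ["cart/CartPresenter.kt", "checkout/OrderForm.kt", "README.md"]
print(affected_components(paths))
print(regression_plan(paths))
```

In a real repository the list of paths would come from something like `changed_files("master", "release/2.41")`, where the branch name is again only an example.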



On Tuesday we continue testing the release and fixing release bugs. By the afternoon regression testing is finished and we are ready to ship. We launch our release train, or rather several release trains, because we recently entered an app store that is new to us. I recommend that you also try publishing not only on Google Play but in other stores as well. The benefit is not only that you gain a new loyal audience.

Once, we published a release and a few hours later found that the number of bugs among users had grown significantly. We analyzed these bugs and saw that they occurred only on Huawei devices. We did not understand right away what was going on, but we had Huawei devices, so we tested on them, found the bug, fixed it and went to publish an update.

When we came to Google Play with the update, we saw a large banner saying that, due to the current situation in the world, Google Play was under very heavy load and could not review applications as quickly as usual. It turned out that our application had not been reviewed yet: we had not reached Google Play users and were published only in Huawei AppGallery. That was why the bugs appeared only on Huawei. As a result, we detected and fixed a critical bug before the release was even published on Google Play.

Next I'll explain how publication works on Google Play, because a very large share of our users is there. We entered Huawei AppGallery only recently and are still figuring out how everything works there.

We do not publish to all Google Play users at once, so that a random bug does not affect our entire audience. First we publish to all beta testers, who have agreed that they may run into bugs in exchange for being the first to receive our changes and releases. In addition, we publish to only five percent of the audience.



On Wednesday the on-duty developer watches the crash-free rate of the new release. It is important for us that there are no new crashes and not too many old ones. If everything is fine, he also checks the product metrics, for example that the number of orders has not dropped compared to the same period earlier. If the product metrics and the crash-free rate are good, we roll out to another 5%, for a total of 10%.



On Thursday the on-duty developer checks the reviews in the store. In fact, he looks at them on Wednesday too, but on Wednesday the audience is still small and there are only one or two reviews. By Thursday there are enough reviews to judge the release, perhaps 10-15.

Why look at reviews at all if we have plenty of metrics and graphs? The application may not crash and the metrics may be in order, yet a user may have broken fonts or a filter that does not work. We try to make the application as convenient as possible, so we analyze such reviews and fix the bugs and problems users run into.

If the reviews are fine, the crash-free rate is normal and the product metrics have not dropped, we roll out to 20%.



And so Friday comes again, the launch day of our release. The third point I wanted to mention is that on Friday we complete the current release. We roll it out from 20% straight to 100%. That looks like a very big and very risky leap, but it depends on the team and your audience.

20% of our audience lets us judge the stability of the release with high probability. If everything is fine at 20% and we see no problems on Friday, we go straight to one hundred.

I know teams that use a life hack on Google Play, and maybe it will help you too. You can roll out not to 100% but to 99.9%. This keeps the button for urgently halting the release available on Google Play; if you roll out to 100%, the button disappears. But, as I said, 20% of our audience is enough for us to judge the stability of the release accurately. So we calmly roll out to 100%, which saves us an extra step: otherwise we would later have to roll out the remaining fraction of a percent.
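The weekly schedule of 5%, 10%, 20% and 100% with a check before each step can be written down as a small function. This is a sketch under assumptions: the name `next_rollout` and the boolean flags are hypothetical, and keeping the rollout at the current percentage when a check fails is a simplification, since the talk does not say what exactly happens in that case.

```python
# Rollout percentages from the talk: Tuesday, Wednesday, Thursday, Friday.
ROLLOUT_STEPS = [5, 10, 20, 100]


def next_rollout(current_percent, crash_free_ok, metrics_ok, reviews_ok=True):
    """Return the next rollout percentage, or the current one if any check fails."""
    if not (crash_free_ok and metrics_ok and reviews_ok):
        # Assumption: a failed check pauses the rollout at the current percentage.
        return current_percent
    for step in ROLLOUT_STEPS:
        if step > current_percent:
            return step
    # Already at 100%, nothing left to roll out.
    return current_percent


percent = 5
for day in ("Wednesday", "Thursday", "Friday"):
    percent = next_rollout(percent, crash_free_ok=True, metrics_ok=True, reviews_ok=True)
    print(day, percent)
```

A team that prefers the 99.9% trick would only need to change the last value in `ROLLOUT_STEPS`.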

That is our process: this is how we ship every week and try not to fall off the schedule.


What other tools do we have to keep the user's experience good? These are Force Update, Soft Update and Feature Toggle.



Force Update is a mechanism that blocks the use of the application if its version is very outdated. The version that counts as obsolete is set in the admin panel on the server. As soon as that number is changed, some installations will show a screen that cannot be dismissed. It has only an Update button, so the user is forced to update.

We try to use this mechanism as little as possible, but sometimes it is essential. For example, when we break backward compatibility by rolling out a new feature that the old code does not support. A user of an outdated version may then end up in an inconsistent state: he goes to the cart and, for example, cannot place an order, and he does not understand why, whereas in the new version all the errors are handled and it is clear why the order cannot be placed.
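The core of a Force Update check is a version comparison against the minimum supported version received from the server. A minimal sketch follows, assuming dot-separated numeric versions; the function names and the version values are made up for the example, and the real application would do this check in its Android and iOS code.

```python
def parse_version(version: str) -> tuple:
    """Turn a version like "2.10.1" into a tuple of integers: (2, 10, 1)."""
    return tuple(int(part) for part in version.split("."))


def must_force_update(installed: str, min_supported: str) -> bool:
    """Return True if the installed version is older than the minimum set on the server."""
    a, b = parse_version(installed), parse_version(min_supported)
    # Pad with zeros so that "2.4" and "2.4.0" are treated as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


# Comparing numbers, not strings: 2.9 is older than 2.10.
print(must_force_update("2.9", "2.10"))
print(must_force_update("2.10", "2.9"))
```

Comparing the parts as integers matters here, because a plain string comparison would wrongly treat "2.10" as older than "2.9".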



Soft Update complements Force Update. This is a native mechanism from Google that is embedded in your application and does not block its use. It simply tells the user: there is an update on Google Play, install it and you will get new features.

By default it is shown as a dialog with the native Android design. It can also be embedded into your own UI. For example, we implemented it as an "Update the application" widget in our color scheme.

Soft Update allowed us to greatly shorten the tail of old versions, and it is easy to implement by following the documentation. Try it if you release often.



Another important tool is Feature Toggle. It lets you adjust part of the functionality on the user's side by changing it in the admin panel. There is a set of features that we can turn on and off from our server without an application update.

Let's see how Feature Toggle works using a made-up application, a vehicle. Initially the developers have a bicycle that already contains two Feature Toggles: a motor and a large size. Customers use the bicycle, and then the testing team says: we tested the motor, it works, it drives, great, let's turn it on for users.



We go to the admin panel and turn on the Feature Toggle, and without any update on the user's side the bicycle turns into a moped on the go. The user is comfortable: he can now travel faster and more conveniently.

The product develops, the audience grows and no longer fits on one moped. Users want to bring their families along and ride together. The developers and managers had provided an additional Feature Toggle for this, the large size. We enable it in the admin panel, and the user's vehicle becomes larger on the go.



It seems great: Feature Toggle helps us. That is true, but there are problems you may run into. For example, you need to watch backward compatibility and the compatibility of Feature Toggles with each other. Suppose at some point the motor breaks in the application and causes a crash or a bug. Or the motor consumes a lot of our resources and we cannot support that many users with a motor. Then we have to turn it off.

But we do not want the application to become unusable for the user. We want him to be able to keep using it even though the large-size toggle is still on. So when we turn the motor off, there must be a fallback mechanism for driving the vehicle. In this case it is a hybrid like this.



Perhaps the roof should have been made to fold back, and perhaps the driver will not be comfortable in such a seat. But he will still be able to drive the vehicle and use the application instead of walking.
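The vehicle example can be sketched in code to show why every combination of toggles needs a working fallback. The toggle names `motor` and `large_size` come from the talk; the function names, the vehicle descriptions and the rule of falling back to safe defaults when the server sends nothing usable are assumptions made for illustration.

```python
# Safe defaults used when the server has not sent a value for a toggle.
DEFAULT_TOGGLES = {"motor": False, "large_size": False}


def resolve_toggles(remote) -> dict:
    """Merge server values over the defaults, ignoring unknown keys and non-boolean values."""
    toggles = dict(DEFAULT_TOGGLES)
    for name, value in (remote or {}).items():
        if name in toggles and isinstance(value, bool):
            toggles[name] = value
    return toggles


def build_vehicle(remote) -> str:
    """Every combination of toggles must produce something the user can ride."""
    toggles = resolve_toggles(remote)
    if toggles["motor"] and toggles["large_size"]:
        return "large motor vehicle"
    if toggles["motor"]:
        return "moped"
    if toggles["large_size"]:
        # Fallback from the talk: the motor is switched off but the large size stays on.
        return "large pedal-powered hybrid"
    return "bicycle"


print(build_vehicle({"motor": True}))
print(build_vehicle({"motor": False, "large_size": True}))
print(build_vehicle(None))
```

With two toggles there are four combinations to support and test, and the number doubles with each new toggle, which is the compatibility cost the talk warns about.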

How else do we use Feature Toggle? Suppose the backend is still being developed and is not ready for release. We can then develop our part of the functionality, implement the API contract for all communication with the backend, build the UI and ship with the Feature Toggle turned off. When the backend is ready, we test that everything works well and, if so, enable the Feature Toggle. Users will not need to update to get the new feature. In other words, the audience is already there, and the feature reaches it immediately. That is great too.



So, as I said, we now have 13 people in each of the Android and iOS development teams. We work with one Git Flow in one repository, have set up a scheduled release process, reduced time to market and ship every week. We recently launched in Huawei AppGallery and are looking at other stores. We learned to change the application on the user's side without updates thanks to Feature Toggle. Thank you for your attention.
