Scaling Android Testing at Odnoklassniki



Hello! My name is Roman Ivanitsky, and I work in the test automation team at Odnoklassniki. OK is a huge service with over 70 million users, and on mobile most of them use OK.RU on Android smartphones. For this reason, we take testing of our Android application very seriously. In this article I will tell the story of how automated testing developed in our company.

In 2012, Odnoklassniki was experiencing rapid growth both in the number of users and in the number of features. To meet the business goals we needed to shorten the release cycle, but this was held back by the fact that everything was tested manually. The solution suggested itself: we needed automated tests. So in 2012 a test automation team appeared at Odnoklassniki, and its first step was to start writing tests.

A bit of history


The first autotests at Odnoklassniki were written with Selenium. To run them, we set up Jenkins and a Selenium Grid consisting of a Selenium Hub and a set of Selenium Nodes.

Quick solution, quick start, quick profit - perfect.

Over time the number of tests grew, and auxiliary services appeared around them: a launch service, a reporting service, a test data service. By the end of 2014 we had a thousand tests that ran in about fifteen to twenty minutes. This did not suit us, because it was clear that the number of tests would keep growing, and the run time with it.

At that time, the automated testing infrastructure looked like this:



However, once the number of Selenium Nodes reached about 200, the Hub could no longer cope with the load. Today this problem is well understood, which is why tools such as Zalenium and everyone's favorite Selenoid appeared. But in 2014 there was no off-the-shelf solution, so we decided to build our own.

We defined the minimum requirements the service had to meet:

  1. Scalability. We did not want to depend on the limitations of the Selenium Hub.
  2. Stability. In 2014, the Selenium Hub was not famous for stable operation.
  3. Fault tolerance. We needed to be able to keep testing even if a data center or any individual server failed.

This is how our solution for scaling Selenium Grid appeared. It consists of a coordinator and Node managers, and outwardly it looks very similar to the standard Selenium Grid, but it has its own features, which I will discuss below.

Coordinator




The coordinator is essentially a resource broker (by resources we mean browsers). It exposes an external API through which tests send requests for resources; these requests are saved to a database as tasks to run. The coordinator knows everything about the configuration of our cluster: which Node managers exist, what types of resources each of them can provide, the total number of resources, and how many of them are currently occupied by tasks. It also monitors the resources themselves - their activity and stability - and notifies the people responsible if something goes wrong.

A distinctive feature of the coordinator is that it groups all Node managers into so-called farms.

This is what a farm looks like: more than half of the resources are in use, and all nodes are online:



Nodes can also be taken offline or brought into rotation at a certain percentage; this is needed when the load on a particular node has to be reduced.

Farms can be combined into a logical unit that we call a service, and one farm can belong to several services at once. Firstly, this makes it possible to set limits and prioritize the resources used by each service. Secondly, it makes the configuration easy to manage: we can add Node managers to a service on the fly, or remove them from a farm when we need to work on them directly, for example to reconfigure or update them.



The coordinator's API is pretty simple: you can ask how many resources a service is currently using, get its limit, and start or stop a resource.
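To make this concrete, here is a rough sketch of such a client in Kotlin; the endpoint paths and the "chrome" resource type are illustrative, not our real API:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Minimal coordinator client sketch; paths and response formats are illustrative.
class CoordinatorClient(private val baseUrl: String) {
    private val http = HttpClient.newHttpClient()

    private fun get(path: String): String {
        val request = HttpRequest.newBuilder(URI.create("$baseUrl$path")).GET().build()
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body()
    }

    // How many resources of the given type the service is using right now.
    fun usedResources(service: String, type: String): String =
        get("/services/$service/resources/$type/used")

    // The limit configured for this service.
    fun resourceLimit(service: String, type: String): String =
        get("/services/$service/resources/$type/limit")

    // Ask the coordinator to start a resource; the request is stored as a task
    // that a Node manager will later pick up.
    fun startResource(service: String, type: String): String {
        val request = HttpRequest
            .newBuilder(URI.create("$baseUrl/services/$service/resources/$type"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build()
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body()
    }
}
```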

Node Manager


The Node manager is a service that does two things well: it receives tasks from the coordinator and launches resources on demand. It is designed so that every resource launch is isolated, that is, no previous launch can affect subsequent tests. As a response it sends the coordinator a host together with the set of ports it has opened - for example, the host on which a Selenium server was started and that server's port.



On the host it looks like this: the Node manager service is running and manages the entire resource life cycle. It starts browsers, shuts them down, and makes sure none of them are left running. To guarantee isolation, all of this happens under a dedicated service user.
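Roughly, a resource launch on the Node manager side could look like the sketch below; the jar path, the success check and the LaunchResult shape are assumptions for illustration:

```kotlin
import java.net.InetAddress

// What the Node manager reports back to the coordinator: the host plus the opened ports.
data class LaunchResult(val host: String, val ports: List<Int>, val success: Boolean)

// Sketch of starting an isolated Selenium server; the real service also runs the
// process under a dedicated service user so launches cannot affect each other.
fun launchSeleniumServer(port: Int): LaunchResult {
    val process = ProcessBuilder(
        "java", "-jar", "/opt/selenium/selenium-server-standalone.jar", "-port", port.toString()
    )
        .redirectErrorStream(true)
        .start()
    // Very rough success check: the process is still alive right after starting.
    return LaunchResult(InetAddress.getLocalHost().hostName, listOf(port), process.isAlive)
}
```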

Interaction


A test interacts with this infrastructure as follows: it sends the coordinator a request for the resources it needs, and the coordinator saves this request as a task to be executed.

The Node manager, in turn, asks the coordinator for tasks. Having received one, it starts the resource and reports the result of the launch back to the coordinator; failed launches are reported as well. The test receives the result of its resource request and, if it succeeded, starts working with the resource directly.
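On the test side this boils down to something like the following sketch, which reuses the coordinator client from above and assumes the coordinator answers with a "host:port" string once the browser is up:

```kotlin
import org.openqa.selenium.chrome.ChromeOptions
import org.openqa.selenium.remote.RemoteWebDriver
import java.net.URL

// Sketch of the test side of the protocol; the service name, resource type and
// "host:port" response format are assumptions.
fun openBrowser(coordinator: CoordinatorClient): RemoteWebDriver {
    val address = coordinator.startResource(service = "web-tests", type = "chrome")
    // From here on the test talks to the Selenium server directly,
    // without going through the coordinator.
    return RemoteWebDriver(URL("http://$address/wd/hub"), ChromeOptions())
}
```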



The advantage of this approach is that the load on the coordinator is reduced, since tests work with resources directly. The downside is that the logic of interacting with the coordinator has to be implemented inside the test frameworks, but for us this is acceptable.
Today we can run more than 800 browsers in parallel across three data centers, and this is not the limit for the coordinator.

Fault tolerance is ensured by running several coordinator instances behind DNS in different data centers. This guarantees that a working instance is reachable even if a data center or a server fails.

As a result, we got a solution that met all the requirements we had set. It has been operating reliably since 2015 and has proven its effectiveness.

Android


When it comes to testing on Android, there are usually two main approaches. The first is to use WebDriver; this is how Selendroid and Appium work. The second is to work with native tools; this is how Robotium, UI Automator and Espresso are implemented.

What these approaches have in common with our Selenium setup is that you need to obtain a device, just as before you needed to obtain a browser.

There are many more differences. The main ones are the need to install the APK under test, from which we collect artifacts such as logs and screenshots, and the fact that the tests run on the device itself rather than on the CI server.

In 2015, Odnoklassniki began covering its Android application with autotests. We took one Linux machine, connected one real device over USB, and started writing tests with Robotium. This simple setup let us get results quickly.

Time passed, and the number of tests and devices grew. To manage them, we created Device Manager - a wrapper over adb (Android Debug Bridge) commands that exposes them through an HTTP API.

This is how the first Device Manager API looked: it let you get a list of devices, install and uninstall APKs, run tests, and fetch results.
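In spirit it was close to the interface sketched below; the method names and paths are illustrative rather than the real contract:

```kotlin
// Illustrative shape of the first Device Manager API (not the real contract).
interface DeviceManagerApi {
    /** GET /devices - list the devices visible to the wrapped adb server. */
    fun listDevices(): List<String>

    /** POST /devices/{serial}/install - roughly `adb -s <serial> install app.apk`. */
    fun installApk(serial: String, apkPath: String)

    /** DELETE /devices/{serial}/packages/{pkg} - roughly `adb -s <serial> uninstall <pkg>`. */
    fun uninstallPackage(serial: String, packageName: String)

    /** POST /devices/{serial}/tests - roughly `adb shell am instrument ...`. */
    fun runTests(serial: String, testPackage: String, runner: String): String

    /** GET /runs/{id}/results - fetch the results of a previous run. */
    fun results(runId: String): String
}
```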



However, we noticed that test results degraded when they ran against an ADB server with more than one device connected to it. The solution that improved stability was to isolate each ADB server in its own Docker container.
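The idea is simply one adb server per container; a sketch of how such a container might be started is shown below (the "ok/adb" image name is hypothetical, and the real setup details differ):

```kotlin
// Start one adb server in its own Docker container; each container sees the USB bus
// and maps adb's port 5037 to a distinct host port.
fun startIsolatedAdbServer(index: Int): Process =
    ProcessBuilder(
        "docker", "run", "-d",
        "--name", "adb-$index",
        "--privileged",                    // required to access USB devices from the container
        "-v", "/dev/bus/usb:/dev/bus/usb", // pass the USB bus through
        "-p", "${5037 + index}:5037",      // each adb server gets its own host port
        "ok/adb"                           // hypothetical image that runs an adb server in the foreground
    ).start()
```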

The farm is ready - you can connect phones.



Many people are familiar with this picture. I have heard it said that if you run an Android device farm, every day feels like hell.



The Android emulator came to our aid. We chose it for two reasons: first, by that time it had reached the necessary level of stability, and second, none of our tests depended on features of physical hardware. On top of that, the emulator mapped well onto the infrastructure we already had. The next step was to teach the Node manager to launch this new type of resource.

What is required to run the Android emulator?

First, you need an Android SDK with a set of utilities.

Then you need to create an AVD (Android Virtual Device). This defines how your emulator will be set up: which architecture it will have, how many cores it will use, whether Google services will be available, and so on.



After that, you pass the name of the created AVD, set the launch parameters - for example, the port on which ADB will listen - and start the emulator.

However, this scheme has a peculiarity: the system only allows one emulator instance to run per AVD.

Our solution was to keep a base AVD and copy it: during launch, the base AVD was copied into a temporary directory mapped into memory, and the emulator started from that copy. This scheme was fast but cumbersome. Today the problem is solved by the read-only option, which allows any number of emulators to be started from a single AVD.
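In terms of SDK commands, the whole thing boils down to roughly the following sketch (the system image, the AVD name and flags such as -no-window and -gpu are just examples of what can be tuned):

```kotlin
// Create a base AVD once; assumes the Android SDK command-line tools and the
// chosen x86_64 system image are already installed.
fun createBaseAvd() {
    ProcessBuilder(
        "avdmanager", "create", "avd",
        "--name", "base-avd",
        "--package", "system-images;android-30;google_apis;x86_64"
    ).inheritIO().start().waitFor()
}

// Start an emulator instance from that AVD. The -read-only flag is what allows many
// instances to share one AVD; -port fixes the console/adb port pair that the
// Node manager reports back to the coordinator.
fun startEmulator(port: Int): Process =
    ProcessBuilder(
        "emulator", "-avd", "base-avd",
        "-read-only",
        "-no-window",                    // headless CI hosts have no display
        "-gpu", "swiftshader_indirect",  // software rendering works without a GPU
        "-port", port.toString()         // an even port in the 5554-5682 range
    ).start()
```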

Performance


Based on our experience with AVDs, we developed several internal recommendations:

  1. Use x86 system images rather than ARM ones, and enable hardware acceleration: /dev/kvm on Linux, HAXM on Mac and Windows.
  2. Pick a GPU rendering mode that suits your hosts and your version of the Android emulator.
  3. Remember that the emulator's ports are only available on localhost, so forward them if you need to reach the emulator from another machine.

As for ready-made Docker images for Android testing, I want to highlight those from Agoda and Selenoid; they make the most of the Android emulator's capabilities.

The difference between them is that Selenoid's images ship with Appium by default, while Agoda's use a "clean" emulator. Selenoid also has more community support.

At the end of 2018 we created CloudNode-Manager, a service that contacts the coordinator, receives tasks, and launches resources with commands in the cloud. Instead of physical machines, it uses the resources of one-cloud, Odnoklassniki's own private cloud.

We achieved scaling by teaching the Device Manager to work with the coordinator. To do this, we had to change the Device Manager API, adding the ability to request a device type (virtual or real).

This is what happens if you try to run ADB Install on 250 emulators from a single machine.



The on-duty engineers reacted immediately and opened an incident: the machine had saturated its gigabit network interface with outgoing traffic. We solved this by increasing the server's network throughput. I cannot say this problem caused us a lot of trouble, but it is worth keeping in mind.
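A back-of-the-envelope estimate shows why: pushing one build to every emulator means sending the full APK once per device (the 100 MB APK size below is an assumed figure, not our real one):

```kotlin
// Rough arithmetic only; the APK size is an assumption.
fun main() {
    val apkMegabytes = 100                                      // assumed APK size
    val emulators = 250
    val totalGigabits = apkMegabytes * emulators * 8 / 1000.0   // ~200 Gbit of outgoing traffic
    val secondsSaturated = totalGigabits / 1.0                  // ~200 s with a 1 Gbit/s link fully busy
    println("~$totalGigabits Gbit to transfer, ~$secondsSaturated s on a saturated gigabit link")
}
```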

It would seem we had it all: the Device Manager, the coordinator, scaling. We could run tests across the whole farm - in principle, on every pull request - and developers would quickly get feedback.



But not everything is so rosy. You may have noticed that so far nothing has been said about the quality of the tests.



This is what our runs looked like. The most interesting thing is that completely different tests could fail from one run to the next: the failures were unstable. Neither I, nor the developers, nor the testers trusted these results.

How did we deal with this? We simply rewrote everything from Robotium to Espresso, and everything became fine... Actually, no.

To really solve the problem, we not only rewrote everything in Espresso, but also started using the API for setup actions such as uploading photos, creating posts and adding friends, implemented a quick login, used deep links to jump straight to the desired screen, and, of course, reviewed all the test cases.
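A typical test now follows the pattern sketched below; the deep-link scheme, view ids and the TestApi helper are hypothetical, but the structure (data via API, deep link to the screen, Espresso for the UI) is exactly what we rely on:

```kotlin
import android.app.Activity
import android.content.Intent
import android.net.Uri
import androidx.test.core.app.ActivityScenario
import androidx.test.espresso.Espresso.onView
import androidx.test.espresso.action.ViewActions.click
import androidx.test.espresso.assertion.ViewAssertions.matches
import androidx.test.espresso.matcher.ViewMatchers.isDisplayed
import androidx.test.espresso.matcher.ViewMatchers.withId
import org.junit.Test

// Hypothetical helper that prepares test data through the server API instead of the UI.
object TestApi {
    fun createPost(userId: String, text: String) { /* call the public API here */ }
}

class FeedTest {
    @Test
    fun likeButtonIsShownForFreshPost() {
        TestApi.createPost(userId = "test-user", text = "hello")           // test data via API
        val intent = Intent(Intent.ACTION_VIEW, Uri.parse("okapp://feed")) // deep link straight to the screen
        ActivityScenario.launch<Activity>(intent).use {
            onView(withId(R.id.like_button)).check(matches(isDisplayed()))
            onView(withId(R.id.like_button)).perform(click())
        }
    }
}
```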

Now the test runs look like this:



You may notice that some red tests remain, but keep in mind that these are end-to-end tests running against production. We have a limit on how many tests are allowed to fail in the application's main branch.

Now we had stable tests and scaling. However, the test infrastructure was still tightly coupled to the tests. On top of that, CI agents sat busy waiting for end-to-end tests to finish, so other builds could pile up in the queue waiting for free agents, and there was no clear scheme for handling parallel runs.

These problems prompted us to develop QueueRunner, a service that runs tests asynchronously without blocking CI. It needs the APK under test and the test APK, as well as a list of tests. Given this input, it queues the runs, allocating and freeing the required resources. QueueRunner publishes the results of a run to Jira and Stash, and also sends them by e-mail and messenger.

QueueRunner has a test flow that drives the life cycle of a test. The default flow we use now consists of five steps:

  1. Receive a device. The Device Manager requests a real or virtual device through the coordinator.
  2. Install the APKs. The application APK and the test APK are installed on the device.
  3. Run the tests.
  4. Collect the artifacts: results, logs and screenshots are pulled from the device.
  5. Release the device so it can be used by the next run.

These five simple steps make up the entire test life cycle in our service.
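Condensed into code, the flow looks roughly like this; the Device and DeviceManager types here are placeholders standing in for the real services:

```kotlin
// Placeholder types standing in for the real services (not the real API).
enum class DeviceType { REAL, VIRTUAL }

interface Device {
    fun install(apkPath: String)
    fun runInstrumentation(tests: List<String>): String
    fun pullArtifacts(runId: String, destination: String)
}

interface DeviceManager {
    fun acquire(type: DeviceType): Device
    fun release(device: Device)
}

// The default QueueRunner flow, condensed into one function.
fun runTestFlow(dm: DeviceManager, appApk: String, testApk: String, tests: List<String>) {
    val device = dm.acquire(DeviceType.VIRTUAL)        // 1. get a real or virtual device via the coordinator
    try {
        device.install(appApk)                         // 2. install the application APK...
        device.install(testApk)                        //    ...and the test APK
        val runId = device.runInstrumentation(tests)   // 3. run the requested tests
        device.pullArtifacts(runId, "reports/")        // 4. collect results, logs and screenshots
    } finally {
        dm.release(device)                             // 5. free the device for the next run
    }
}
```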



What did QueueRunner give us? Firstly, it makes the most of all available resources: it can scale out to the entire farm and produce results quickly. Secondly, as a bonus, we got control over the order in which tests run. For example, we can start the longest or most problematic tests first and thus reduce the overall waiting time.

QueueRunner also enables smart retries. We store all run data in the database, so at any moment we can look at a test's history - for example, at its ratio of successful to failed runs - and decide whether it is worth restarting it at all.
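The decision itself is simple; here is a sketch (the 30% threshold is an assumption, chosen only for illustration):

```kotlin
// Per-test history pulled from the database.
data class TestHistory(val passed: Int, val failed: Int)

fun shouldRetry(history: TestHistory): Boolean {
    val total = history.passed + history.failed
    if (total == 0) return true                        // no history yet: give the test a second chance
    val failureRate = history.failed.toDouble() / total
    // A test that fails most of the time is probably genuinely broken;
    // retrying it only burns devices and delays the report.
    return failureRate < 0.3
}
```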

QueueRunner and the Device Manager let us adapt to the amount of resources available. Thanks to the emulators we can scale to the whole farm: a practically unlimited number of virtual devices lets us run far more tests. If for some reason resources become unavailable, the service simply waits for them to come back and no runs are lost; we use whatever resources we have, the results arrive a little later, and CI is never blocked. Most importantly, the test infrastructure and the tests themselves are now separate.
Now, to run tests on Android, all you need to give us is the APKs and a list of tests.

We have come a long way from a Selenium farm on virtual machines to running Android tests in the cloud. And this journey is not over yet.

Development process


Let's see how the test infrastructure fits into the development process and how testers and developers see it.

Our Android team uses the standard GitFlow:



Each feature has its own branch, while the main development happens in the develop branch. A developer who sets out to build a new super-feature works in their own branch, and other developers work in theirs in parallel. When the developer decides that this perfectly beautiful, best-in-the-world code is ready and must be rolled out to users as quickly as possible, they open a pull request into develop. The build runs automatically, along with unit and component tests; at the same time the APKs are assembled, sent to QueueRunner, and the end-to-end tests are run. The developer then receives the test results.

However, there is a good chance that many commits have landed in develop since the feature branch was created, so develop may no longer be what it was. That is why a pre-merge happens first: we merge develop into the current feature branch, and it is on this pre-merged state that we build, run unit, component and end-to-end tests, and produce the report. This way we know how well the feature works on the current version of develop, and if everything is OK, it goes out to users.



Reporting


This is what reporting in Stash looks like:



Our bot first writes that the tests have started, and once they finish it updates the message with how many passed, how many failed, how many are known issues, and how many are flaky. It writes the same thing in Jira and adds a link to a comparison of runs.

This is what the comparison of two runs looks like:



Here the current run on the feature branch is compared with the latest run on develop. It shows the number of tests run, failures matching known problems, failed tests, and flaky tests that switched from one state to the other.

If even one unit test fails, or more end-to-end tests fail than a set threshold, the merge is blocked.

To understand whether a test fails consistently, we compare hashes of the failures' stack traces. Before hashing, the traces are stripped of digits, keeping only the line numbers. If the hashes match, it is the same failure; if they differ, the failures are most likely different.
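A sketch of this normalization; the regular expressions here only approximate our real cleanup:

```kotlin
import java.security.MessageDigest

// Strip volatile digits (object hashes, ids, timestamps) from a stack trace while
// keeping the "(File.kt:123)" line-number part, then hash the result.
fun stackTraceHash(stackTrace: String): String {
    val location = Regex("""\(([^)]+:\d+)\)""")
    val normalized = stackTrace.lines().joinToString("\n") { line ->
        val lineNumber = location.find(line)?.value ?: ""
        line.replace(location, "#LOC#")     // protect the line-number part
            .replace(Regex("""\d+"""), "")  // drop every other digit
            .replace("#LOC#", lineNumber)   // put the line number back
    }
    val digest = MessageDigest.getInstance("SHA-256").digest(normalized.toByteArray())
    return digest.joinToString("") { "%02x".format(it) }
}

// Two failures count as the same fall when their hashes match.
fun sameFailure(traceA: String, traceB: String): Boolean =
    stackTraceHash(traceA) == stackTraceHash(traceB)
```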

Summary


In the end, we built a stable, fault-tolerant solution that scales well on our infrastructure, and then adapted it for Android testing. The Device Manager lets us work with both real and virtual devices, and QueueRunner separates the infrastructure from the tests and keeps CI from being blocked while they run.

This is what test run times looked like during one week in 2016: fifty minutes and more.



That's how it looks now:



This chart shows the runs from two hours of a typical business day. The run time has dropped to at most 15 minutes, and the number of runs has grown noticeably.
