DevOps tools are not only for DevOps. Building a test automation infrastructure from scratch

Part 1: Web / Android


Note: this article is a translation of the original article "DevOps tools are not only for DevOps. Building test automation infrastructure from scratch." All illustrations, links, quotes, and terms are kept as in the original to avoid distorting the meaning. Enjoy!



Currently, DevOps is one of the most in-demand specialties in the IT industry. If you open a popular job search site and apply a salary filter, you will see DevOps-related jobs at the top of the list. However, it is important to understand that these are mostly 'Senior' positions, which imply a high level of skill and deep knowledge of technologies and tools. They also come with a high degree of responsibility for the smooth operation of production. Along the way, however, we seem to have forgotten what DevOps actually is. Initially, it was not a specific person or department. If we look up definitions of the term, we will find many fine and correct nouns, such as methodology, practices, cultural philosophy, a group of concepts, and so on.

I am a QA automation engineer by specialization, but I believe the role should not be limited to writing auto-tests or designing the architecture of a test framework. In 2020, knowledge of automation infrastructure is needed as well. It allows you to organize the automation process yourself, from running the tests to delivering the results to all interested parties in line with the goals. As a result, DevOps skills are a must for this job. That is all well and good, but unfortunately there is a problem (spoiler: this article attempts to simplify it): DevOps is hard. And that is obvious, because companies will not pay much for what is easy to do. The DevOps world contains a large number of tools, terms, and practices that need to be mastered. This is especially difficult at the beginning of a career, and it depends on the technical experience you have accumulated.


Source: http://maximelanciauxbi.blogspot.com/2017/04/devops-tools.html

This probably completes the introductory part, so let's focus on the purpose of this article.

What is this article about


In this article I am going to share my experience of building a test automation infrastructure. On the Internet you can find many sources of information about various tools and how to use them, but I would like to consider them exclusively in the context of automation. I believe many automation engineers are familiar with the situation where nobody except you runs the tests you have developed, and nobody cares about maintaining them. As a result, the tests become obsolete and you have to spend time updating them. Again, at the beginning of a career this can be quite a difficult task: deciding correctly which tools should solve a given problem, and how to choose, configure, and maintain them. Some testers ask DevOps (the people) for help and, to be honest, this approach works. In many cases it may be the only option, since we do not have visibility into all the dependencies. But, as we know, DevOps engineers are very busy people: they have to think about the infrastructure of the entire company, deployment, monitoring, microservices, and other similar tasks, depending on the organization/team. As usually happens, automation is not a priority. In that case, we should try to do everything ourselves, from beginning to end. This will reduce dependencies, speed up the workflow, improve our skills, and let us see a wider picture of what is happening.

The article presents the most popular and in-demand tools and shows how to use them to build an automation infrastructure step by step. Each group is represented by tools I have tested from personal experience. But this does not mean that you must use the same ones. The tools themselves are not important; they appear and become obsolete. Our engineering task is to understand the basic principles: why we need each group of tools and which work tasks we can solve with their help. Therefore, at the end of each section I leave links to similar tools that may be used in your organization.

What is not in this article


Once again, the article is not about specific tools, so there will be no exhaustive code listings from documentation or detailed command references; only short illustrative sketches appear along the way. At the end of each section I leave links for detailed study.

This is due to the fact that: 

  • this material is very easy to find in various sources (documentation, books, video courses);
  • if we began to dig deeper, we would have to write 10, 20, or 30 parts of this article (whereas 2-3 are planned);
  • I just don't want to waste your time, since you may want to use other tools to achieve the same goals.

Practice


I would really like this material to be useful to every reader, not just read and forgotten. In any study, practice is a very important component. For this, I have prepared a GitHub repository with a step-by-step guide on how to do everything from scratch. Homework will also be waiting for you, to make sure you have not just thoughtlessly copied the commands.

Plan


Step | Technology | Tools
1 | Local running (prepare web/android demo tests and run them locally) | Node.js, Selenium, Appium
2 | Version control systems | Git
3 | Containerization | Docker, Selenium grid, Selenoid (Web, Android)
4 | CI/CD | GitLab CI
5 | Cloud platforms | Google Cloud Platform
6 | Orchestration | Kubernetes
7 | Infrastructure as Code (IaC) | Terraform, Ansible

The structure of each section


To keep the narrative visual, each section is described according to the following structure:

  • Technology brief;
  • Value for automation infrastructure;
  • Illustration of the current state of infrastructure;
  • Learning links;
  • Similar tools.

1. Local running (Node.js, Selenium, Appium)


Technology Brief

This is just a preparatory step: run the demo tests locally and verify that they pass successfully. The practical part uses Node.js, but the programming language and platform are not important either; you can use the ones adopted in your company.

However, as automation tools I recommend Selenium WebDriver for the web platform and Appium for the Android platform, since in the next steps we will use Docker images tailored to work specifically with these tools. Moreover, judging by job requirements, these tools are the most in demand on the market.
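As an illustration, here is what a minimal local web test might look like with the official selenium-webdriver bindings for Node.js (the URL and check are hypothetical; the demo tests in the repository may be structured differently):

    // npm install selenium-webdriver (a matching chromedriver must be on PATH)
    const { Builder } = require('selenium-webdriver');

    (async function demoTest() {
      // start a local Chrome session
      const driver = await new Builder().forBrowser('chrome').build();
      try {
        await driver.get('https://example.com'); // hypothetical page under test
        const title = await driver.getTitle();
        if (!title.includes('Example')) {
          throw new Error(`Unexpected title: ${title}`);
        }
        console.log('Demo test passed');
      } finally {
        await driver.quit(); // always release the browser
      }
    })();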

As you may have noticed, we only consider web and Android tests. Unfortunately, iOS is a completely different story (thanks, Apple). I plan to demonstrate iOS-related solutions and practices in the following parts.

Value for automation infrastructure


From an infrastructure point of view, a local run carries no value. You only verify that the tests run on the local machine, in local browsers and simulators. But in any case it is a necessary starting point.

Illustration of the current state of infrastructure




Learning Links



Similar tools


  • any programming language you like, in conjunction with Selenium/Appium, for the tests;
  • any test framework;
  • any test runner.

2. Version control systems (Git)


Technology Brief


It will not be a big discovery to anyone if I say that a version control system is an extremely important part of development, both in a team and individually. Based on various sources, we can confidently say that Git is its most popular representative. A version control system provides many advantages, such as code sharing, version storage, rolling back to previous versions, tracking the project history, and backups. We will not discuss each item in detail, as I am sure you are familiar with them and use them in everyday work. But if suddenly not, I recommend pausing this article and filling that gap as soon as possible.

Value for automation infrastructure


Here you may reasonably ask: "Why is he telling us about Git? Everyone knows it and uses it both for development code and for auto-test code." You would be absolutely right, but in this article we are talking about infrastructure, and this section serves as a preview of section 7: "Infrastructure as Code (IaC)". For us, it means that the entire infrastructure, including the test infrastructure, is described in the form of code, so we can apply versioning to it as well and get the same advantages as for development and automation code.

We'll look at IaC in more detail in step 7, but even now you can start using Git locally by creating a local repository. The big picture will be expanded when we add a remote repository to the infrastructure.
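For instance, a local repository for the demo tests takes only a few commands (the commit message is arbitrary):

    git init
    git add .
    git commit -m "Add demo web/android tests"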

Illustration of the current state of infrastructure




Learning Links



Similar tools



3. Containerization (Docker)


Technology Brief


To demonstrate how containerization changed the rules of the game, let's go back a few decades. In those days, people purchased and used server machines to run applications. But in most cases, the resources required to run them were not known in advance. As a result, companies spent money on expensive, powerful servers, yet part of that capacity was never fully utilized.

The next stage of evolution was virtual machines (VMs), which solved the problem of paying for unused resources. This technology made it possible to run applications independently of each other within the same server, in completely isolated spaces. But, unfortunately, any technology has its drawbacks. Running a VM requires a full-fledged operating system that consumes CPU, RAM, and storage, and, depending on the OS, licensing costs must be considered. These factors affect startup speed and complicate portability.

And so we came to containerization. Once again, this technology solved the previous problem: containers do not use a full-fledged OS, which frees up a large amount of resources and provides a quick and flexible solution for portability.

Of course, containerization technology is not new; it was first introduced in the late 70s. In those days there was a lot of research, groundwork, and experimentation. But it was Docker that adapted this technology and made it easily accessible to the masses. Nowadays, when we talk about containers, in most cases we mean Docker, and when we talk about Docker containers, we mean Linux containers. We can use Windows and macOS systems to run containers, but it is important to understand that in this case an additional layer appears. For example, Docker on a Mac silently launches containers inside a lightweight Linux VM. We will return to this topic when we discuss launching Android emulators inside containers, because this is where a very important nuance appears that needs to be analyzed in more detail.

Value for automation infrastructure


We found out that containerization and Docker are cool. Let's look at them in the context of automation, because every tool or technology should solve a problem. Let us note the obvious problems of test automation in the context of UI tests:

  • a huge number of dependencies when installing Selenium, and especially Appium;
  • compatibility issues between versions of browsers, simulators, and drivers;
  • the lack of isolated spaces for browsers/simulators, which is especially critical for parallel runs;
  • the difficulty of managing and maintaining 10, 50, 100, or even 1000 browsers running at the same time.

But since Selenium is the most popular automation tool, and Docker is the most popular containerization tool, it should not be a surprise to anyone that someone tried to combine them to get a powerful tool to solve the above problems. Let's consider such solutions in more detail. 

Selenium grid in Docker

This tool is the most popular in the Selenium world for launching and managing multiple browsers on multiple machines from a central point. To start, you need to run at least two parts: a Hub and Node(s). The Hub is the central point that receives all requests from the tests and distributes them to the appropriate Nodes. For each Node we can define a specific configuration, for example specifying the desired browser and its version. However, we still need to take care of compatible browser drivers ourselves and install them on the corresponding Nodes. For this reason, Selenium grid is rarely used in its pure form, except when we need to work with browsers that cannot be installed on Linux. For all other cases, using Docker images to run the Hub and Nodes is a much more flexible and correct solution. This approach greatly simplifies node management, since we can pick the image we need, with compatible versions of browsers and drivers already installed.

Despite the negative feedback on stability, especially when running a large number of Nodes in parallel, Selenium grid is still the most popular tool for running Selenium tests in parallel. It is important to note that in open-source various improvements and modifications of this tool constantly appear that struggle with various bottlenecks.
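As a rough sketch, a Hub plus one Chrome Node can be described with docker-compose like this (the image tags are illustrative; pick the versions you actually need):

    version: "3"
    services:
      selenium-hub:
        image: selenium/hub:3.141.59
        ports:
          - "4444:4444"
      chrome-node:
        image: selenium/node-chrome:3.141.59
        depends_on:
          - selenium-hub
        environment:
          # point the Node at the Hub (Selenium 3 image convention)
          - HUB_HOST=selenium-hub
          - HUB_PORT=4444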

Selenoid for web

This tool is a breakthrough in the Selenium world, since it works right out of the box and has made the lives of many automation engineers much easier. First of all, it is not just another modification of Selenium grid. Instead, the developers created a completely new version of the Selenium Hub in Golang, which, combined with lightweight Docker images for various browsers, gave a boost to the development of test automation. Moreover, in the case of Selenium grid we have to define all the required browsers and their versions in advance, which is not a problem when working with only one browser. But when it comes to several supported browsers, Selenoid is the number one solution thanks to its 'browser on demand' feature. All that is required of us is to preload the necessary browser images and update the configuration file that Selenoid interacts with. After Selenoid receives a request from the tests, it automatically launches the right container with the right browser. When the test completes, Selenoid stops the container, thereby freeing up resources for the following requests. This approach completely eliminates the well-known problem of 'node degradation', which we often see in Selenium grid.

But alas, Selenoid is still not a silver bullet. We've got the 'browser on demand' feature, but a 'resources on demand' feature is still not available. To use Selenoid, we have to deploy it on physical hardware or a VM, which means we need to know in advance how many resources to allocate. I believe this is not a problem for small projects that run 10, 20, or even 30 browsers in parallel. But what if we need 100, 500, 1000, or more? It makes no sense to maintain and pay for so many resources all the time. In sections 5 and 6 of this article we will discuss solutions that allow you to scale, thereby significantly reducing company costs.
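Before moving on, here is a small sketch of what running Selenoid looks like in practice (the browser version is illustrative; the official configuration manager cm from Aerokube can automate these steps):

    # config/browsers.json - tells Selenoid which browser images it may launch
    {
      "chrome": {
        "default": "84.0",
        "versions": {
          "84.0": { "image": "selenoid/chrome:84.0", "port": "4444", "path": "/" }
        }
      }
    }

    # start Selenoid; it launches browser containers on demand via the Docker socket
    docker run -d --name selenoid -p 4444:4444 \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v $(pwd)/config:/etc/selenoid:ro \
      aerokube/selenoid:latest-release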

Selenoid for Android

After the success of Selenoid as a tool for web automation, people wanted something similar for Android. And it happened: Selenoid was released with Android support. From a high-level user perspective, the principle of operation is similar to web automation. The only difference is that instead of containers with browsers, Selenoid launches containers with Android emulators. In my opinion, it is currently the most powerful free tool for running Android tests in parallel.

I would really rather not talk about the negative aspects of this tool, since I like it very much. But it still has the same scaling drawbacks as in web automation. In addition, we need to mention another limitation, which may come as a surprise when setting up the tool for the first time. To run Android images, we need a physical machine or a VM with nested virtualization support. In the practical guide, I demonstrate how to activate this on a Linux VM. However, if you are a macOS user and want to deploy Selenoid locally, it will not be possible to run Android tests. But you can always start a Linux VM locally with 'nested virtualization' configured and deploy Selenoid inside it.

Illustration of the current state of infrastructure


In the context of this article, we will add two tools to the infrastructure illustration: Selenium grid for web tests and Selenoid for Android tests. In the GitHub guide I will also show how to use Selenoid for running web tests.



Learning Links



Similar tools


  • Other containerization tools exist, but Docker is the most popular. If you want to try something else, keep in mind that the tools we reviewed for running Selenium tests in parallel will not work out of the box.
  • As already mentioned, there are many modifications of Selenium grid, for example Zalenium.

4. CI/CD (GitLab CI)


Technology Brief


The practice of continuous integration is quite popular in development and stands on a par with version control systems. Despite this, I feel there is confusion in terminology. In this section I would like to describe three modifications of this practice from my point of view. On the Internet you can find many articles with different interpretations, and it is absolutely normal if your opinion differs. The most important thing is that you are on the same wavelength as your colleagues.

So, there are three terms: CI - Continuous Integration, CD - Continuous Delivery, and again CD - Continuous Deployment. Each modification adds a few extra steps to the development pipeline. The word continuous is the most important: in this context it means something that happens from beginning to end, without interruption or manual intervention. Let's look at CI, CD, and CD in this light.

  • Continuous Integration is the initial step of the evolution. After pushing new code to the server, we expect quick feedback that our changes are fine. Typically, CI includes running static code analysis tools and unit/internal API tests. This gives us information about our code within seconds or minutes.
  • Continuous Delivery is the next step, where on top of CI we also run integration/UI tests. Unlike CI, however, these require deploying the build to a test/staging environment first, which makes this stage slower. With Continuous Delivery, the final release to production remains a manual decision.
  • Continuous Deployment goes further: every change that has passed all pipeline stages is automatically released to production, usually followed by smoke tests against production. The key point is that everything happens continuously, without manual intervention; otherwise it is only Continuous Delivery.


In this section I must clarify that when we talk about end-to-end UI tests, it implies deploying our changes and the related services to test environments. Continuous Integration is not applicable to this task, and we must take care to implement at least Continuous Delivery practices. Continuous Deployment also makes sense in the context of UI tests if we are going to run them in production.

And before we look at the illustration of the architecture change, I want to say a few words about GitLab CI. Unlike other CI/CD tools, GitLab provides a remote repository and many other advanced features, so GitLab is more than CI. It includes out of the box source control, Agile management, CI/CD pipelines, logging tools, and metrics collection. The GitLab architecture consists of GitLab CI/CD and GitLab Runner. Here is a brief description from the official site:
GitLab CI/CD is a web application with an API that stores its state in a database, manages projects/builds and provides a user interface. GitLab Runner is an application which processes builds. It can be deployed separately and works with GitLab CI/CD through an API. For tests running you need both GitLab instance and Runner.
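As a sketch, a pipeline that runs the demo tests could be described in .gitlab-ci.yml roughly like this (the job name and npm script are hypothetical and depend on how the project is set up):

    stages:
      - test

    web-tests:
      stage: test
      image: node:12            # any Node.js image your project supports
      script:
        - npm ci                # install dependencies from package-lock.json
        - npm run test:web      # hypothetical script pointing tests at the Selenium/Selenoid endpoint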








5. Cloud platforms (Google Cloud Platform)


Technology Brief

In this section, we will talk about a popular trend called 'public clouds'. Despite the enormous benefits that the virtualization and containerization technologies described above provide, we still need computing resources. Companies purchase expensive servers or rent data centers, but in this case it is necessary to calculate (sometimes unrealistically) how many resources we will need, whether we will use them 24/7, and for what purposes. For example, production requires a server running around the clock, but do we need similar resources for testing outside working hours? It also depends on the type of testing being performed. An example would be load/stress tests, which we plan to run during off-hours to get the results the next day. But 24-hour server availability is definitely not required for end-to-end auto-tests, and especially not for manual testing environments. For such situations, it would be good to receive as many resources as needed on demand, use them, and stop paying when they are no longer needed. Moreover, it would be nice to get them instantly, with a few mouse clicks or a couple of scripts. This is what public clouds are for. Let's look at the definition:
"The public cloud is defined as computing services offered by third-party providers over the public Internet, making them available to anyone who wants to use or purchase them. They may be free or sold on-demand, allowing customers to pay only per usage for the CPU cycles, storage, or bandwidth they consume."

There is an opinion that public clouds are expensive, but their key idea is to reduce company costs. As mentioned earlier, public clouds allow you to get resources on demand and pay only for the time they are used. Also, we sometimes forget that employees get paid, and specialists are an expensive resource too. Keep in mind that public clouds greatly simplify infrastructure support, which allows engineers to focus on more important tasks.

Value for automation infrastructure


What specific resources do we need for end-to-end UI tests? Mainly virtual machines or clusters (we will talk about Kubernetes in the next section) to run browsers and emulators. The more browsers and emulators we want to run at the same time, the more CPU and memory are required, and the more money we have to pay. Thus, public clouds in the context of test automation allow us to launch a large number (100, 200, 1000...) of browsers/emulators on demand, get test results as quickly as possible, and stop paying for such insanely resource-intensive capacity.

The most popular cloud providers are Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). The practical guide provides examples using GCP, but in general it does not matter which one you use for automation tasks; they all provide approximately the same functionality. To select a provider, management usually focuses on the company's entire infrastructure and business requirements, which are beyond the scope of this article. For automation engineers it is more interesting to compare cloud providers with cloud platforms built specifically for testing, such as Sauce Labs, BrowserStack, BitBar, and so on. So let's do that! In my opinion, Sauce Labs is the most famous cloud testing farm, so I took it for comparison.

GCP vs. Sauce Labs for Automation:

Imagine that we need to run 8 web tests and 8 Android tests at the same time. To do this, we will use GCP and run two virtual machines with Selenoid. On the first one we will bring up 8 containers with browsers, on the second 8 containers with emulators. Let's take a look at the prices:


To run one container with Chrome, we need an n1-standard-1 machine; in the case of Android, an n1-standard-4 for one emulator. In fact, a more flexible and cheaper way is to set custom values for CPU/Memory, but at the moment this is not important for the comparison with Sauce Labs.

And here are the rates for using Sauce Labs:


I suppose you have already noticed the difference, but here is a table with the calculations for our task:

Required resources | Monthly | Working hours (8 am - 8 pm) | Working hours + Preemptible
GCP for Web (n1-standard-1 x 8 = n1-standard-8) | $194.18 | 23 days * 12h * $0.38 = $104.88 | 23 days * 12h * $0.08 = $22.08
Sauce Labs for Web (Virtual Cloud, 8 parallel tests) | $1,559 | - | -
GCP for Android (n1-standard-4 x 8 = n1-standard-16) | $776.72 | 23 days * 12h * $1.52 = $419.52 | 23 days * 12h * $0.32 = $88.32
Sauce Labs for Android (Real Device Cloud, 8 parallel tests) | $1,999 | - | -

As you can see, the difference in cost is huge, especially if you run the tests only during a twelve-hour working day. But you can reduce costs even further by using preemptible machines. What are they?
A preemptible VM is an instance that you can create and run at a much lower price than normal instances. However, Compute Engine might terminate (preempt) these instances if it requires access to those resources for other tasks. Preemptible instances are excess Compute Engine capacity, so their availability varies with usage.

If your apps are fault-tolerant and can withstand possible instance preemptions, then preemptible instances can reduce your Compute Engine costs significantly. For example, batch processing jobs can run on preemptible instances. If some of those instances terminate during processing, the job slows but does not completely stop. Preemptible instances complete your batch processing tasks without placing additional workload on your existing instances and without requiring you to pay full price for additional normal instances.

And this is still not the end! In fact, I am sure that nobody runs tests for 12 hours without a break. And if so, you can automatically start and stop the virtual machines when they are not needed. The actual usage time may drop to 6 hours per day. Then the payment in the context of our task decreases to as little as $11 per month for 8 browsers. Isn't that wonderful? But with preemptible machines we must be careful and prepared for interruptions and unstable operation, although such situations can be anticipated and handled programmatically. It's worth it!
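For illustration, creating a preemptible VM and starting/stopping it around a test run takes only a few gcloud commands (the instance name and zone here are hypothetical):

    # create a preemptible VM for browser containers
    gcloud compute instances create selenoid-vm \
      --machine-type=n1-standard-8 --preemptible --zone=europe-west1-b

    # start it before the test run and stop it right after
    gcloud compute instances start selenoid-vm --zone=europe-west1-b
    gcloud compute instances stop selenoid-vm --zone=europe-west1-b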

But by no means am I saying 'never use cloud test farms'. They have several advantages. First of all, they are not just virtual machines, but complete test automation solutions with a set of functionality out of the box: remote access, logs, screenshots, video recording, various browsers, and physical mobile devices. In many situations this can be an indispensable alternative. Test platforms are especially useful for iOS automation, where public clouds can only offer Linux/Windows systems. But we will talk about iOS in future articles. I recommend always looking at the situation and starting from the task: in some cases it is cheaper and more efficient to use public clouds, and in some, test platforms are definitely worth the money.

Illustration of the current state of infrastructure




Learning Links



Similar tools:



6. Orchestration (Kubernetes)


Technology Brief


I have good news: we have almost reached the end of the article! At the moment, our automation infrastructure consists of web and Android tests, which we run in parallel through GitLab CI using Docker-enabled tools: Selenium grid and Selenoid. Moreover, we use virtual machines created in GCP to bring up containers with browsers and emulators. To reduce costs, we start these virtual machines only on demand and stop them when no testing is performed. Is there anything else that can improve our infrastructure? The answer is yes! Meet Kubernetes (K8s)!

To get started, let's consider how the words orchestration, cluster, and Kubernetes are related. At a high level, orchestration is a system that deploys and manages applications. For test automation, such containerized applications are Selenium grid and Selenoid. Docker and K8s complement each other: the first is used to deploy applications, the second for orchestration. In turn, K8s is a cluster. The cluster's job is to use VMs as Nodes, which allows you to install various functionality, programs, and services within one server (cluster). If any Node goes down, the other Nodes pick up the load, which keeps our application running smoothly. In addition, K8s has important functionality related to scaling, thanks to which we automatically get the optimal amount of resources based on the load and configured limits.

In truth, deploying Kubernetes from scratch manually is a completely non-trivial task. I will leave a link to Kubernetes The Hard Way, a well-known practical guide, and if you are interested you can practice it. But fortunately there are alternative ways and tools. The easiest is to use Google Kubernetes Engine (GKE) in GCP, which gives you a ready-made cluster after a few clicks. I recommend this approach to start learning, as it lets you focus on using K8s for your tasks instead of exploring how the internal components are integrated.
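As a sketch, a minimal GKE cluster can be created with a couple of commands (the cluster name, size, and zone are illustrative):

    # create a three-node cluster and fetch credentials for kubectl
    gcloud container clusters create test-automation-cluster \
      --num-nodes=3 --zone=europe-west1-b
    gcloud container clusters get-credentials test-automation-cluster \
      --zone=europe-west1-b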

Value for automation infrastructure


Consider some of the significant features that K8s provides:

  • cluster management: combines multiple Nodes (VMs) into a single pool of resources;
  • service discovery and load balancing: distributes traffic between pods;
  • self-healing: automatically restarts or replaces failed pods (containers);
  • scaling: automatically adjusts the amount of resources based on the load and the configured limits.

But K8s is still not a silver bullet. To understand the advantages and limitations in the context of the tools we are considering (Selenium grid, Selenoid), let's briefly discuss the structure of K8s. A cluster contains two types of Nodes: Master Nodes and Worker Nodes. Master Nodes are responsible for management, deployment, and scheduling decisions. Worker Nodes are where the applications run. Each Node also contains a container runtime. In our case this is Docker, which is responsible for container-related operations, although there are alternatives such as containerd. It is important to understand that scaling and self-healing do not apply to containers directly. They are done by adding or removing pods, which in turn contain the containers (usually one container per pod, but there may be more, depending on the task). The high-level hierarchy is: worker nodes, inside which are pods, inside which containers are brought up.

The scaling function is key and can be applied both to nodes inside a cluster's node-pool and to pods inside a node. There are two types of scaling, applicable to both nodes and pods. The first type is horizontal: scaling happens by increasing the number of nodes/pods; this type is generally preferred. The second type, accordingly, is vertical: scaling happens by increasing the size of nodes/pods, not their number.

Now consider our tools in the context of the above terms.

Selenium grid

As mentioned earlier, Selenium grid is a very popular tool, so it is not surprising that it was containerized and can be deployed in K8s. An example of how to do this can be found in the official K8s repository. As usual, I attach links at the end of the section. In addition, the practical guide shows how to do this in the Terraform chapter. There is also an instruction on how to scale the number of pods that contain the browser containers. But auto-scaling in the context of K8s is still not an obvious task. When I started researching, I did not find any practical guidance or recommendations. After several studies and experiments, with the support of the DevOps team, we chose the approach of bringing up the containers with the required browsers inside one pod, located inside one worker node. This method lets us apply the strategy of horizontal node scaling by increasing their number. I hope the situation will change in the future, and we will see more and more descriptions of best approaches and turnkey solutions, especially after the release of Selenium grid 4 with its redesigned internal architecture.
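As a rough sketch, the Hub part of such a deployment might look like this (a simplified version of the approach in the official K8s examples; the labels and image tag are illustrative):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: selenium-hub
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: selenium-hub
      template:
        metadata:
          labels:
            app: selenium-hub
        spec:
          containers:
            - name: selenium-hub
              image: selenium/hub:3.141.59
              ports:
                - containerPort: 4444

Browser Nodes are deployed in the same way, and their pod count can then be changed with a command like kubectl scale deployment selenium-node-chrome --replicas=8 (the deployment name here is hypothetical).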

Selenoid:

Currently, deploying Selenoid in K8s is the biggest disappointment: they are not compatible. Theoretically, we can bring up a Selenoid container inside a pod, but when Selenoid starts launching browser containers, they will still be inside the same pod. This makes scaling impossible, and as a result Selenoid's work inside a cluster is no different from its work inside a virtual machine. End of story.

Moon:

Knowing this bottleneck when working with Selenoid, the developers released a more powerful tool called Moon. This tool was originally conceived for working with Kubernetes, and as a result you can and should use the autoscaling function. Moreover, I would say that at the moment it is the only tool in the Selenium world with native K8s cluster support out of the box (no longer the only one; see the next tool). The key feature of Moon that provides this support is:
Completely stateless. Selenoid stores in memory information about currently running browser sessions. If for some reason its process crashes - then all running sessions are lost. Moon contrarily has no internal state and can be replicated across data centers. Browser sessions remain alive even if one or more replicas go down.
So, Moon is a great solution, but with one problem: it is not free. The price depends on the number of sessions. Only 0-4 sessions can be launched for free, which is not particularly useful; starting from the fifth session, you have to pay $5 for each. The situation may differ from company to company, but in our case using Moon is pointless. As I described above, we can start VMs with Selenium grid on demand or increase the number of Nodes in the cluster. For roughly one pipeline we launch 500 browsers and stop all resources after the tests are completed. If we used Moon, we would have to pay an extra 500 x 5 = $2500 per month, no matter how often we run the tests. And again, I am not saying 'do not use Moon'. For your tasks it can be an indispensable solution, for example if you have many projects/teams in your organization and you need one huge shared cluster for everyone. As always, I leave a link at the end and recommend doing all the necessary calculations for your own task.

Callisto: (Attention! This is not in the original article and appears only in the Russian translation.)

As I said, Selenium is a very popular tool, and the IT industry develops very quickly. While I was working on the translation, a promising new tool called Callisto appeared (hello, Cypress and other Selenium killers). It works natively with K8s and allows you to run Selenoid containers in pods distributed across Nodes. Everything works right out of the box, including autoscaling. Sounds fantastic, but it needs to be tested. I have already managed to deploy this tool and run some experiments, but it is too early to draw conclusions; after getting results over a longer distance, I may review it in the following articles. For now I leave only links for independent research.







7. Infrastructure as Code (IaC) (Terraform, Ansible)


Technology Brief

And so we have reached the last section. Usually this technology and the related tasks are not part of an automation engineer's area of responsibility, and there are reasons for that. First, in many organizations infrastructure issues are under the control of the DevOps department, and development teams do not really care what makes the pipeline work or how to support everything related to it. Second, let's be honest: the practice of Infrastructure as Code (IaC) is still not adopted in many companies. But it has definitely become a popular trend, and it is important to try to be involved in the related processes, approaches, and tools, or at least to keep abreast of developments.

Let's start with the motivation for this approach. We have already discussed that to run tests in GitLab CI we need at least the resources to run the GitLab Runner, and to run containers with browsers/emulators we need to reserve a VM or a cluster. Besides testing resources, we need a significant amount of capacity to support development, staging, and production environments, which also includes databases, automatic schedules, network configurations, load balancers, user permissions, and so on. The key issue is the effort required to support it all. There are several ways we can make changes and roll out updates. For example, in the context of GCP we can use the UI console in the browser and perform all actions by clicking buttons. An alternative would be to use API calls to interact with cloud entities, or the gcloud command line utility to perform the necessary manipulations. But with a really large number of different entities and infrastructure elements, it becomes difficult or even impossible to perform all operations manually. Moreover, all these manual actions are uncontrollable: we cannot send them for review before execution, use a version control system, or quickly roll back the edits that led to an incident. To solve such problems, engineers created and keep creating automatic bash/shell scripts, which is not much better than the previous methods, since procedural-style scripts are not so easy to read, understand, maintain, and modify.

In this article and the how-to, I use two tools related to the IaC practice: Terraform and Ansible. Some believe it makes no sense to use them simultaneously, since their functionality is similar and they are interchangeable. But in fact they were initially aimed at completely different tasks, and the fact that these tools should complement each other was confirmed at a joint presentation by developers representing HashiCorp and RedHat. The conceptual difference is that Terraform is a provisioning tool for managing the servers themselves, while Ansible is a configuration management tool whose task is to install, configure, and manage software on those servers.

Another key distinguishing feature of these tools is the coding style. Unlike bash and Ansible, Terraform uses a declarative style, based on a description of the desired end state to be achieved as a result of execution. For example, if we are going to create 10 VMs and apply the changes through Terraform, we get 10 VMs. If we apply the script again, nothing happens, since we already have 10 VMs and Terraform knows it, because it stores the current state of the infrastructure in a state file. Ansible, however, uses a procedural approach: if we ask it to create 10 VMs, on the first run we get 10 VMs, just as with Terraform, but after a rerun we will have 20 VMs. This is the important difference. In the procedural style we do not store the current state; we simply describe a sequence of steps to perform. Of course, we can handle various situations and add checks for the existence of resources and the current state, but there is no point in wasting our time and effort controlling this logic. Besides, it increases the risk of making mistakes.

Summarizing all of the above, we can conclude that Terraform and declarative notation are the more suitable tool for provisioning servers, while configuration management is better delegated to Ansible. With that settled, let's look at usage examples in the context of automation.
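To make the declarative style concrete, here is a minimal sketch of what a Selenoid VM might look like in Terraform for GCP (the resource name, zone, and image are hypothetical; the real configs in the guide are more complete):

    # a hypothetical Selenoid VM described declaratively
    resource "google_compute_instance" "selenoid" {
      name         = "selenoid-vm"
      machine_type = "n1-standard-8"
      zone         = "europe-west1-b"

      boot_disk {
        initialize_params {
          image = "ubuntu-os-cloud/ubuntu-1804-lts"
        }
      }

      network_interface {
        network = "default"
        access_config {} # ephemeral external IP
      }

      scheduling {
        preemptible       = true  # cheaper capacity, may be preempted
        automatic_restart = false # required for preemptible instances
      }
    }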

Value for automation infrastructure


It is important to understand that the test automation infrastructure should be considered part of the entire infrastructure of the company. This means that all IaC practices must be applied globally, to the resources of the whole organization. Who is responsible for this depends on your processes. The DevOps team is more experienced in these matters and sees the whole picture of what is happening. However, QA engineers are more involved in building the automation and the structure of the pipeline, which allows them to better see all the required changes and opportunities for improvement. The best option is to work together, sharing knowledge and ideas to achieve the expected result.

Here are some examples of using Terraform and Ansible in the context of testing automation and the tools we discussed before:

1. Describe through Terraform the necessary characteristics and parameters of VMs and clusters.

2. Install with Ansible the necessary tools for testing: Docker, Selenoid, Selenium grid, and download the required versions of browsers/emulators (a minimal Ansible sketch for this kind of step follows after the list).

3. Describe through Terraform the characteristics of the VM in which the GitLab Runner will be launched.

4. Install with Ansible GitLab Runner and the necessary related tools, set the settings and configurations.
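For example, step 2 could start with a playbook roughly like this (the host group and the browser version are hypothetical; a real playbook would also configure Selenoid itself):

    - hosts: selenoid        # hypothetical inventory group of test VMs
      become: yes
      tasks:
        - name: Install Docker
          apt:
            name: docker.io
            state: present
            update_cache: yes

        - name: Pull a browser image for Selenoid
          command: docker pull selenoid/chrome:84.0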

Illustration of the current state of infrastructure



Learning Links



Similar tools




To summarize!


Step | Technology | Tools | Value for automation infrastructure
1 | Local running | Node.js, Selenium, Appium | run the web/mobile demo tests locally; the necessary starting point
2 | Version control systems | Git | the same versioning advantages for automation code as for development code
3 | Containerisation | Docker, Selenium grid, Selenoid (Web, Android) | isolated, compatible browser/emulator environments; parallel runs
4 | CI/CD | GitLab CI | automated test runs as part of the pipeline; fast feedback on changes
5 | Cloud platforms | Google Cloud Platform | resources on demand; pay only for actual usage
6 | Orchestration | Kubernetes | management of pods with containers: scaling, self-healing
7 | Infrastructure as Code (IaC) | Terraform, Ansible | all the benefits of code versioning; easy to make changes and maintain; fully automated


Mind map diagrams: evolution of infrastructure


step 1: Local running


step 2: Version control systems


step 3: Containerisation


step 4: CI/CD


step 5: Cloud platforms


step 6: Orchestration


step 7: IaC


What's next?


And so the article comes to an end. But in conclusion, I would like to make a few agreements with you.

On your part
As was said at the beginning, I would like the article to be of practical use and help you apply the acquired knowledge in real work. Here, once again, is a link to the practical guide.

But even after that, don't stop: practice, study the relevant links and books, learn how it works in your company, find places that can be improved, and take part in improving them. Good luck!

From my side

As the title makes clear, this was only the first part. Although it turned out quite large, important topics are still left uncovered. In the second part, I plan to look at automation infrastructure in the context of iOS. Because of Apple's restriction that iOS simulators run only on macOS systems, our set of solutions is narrower. For example, we cannot use Docker to run the simulator, or public clouds to run virtual machines. But this does not mean there are no alternatives. I will try to keep you up to date with advanced solutions and modern tools!

Also, I have not yet touched the rather large topic of monitoring. In Part 3, I am going to review the most popular tools for infrastructure monitoring, and which data and metrics to track.

And finally, in the future I plan to release a video course on building a test infrastructure and the popular tools. Currently there are quite a few DevOps courses and lectures on the Internet, but all the material is presented in the context of development, not test automation. On this matter I really need feedback on whether such a course would be interesting and valuable to the community of testers and automation engineers. Thanks in advance!
