Production-ready images for k8s

This story is about how we use containers in our production environment, especially under Kubernetes. The article covers collecting metrics and logs from containers, as well as building images.


We are from the fintech company Exness, which develops services for online trading and fintech products for B2B and B2C. Our R&D has many different teams, with 100+ employees in the development department.

We represent the team responsible for the platform on which our developers' code is collected and run. In particular, we are responsible for collecting, storing, and serving metrics, logs, and events from applications. We currently operate around three thousand Docker containers in the production environment, maintain our 50 TB big-data storage, and provide architectural solutions built around our infrastructure: Kubernetes, Rancher, and various public cloud providers. 

Our motivation


What is burning? No one can answer. Where is the source of the fire? Hard to tell. When did it catch fire? You can find out, but not immediately. 



Why do some containers keep running while others fall over? Which container was to blame? From the outside the containers all look the same, but inside each one has its own Neo.



Our developers are capable people. They build good services that make the company money. But there are failures, when application containers go haywire. One container consumes too much CPU, another saturates the network, a third hogs I/O operations, and a fourth does something unclear with sockets. Everything falls over, and the ship sinks. 

Agents


To understand what is going on inside, we decided to put agents directly into the containers.



These agents are support programs that keep containers in a state where they do not break each other. The agents are standardized, which allows a standardized approach to container handling. 

In our case, agents must provide logs in a standard format, tagged and with throttling. They must also provide us with standardized metrics that are extensible for business applications.

By agents we also mean utilities for operation and maintenance that can work in different orchestration systems and support different base images (Debian, Alpine, CentOS, etc.).

Finally, agents must support a simple CI/CD pipeline, including Dockerfiles. Otherwise the ship falls apart, because the containers will be delivered on "crooked" rails.

The build process and the layout of the target image


For everything to be standardized and manageable, you must adhere to some standard build process. So we decided to build containers with containers: a recursion of sorts.



Here the containers are shown with solid outlines. We also decided to put the distribution packages inside them. Why this was done, we explain below.
 
The result is a build tool: a container of a specific version that references specific versions of distribution packages and specific versions of scripts.

How do we use it? There is a container on Docker Hub; we mirror it inside our own registry to get rid of external dependencies. The resulting container is marked in yellow. We create a template to install all the distribution packages and scripts we need into the container. After that, we build an image that is ready for operation: developers put their code and any special dependencies into it. 

Why is this approach good? 

  • First, full version control of the build tools: the build container, the scripts, and the distribution-package versions. 
  • Second, we achieved standardization: templates, intermediate images, and ready-for-operation images are all created the same way. 
  • Third, containers give us portability. Today we use GitLab, and tomorrow we can switch to TeamCity or Jenkins and launch our containers in exactly the same way. 
  • Fourth, minimized dependencies. It is no coincidence that we put the distribution packages into the container: it lets us avoid downloading them from the Internet every time. 
  • Fifth, build speed increased: local copies of the images mean no time is wasted on downloads. 

In other words, we achieved a controlled and flexible build process. We use the same tools to build any container, with full versioning. 

How our build procedure works




The build is launched with one command, and the process runs in the build image (highlighted in red). The developer has a Dockerfile (highlighted in yellow), which we render by replacing its variables with values. Along the way we add a header and a footer: these bring in our agents. 

The header adds distribution packages from the corresponding images, and the footer installs our services inside, configures the launch of the workload, logging, and the other agents, replaces the entrypoint, and so on. 



We debated for a long time whether to use a supervisor. In the end, we decided we needed one and chose S6. The supervisor controls the container: it lets you connect to the container if the main process crashes and provides manual control of the container without re-creating it. Log and metric collectors are processes running inside the container; they also need to be managed, and we do this with the supervisor. Finally, S6 takes care of housekeeping, signal handling, and other tasks.
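For illustration, one supervised service under s6 might be laid out like this (the directory convention is s6-overlay's; the binary and flag names are hypothetical):

```sh
#!/bin/sh
# /etc/services.d/collector/run (s6-overlay convention): s6 supervises
# this process and restarts it if it exits. Sibling directories such as
# workload/ and events/ hold the payload and the event agent the same way.
exec /usr/local/bin/collector --config /etc/collector.conf
```

With one `run` script per service, s6 handles restarts, signal forwarding, and zombie reaping for the workload and both agents uniformly.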

Since we use different orchestration systems, after build and launch the container must detect which environment it is running in and act accordingly. This allows us to build a single image and launch it in different orchestration systems, and it starts up taking the specifics of that orchestration system into account.

 

For the same container, we get different process trees in Docker and Kubernetes:



The payload runs under the S6 supervisor. Note collector and events: these are our agents responsible for logs and metrics. The Kubernetes tree does not have them, but the Docker tree does. Why? 

If you look at the Kubernetes pod specification, you will see that an events container runs in the pod, along with a separate collector container that collects metrics and logs. We use the capabilities of Kubernetes here: containers in one pod can share a process and/or network namespace, which is how we inject our agents and perform these functions. And if the same container is launched in plain Docker, it gets all the same capabilities, that is, it can deliver logs and metrics, because the agents are started inside it. 
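As a sketch, a pod with injected agents might look like this. The image names, container names, and the annotation are hypothetical; `shareProcessNamespace` is the real Kubernetes pod field that gives the containers a shared process space:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  annotations:
    metrics/port: "8080"        # hypothetical annotation read by the agent
spec:
  shareProcessNamespace: true   # agents can observe workload processes
  containers:
    - name: workload
      image: registry.local/myapp:1.0
    - name: collector           # injected sidecar: logs and metrics
      image: registry.local/collector:2.1
    - name: events              # injected sidecar: events
      image: registry.local/events:2.1
```

In plain Docker there are no sidecars, so the same collector and events processes are started inside the container itself, which explains the different process trees above.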

Metrics and Logs


Delivering metrics and logs is a difficult task. There are several aspects to solving it.
The first aspect: the infrastructure exists to run the payload, not to mass-deliver logs, so this process must run with minimal demands on container resources. We strive to help our developers: "Take a Docker Hub container, launch it, and we can deliver the logs." 

The second aspect is limiting log volume. If several containers experience a surge in log volume (an application printing a stack trace in a loop), the load on the CPU, the communication channels, and the log-processing system grows; this affects the host as a whole and the other containers on it, and sometimes brings the host down. 

The third aspect: you need to support as many metric-collection methods out of the box as possible, from reading files and polling a Prometheus endpoint to using application-specific protocols.

And the last aspect: resource consumption must be minimized.

We chose an open-source Go solution: Telegraf. It is a universal connector that supports more than 140 types of input (input plugins) and 30 types of output (output plugins). We extended it, and below we describe how we use it, with Kubernetes as an example. 
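A minimal collector-side Telegraf configuration might look like this. `inputs.prometheus` and `outputs.kafka` are real Telegraf plugins; the endpoint URL, broker address, and topic name are illustrative, not the actual settings:

```toml
# Sketch: scrape a workload's Prometheus endpoint and ship to Kafka.
[[inputs.prometheus]]
  urls = ["http://localhost:8080/metrics"]

[[outputs.kafka]]
  brokers = ["kafka-1:9092"]
  topic = "metrics"
```

Swapping the output section (for example, to a CloudWatch or Stackdriver output) is what makes the same agent work both on-premise and in the clouds, as described below.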



Suppose a developer deploys a workload and Kubernetes receives a request to create a pod. At that point, a container called Collector is automatically added to each pod (we use a mutating webhook). Collector is our agent. At startup, this container configures itself to work with Prometheus and the log collection system.

  • To do this, it reads the pod's annotations and, depending on their contents, creates, say, a Prometheus endpoint; 
  • Based on the pod spec and the settings of the specific containers, it decides how to deliver the logs.

We collect logs through the Docker API: it is enough for developers to write them to stdout or stderr, and Collector takes it from there. Logs are collected in chunks, with a small delay, to prevent possible host congestion. 

Metrics are collected from the workload instances (processes) in the containers. Everything is tagged: namespace, pod, and so on, then converted to the Prometheus format and made available for scraping (this does not apply to logs). We also send logs, metrics, and events to Kafka and onward:

  • Logs are available in Graylog (for visual analysis);
  • Logs, metrics, and events go to ClickHouse for long-term storage.
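The tagging and conversion step above can be sketched like this: render a metric with its tags in the Prometheus text exposition format. This is a simplified illustration; real exposition output also needs label-value escaping and TYPE/HELP comments.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PromLine renders one metric with its tags as a line of the Prometheus
// text exposition format, e.g. cpu_usage{namespace="prod",pod="api-0"} 0.42
func PromLine(name string, tags map[string]string, value float64) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic label order
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, tags[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println(PromLine("cpu_usage", map[string]string{
		"namespace": "prod",
		"pod":       "api-0",
	}, 0.42))
	// → cpu_usage{namespace="prod",pod="api-0"} 0.42
}
```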

The same scheme works in AWS, except that we replace Kafka and Graylog with CloudWatch. We send logs there, and it turns out very convenient: it is immediately clear which cluster and container they belong to. The same is true for Google Stackdriver. So our scheme works both on-premise with Kafka and in the cloud. 

If we don't have Kubernetes with pods, the scheme is a bit more complicated, but it works on the same principles.



In that case all the same processes run inside a single container, orchestrated by S6.

Summing up


We have created a complete solution for building images and launching them into operation, with options for collecting and delivering logs and metrics:

  • Developed a standardized approach to building images and, based on it, CI templates;
  • Data-collection agents are our extensions to Telegraf, well proven in production;
  • We use a mutating webhook to inject agent containers into pods; 
  • Integrated into the Kubernetes / Rancher ecosystem;
  • We can execute the same containers in different orchestration systems and get the result we expect;
  • Created a fully dynamic container-management configuration. 

Co-author: Ilya Prudnikov
