Debugging heavily loaded Golang applications, or how we looked for a problem in Kubernetes that wasn't there

In the modern world of Kubernetes clouds, sooner or later you run into software bugs that were introduced neither by you nor by your colleagues, yet you are the one who has to solve them. This article may help a newcomer to the world of Golang and Kubernetes pick up some ways to debug both their own and third-party software.

image

My name is Viktor Yagofarov, I develop the Kubernetes cloud at DomKlik, and today I want to talk about how we solved a problem with one of the key components of our production k8s (Kubernetes) cluster.

In our production cluster (at the time of writing):

  • 1890 pods and 577 services are running (the number of actual microservices is also in that ballpark)
  • Ingress controllers handle about 6k RPS, and roughly the same amount bypasses Ingress and goes directly to hostPort.


Problem


A few months ago, our pods started having problems resolving DNS names. The thing is that DNS mostly works over UDP, and the Linux kernel has some known issues with conntrack and UDP. DNAT on the way to the k8s Service addresses only exacerbates the conntrack race conditions. It is worth adding that, at the time of the problem, our cluster was generating about 40k RPS towards the DNS servers, CoreDNS.

image

We decided to run NodeLocal DNS (nodelocaldns), a local caching DNS server created by the community specifically for this scenario, on every worker node of the cluster; it was still in beta and was supposed to solve all the problems. In short: get rid of UDP when connecting to the cluster DNS, remove NAT, and add an extra caching layer.
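Just to illustrate the "get rid of UDP" idea in Go terms (this is not how nodelocaldns itself is implemented, only a hedged application-level sketch): Go's net.Resolver lets you override the dial function and force DNS queries over TCP, so no UDP conntrack entries are created for lookups. The hostname below is only an example.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// A resolver that uses Go's built-in DNS client and dials the
	// nameserver from resolv.conf over TCP instead of UDP.
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, "tcp", address)
		},
	}

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	addrs, err := r.LookupHost(ctx, "kubernetes.default.svc.cluster.local")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(addrs)
}
```

nodelocaldns pursues a similar idea at the node level (plus caching), so the applications themselves do not have to be changed.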

In the first iteration of the nodelocaldns rollout we used version 1.15.4 (not to be confused with the Kubernetes version), which shipped with the Kubespray "Kubernetes installer"; to be precise, with our company's fork of the Southbridge fork.

Almost immediately after the rollout, problems started: memory was leaking, and the pod was restarted when it hit its memory limit (OOM-Kill). While it was restarting, all DNS traffic on the host was lost, because /etc/resolv.conf in every pod pointed exactly at the IP address of nodelocaldns.

This situation definitely did not suit anyone, and our OPS team took a number of measures to eliminate it.

Since I am new to Golang myself, I was very interested in going down this path and getting acquainted with debugging applications in this wonderful programming language.

Looking for a solution


So let's go!

We rolled out version 1.15.7 to the dev cluster; it is already considered beta, rather than alpha like 1.15.4. Unfortunately, dev does not see anything like that DNS traffic (40k RPS), which is a pity.

Along the way, we decoupled nodelocaldns from Kubespray and wrote a dedicated Helm chart for more convenient rollouts. At the same time, we wrote a playbook for Kubespray that allows changing kubelet settings without re-applying the entire cluster state for hours; moreover, it can be done selectively (checking on a small number of nodes first).

Next, we rolled out nodelocaldns 1.15.7 to prod. Alas, the situation repeated itself: the memory kept leaking.

The official nodelocaldns repository had a version tagged 1.15.8, but for some reason I couldn't docker pull that version and decided that the official Docker image simply had not been built yet, so this version should not be used. This is an important point, and we will come back to it.

Debugging: Stage 1


For a long time I couldn't figure out how to build my own version of nodelocaldns at all: the Makefile from the repo crashed with incomprehensible errors from inside the Docker image, and I didn't really understand how to cleverly build a Go project whose govendor dependencies were laid out in strange ways across directories for several different DNS server flavors. The thing is, I started learning Go when proper out-of-the-box dependency versioning already existed.

Pavel Selivanov (pauljamm) helped me a lot with the problem, for which many thanks to him. I managed to build my own version.

Next, we wired in the pprof profiler, tested the build on dev, and rolled it out to prod.
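Wiring pprof into a Go service usually comes down to importing net/http/pprof and exposing an HTTP endpoint next to the main workload. A minimal sketch of what that can look like (the listen address is just an example, not our actual configuration):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the profiler on a separate port so it does not interfere
	// with the main traffic handled by the process.
	go func() {
		log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
	}()

	// ... the rest of the DNS server would run here ...
	select {}
}
```

After that you can attach to the process straight from the CLI, for example with "go tool pprof http://127.0.0.1:6060/debug/pprof/heap", or open /debug/pprof/ in a browser and walk through the heap and goroutine profiles interactively.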

A colleague from the Chat team helped a lot with setting up the profiling, so that one could conveniently attach pprof to the URL straight from the CLI and study the memory and the goroutines of the process through the interactive menus in the browser, for which many thanks to him as well.

At first glance, judging by the profiler's output, the process was doing fine: most of the memory was allocated on the stack and, it seemed, was constantly in use by goroutines.

But at some point it became clear that the "bad" nodelocaldns pods had far too many active goroutines compared to the "healthy" ones. And those goroutines did not go anywhere: they kept hanging around in memory. At that moment, Pavel Selivanov's hunch that "the goroutines are leaking" was confirmed.
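A goroutine leak usually follows the same pattern: goroutines get started, block on something that never happens, and their total count only grows. A minimal hedged illustration of the pattern (not the nodelocaldns code), together with the counter that makes such a leak visible:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leak starts a goroutine that waits on a channel nobody ever writes to,
// so the goroutine (and everything it references) stays in memory forever.
func leak() {
	ch := make(chan struct{})
	go func() {
		<-ch // blocked forever: the "leak"
	}()
}

func main() {
	for i := 0; i < 1000; i++ {
		leak()
	}
	time.Sleep(100 * time.Millisecond)
	// The same number is visible on a live process via /debug/pprof/goroutine.
	fmt.Println("goroutines:", runtime.NumGoroutine())
}
```

Comparing this number between the "bad" and the "healthy" pods is exactly what gives such a leak away.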

image

Debugging: Stage 2


It became interesting why this was happening (goroutines leaking), and the next stage of studying the nodelocaldns process began.

The staticcheck static analyzer showed that there were problems right at the point where a goroutine is spawned in the library used by nodelocaldns (it is pulled in by CoreDNS, which in turn is pulled in by nodelocaldns). As far as I understood, in some places not a pointer to a structure was passed, but a copy of its values.
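The class of problem staticcheck flags here is easy to reproduce in miniature: when a struct is passed (or a method receiver is declared) by value instead of by pointer, the callee mutates a copy, and the change never reaches the original. A toy hedged example, not the actual CoreDNS/library code:

```go
package main

import "fmt"

type session struct {
	active bool
}

// closeByValue receives a copy of the session, so flipping the flag here
// has no effect on the caller's struct -- a typical "copy instead of
// pointer" mistake that a static analyzer can catch.
func closeByValue(s session) {
	s.active = false
}

// closeByPointer mutates the original session.
func closeByPointer(s *session) {
	s.active = false
}

func main() {
	s := session{active: true}

	closeByValue(s)
	fmt.Println("after closeByValue:", s.active) // still true: only the copy changed

	closeByPointer(&s)
	fmt.Println("after closeByPointer:", s.active) // false
}
```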

We decided to take a core dump of a "bad" process using the gcore utility and see what was inside.

Digging into the core dump with the gdb-like dlv tool (Delve), I appreciated its power, but I realized that finding the cause this way would take a very long time. So I loaded the core dump into the GoLand IDE and analyzed the state of the process memory there.

Debugging: Stage 3


It was very interesting to study the program's structures while looking at the code that creates them. In about 10 minutes it became clear that many goroutines create some kind of structure for TCP connections, mark them as false, and never delete them (remember the 40k RPS?).

image

image

In the screenshots you can see the problematic part of the code and the structure that was not cleared when the UDP session was closed.
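In spirit, the leak boils down to something like the sketch below: per-session state goes into a long-lived map, a flag is flipped to false when the session ends, but the entry itself is never deleted, so at ~40k RPS the map only grows. This is a hedged reconstruction of the pattern from the screenshots, not the library's actual code; the type and field names are made up:

```go
package main

import (
	"fmt"
	"sync"
)

// connState stands in for the per-connection structure seen in the dump.
type connState struct {
	used bool
	// ... buffers, addresses, timers referenced from here stay alive too ...
}

type tracker struct {
	mu    sync.Mutex
	conns map[string]*connState
}

func (t *tracker) open(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.conns[key] = &connState{used: true}
}

// closeLeaky only marks the entry as unused; the map keeps growing forever.
func (t *tracker) closeLeaky(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if c, ok := t.conns[key]; ok {
		c.used = false // flag flipped, memory never released
	}
}

// closeFixed shows the obvious remedy: drop the entry when the session ends.
func (t *tracker) closeFixed(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.conns, key)
}

func main() {
	t := &tracker{conns: make(map[string]*connState)}
	for i := 0; i < 3; i++ {
		key := fmt.Sprintf("10.0.0.%d:53", i)
		t.open(key)
		t.closeLeaky(key)
	}
	fmt.Println("entries still tracked:", len(t.conns)) // 3, even though all are "closed"
}
```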

Also, the IP addresses stored in these structures in the core dump revealed who was responsible for that volume of RPS (thanks for helping us find a bottleneck in our cluster :).

Solution


While fighting this problem, with the help of colleagues from the Kubernetes community I found out that the official Docker image of nodelocaldns 1.15.8 does exist after all (apparently my hands are crooked and I somehow ran docker pull incorrectly, or the Wi-Fi misbehaved at the moment of the pull).

In this version, the versions of the libraries it uses were bumped considerably: in particular, the "culprit" library jumped about 20 versions forward!

Moreover, the new version already has support for profiling via pprof; it is enabled through a ConfigMap, and nothing needs to be rebuilt.

The new version was rolled out first to dev and then to prod.
Aaand... victory!
The process began returning its memory to the system, and the problems stopped.

The graph below shows the picture: "a smoker's DNS vs. a healthy person's DNS."

image

Conclusions


The conclusion is simple: double-check what you are doing, and do not shy away from the community's help. In the end, we spent several days longer on the problem than we could have, but we got fault-tolerant DNS operation in our containers. Thank you for reading this far :)

Useful links:

1. www.freecodecamp.org/news/how-i-investigated-memory-leaks-in-go-using-pprof-on-a-large-codebase-4bec4325e192
2. habr.com/en/company/roistat/blog/413175
3. rakyll.org
