A practical guide to dealing with memory leaks in Node.js

Memory leaks are like parasites living in an application. They creep into the system quietly and do no harm at first. But if a leak is severe enough, it can bring the application to disaster: slow it down badly or simply "kill" it. The author of the article, a translation of which we are publishing today, discusses memory leaks in JavaScript. In particular, we will talk about memory management in JavaScript, how to identify memory leaks in real applications, and how to deal with them.





What is a memory leak?


A memory leak is, in a broad sense, a piece of memory allocated to an application that the application no longer needs but that cannot be returned to the operating system for future use. In other words, it is a block of memory that the application holds on to without any intention of using it again.

Memory management


Memory management is the mechanism that allocates system memory to an application that needs it and returns memory that is no longer needed to the operating system. There are many approaches to memory management, and which one is used depends on the programming language. Here is an overview of several common approaches:

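  • Manual memory management. With this approach, the programmer is responsible for allocating and freeing memory. It is used in C and C++, where this is done, for example, with malloc and free.
  • Garbage collection. With this approach, a garbage collector automatically finds memory that the program no longer uses and frees it. It is used in JavaScript, in JVM-based languages (Java, Scala, Kotlin), in Golang, Python, and Ruby.
  • Memory ownership. With this approach, each value has an owner. As soon as the owner goes out of scope, the value is destroyed and its memory is freed. This idea is used in Rust.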

There are other approaches to memory management used in different programming languages. For example, modern C++ (C++11 and later) relies on the RAII idiom, while Swift uses the ARC mechanism. But that discussion is beyond the scope of this article. Comparing these approaches and weighing their pros and cons would take a separate article.

JavaScript, a language that web programmers cannot imagine working without, uses garbage collection. We will therefore look more closely at how this mechanism works.

JavaScript garbage collection


As already mentioned, JavaScript is a language that uses garbage collection. While a JS program runs, a mechanism called the garbage collector is launched periodically. It determines which parts of the allocated memory can still be reached from the application code, that is, which variables are still referenced. If the garbage collector finds that a piece of memory is no longer reachable from the application code, it frees that memory. This approach can be implemented with two main algorithms. The first is the so-called Mark and Sweep algorithm, which is used in JavaScript. The second is Reference Counting, which is used in Python and PHP.


The Mark and Sweep phases of the Mark and Sweep algorithm

When the marking algorithm runs, a list of root nodes is created first. These are the global variables of the environment (in the browser, this is the window object). The resulting tree is then traversed from the roots to the leaf nodes, and every object encountered along the way is marked. Heap memory occupied by unmarked objects is freed.
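The two phases can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the objects, their `refs` arrays, the explicit `heap` list and the `markAndSweep` function are all hypothetical names invented for illustration, and a real engine such as V8 works in a far more elaborate way.

```javascript
// Toy mark-and-sweep over a hand-built object graph.
// `heap` is a hypothetical list of every allocated object; a real engine
// tracks allocations internally.
function markAndSweep(roots, heap) {
  const marked = new Set();
  const stack = [...roots];

  // Mark phase: walk the graph from the roots and remember every object reached.
  while (stack.length > 0) {
    const obj = stack.pop();
    if (marked.has(obj)) continue;
    marked.add(obj);
    for (const child of obj.refs) stack.push(child);
  }

  // Sweep phase: whatever was not marked is unreachable and can be freed.
  return heap.filter(obj => marked.has(obj));
}

const a = { name: 'a', refs: [] };
const b = { name: 'b', refs: [] };
const c = { name: 'c', refs: [] };
const d = { name: 'd', refs: [] };
a.refs.push(b); // a -> b
c.refs.push(d); // c -> d
d.refs.push(c); // d -> c (a cycle that no root points to)

const survivors = markAndSweep([a], [a, b, c, d]).map(obj => obj.name);
console.log(survivors); // [ 'a', 'b' ]
```

Note that `c` and `d` reference each other but are still swept, because neither can be reached from a root.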

Memory leaks in Node.js applications


So far, we have covered enough of the theory behind memory leaks and garbage collection. Now we are ready to see how all of this looks in real applications. In this section, we will write a Node.js server that has a memory leak. We will try to find the leak using various tools, and then we will eliminate it.

▍ Getting to know the code that has a memory leak


For demonstration purposes, I wrote an Express server with a route that leaks memory. This is the server we will debug.

const express = require('express');

const app = express();
const port = 3000;

// Module-level array: it outlives every request, so its contents are never freed.
const leaks = [];

app.get('/bloatMyServer', (req, res) => {
  const redundantObj = {
    memory: "leaked",
    joke: "meta"
  };

  // Each request adds 10,000 more entries to the long-lived array.
  [...Array(10000)].forEach(() => leaks.push(redundantObj));

  res.status(200).send({size: leaks.length});
});

app.listen(port, () => console.log(`Example app listening on port ${port}!`));

The leaks array is declared outside the scope of the request handler. As a result, every time the handler runs, new elements are simply added to the array, and the array is never cleared. Since the reference to this array does not disappear when the request handler returns, the garbage collector never frees the memory it uses.
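The growth can be reproduced without Express at all. The sketch below is an illustration only: `retained` and `simulateRequest` are hypothetical names standing in for the module-level array and the route handler.

```javascript
// Stands in for the module-level array: it lives as long as the process does.
const retained = [];

// Stands in for the leaking route handler: every call appends to the long-lived array.
function simulateRequest() {
  const payload = { memory: 'leaked' };
  for (let i = 0; i < 10000; i++) retained.push(payload);
  return retained.length;
}

const sizes = [simulateRequest(), simulateRequest(), simulateRequest()];
console.log(sizes); // [ 10000, 20000, 30000 ]
```

The size never goes down between calls, which is exactly the pattern a leaking endpoint shows over time.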

▍ Triggering the memory leak


Here we come to the most interesting part. Many articles have been written about debugging server memory leaks with node --inspect after flooding the server with requests using something like artillery. But this approach has one important drawback. Imagine an API server with thousands of endpoints, each taking many parameters that determine which code path runs. In real conditions, a developer who does not know where the leak is would have to call every endpoint many times, with every possible combination of parameters, in order to fill up memory. In my opinion, that is not easy to do. The task does become easier with a tool like goreplay, a system that lets you record and replay real traffic.

To cope with our problem, we are going to debug in production. That is, we will let the server fill up its memory during real use (while it receives a variety of API requests). Once we notice a suspicious increase in the amount of memory allocated to it, we will start debugging.
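Noticing such an increase can start with something as simple as sampling `process.memoryUsage().heapUsed` at intervals. The following is a deliberately naive sketch, not a replacement for a monitoring tool: `looksLikeLeak`, `startSampling`, the thresholds and the 80% ratio are all assumptions made for illustration.

```javascript
// Naive heuristic: the heap "looks like it is leaking" if the last sample exceeds
// the first by at least `minGrowthBytes` and at least `ratio` of the steps went up.
function looksLikeLeak(samples, minGrowthBytes, ratio = 0.8) {
  if (samples.length < 2) return false;
  let rises = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i] > samples[i - 1]) rises++;
  }
  const growth = samples[samples.length - 1] - samples[0];
  return growth >= minGrowthBytes && rises / (samples.length - 1) >= ratio;
}

// Records heapUsed every `intervalMs` and keeps only the latest `maxSamples`
// values, so the sampler does not become a leak of its own.
// Returns a function that stops the sampling.
function startSampling(samples, intervalMs = 60000, maxSamples = 120) {
  const timer = setInterval(() => {
    samples.push(process.memoryUsage().heapUsed);
    if (samples.length > maxSamples) samples.shift();
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for sampling
  return () => clearInterval(timer);
}

console.log(looksLikeLeak([100, 180, 260, 350, 430], 200)); // true
console.log(looksLikeLeak([100, 180, 90, 170, 95], 200));   // false
```

Heap usage normally rises and falls as the garbage collector runs, so a check like this is only a first signal that a heap dump might be worth taking.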

▍ Heap Dump


To understand what a heap dump is, we first need to define the heap. Put as simply as possible, the heap is where everything that gets memory allocated for it ends up. It all stays on the heap until the garbage collector removes whatever it considers unnecessary. A heap dump is something like a snapshot of the current state of the heap. It contains all internal variables and all variables declared by the programmer, and it represents all the memory allocated on the heap at the moment the dump was taken.

So if we could compare the heap dump of a freshly started server with the heap dump of a server that has been running for a long time and filling up memory, we could identify suspicious objects that the application no longer needs but that the garbage collector does not delete.

Before going further, let's talk about how to create heap dumps. For this we will use the npm package heapdump, which lets you take a dump of the server heap programmatically.

Install the package:

npm i heapdump

We’ll make some changes to the server code that will allow us to use this package:

const express = require('express');
const heapdump = require("heapdump");

const app = express();
const port = 3000;

const leaks = [];

app.get('/bloatMyServer', (req, res) => {
  const redundantObj = {
    memory: "leaked",
    joke: "meta"
  };

  [...Array(10000)].forEach(() => leaks.push(redundantObj));

  res.status(200).send({size: leaks.length});
});

// Takes a heap dump on demand, once the server looks bloated.
app.get('/heapdump', (req, res) => {
  heapdump.writeSnapshot(`heapDump-${Date.now()}.heapsnapshot`, (err, filename) => {
    if (err) {
      console.error("Failed to write heap dump", err);
      return res.status(500).send({msg: "failed to take a heap dump"});
    }
    console.log("Heap dump of a bloated server written to", filename);

    res.status(200).send({msg: "successfully took a heap dump"});
  });
});

app.listen(port, () => {
  // Baseline dump of the freshly started server, used later for comparison.
  heapdump.writeSnapshot(`heapDumpAtServerStart.heapsnapshot`, (err, filename) => {
    if (err) return console.error("Failed to write heap dump", err);
    console.log("Heap dump of a fresh server written to", filename);
  });
});

Here we used the package to dump the heap of a freshly launched server. We also created the /heapdump endpoint, which creates a heap dump when it is called. We will call this endpoint once we realize that the server has started to consume too much memory.
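As an alternative to the heapdump package, Node.js 11.13 and later ship a built-in `v8.writeHeapSnapshot()` function that produces the same kind of .heapsnapshot file. The sketch below assumes such a Node.js version; `snapshotFileName` and `takeSnapshot` are hypothetical helper names, and writing to the temporary directory is only an example choice.

```javascript
const v8 = require('v8');
const path = require('path');
const os = require('os');

// Hypothetical helper: builds a name like heapDump-<label>-<timestamp>.heapsnapshot.
function snapshotFileName(label, timestamp = Date.now()) {
  return `heapDump-${label}-${timestamp}.heapsnapshot`;
}

// Writes a snapshot with the v8 module that ships with Node.js.
// The call is synchronous: it blocks the event loop until the file is written.
function takeSnapshot(label, dir = os.tmpdir()) {
  const target = path.join(dir, snapshotFileName(label));
  return v8.writeHeapSnapshot(target); // returns the path of the written file
}

console.log(snapshotFileName('fresh', 0)); // heapDump-fresh-0.heapsnapshot
```

A call such as `takeSnapshot('bloated')` could then be placed inside a route handler in the same way as `heapdump.writeSnapshot`.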

If your server runs in a Kubernetes cluster, you will not be able, without extra effort, to reach the specific pod whose server is consuming too much memory. Port forwarding can help here. In addition, since you will not have access to the file system from which you need to download the dump files, it is better to upload these files to external cloud storage (such as S3).

▍ Memory leak detection


So, the server is deployed and has been running for several days. It receives many requests (in our case, all of the same type), and we have noticed that its memory consumption keeps growing. A memory leak can be detected with monitoring tools such as Express Status Monitor, Clinic, or Prometheus. At that point we call the API to dump the heap. This dump will contain all the objects that the garbage collector could not delete.

Here's what the query looks like to create a dump:

curl --location --request GET 'http://localhost:3000/heapdump'

When a heap dump is created, the garbage collector is forced to run. As a result, we do not need to worry about objects that are still on the heap but would be removed by the garbage collector later, that is, objects that have nothing to do with the memory leak.

After we have both dumps at our disposal (a dump of a freshly launched server and a dump of a server that has worked for some time), we can begin to compare them.

Taking a heap dump is a blocking operation that itself needs a lot of memory, so it must be done with caution. You can read more about the problems that may arise during this operation here.
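One way to exercise that caution is to refuse dump requests that arrive too soon after the previous one. This is only a sketch of the idea: `createDumpGuard` is a hypothetical helper, the one-minute interval is an arbitrary assumption, and the clock is injectable so the logic can be tried without waiting.

```javascript
// Hypothetical guard: allows at most one dump per `minIntervalMs`.
// `now` defaults to Date.now and can be replaced for experiments.
function createDumpGuard(minIntervalMs, now = Date.now) {
  let lastDumpAt = -Infinity;
  return function tryAcquire() {
    const current = now();
    if (current - lastDumpAt < minIntervalMs) return false;
    lastDumpAt = current;
    return true;
  };
}

let fakeTime = 0;
const canDump = createDumpGuard(60000, () => fakeTime);
const first = canDump();  // true: nothing has been dumped yet
fakeTime = 30000;
const second = canDump(); // false: only 30 s since the last dump
fakeTime = 90000;
const third = canDump();  // true: 90 s since the last dump
console.log(first, second, third); // true false true
```

In a dump endpoint, a handler could call the guard first and respond with an error status when it returns false.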

Launch Chrome and press F12. This opens the developer tools. Go to the Memory tab and load both memory snapshots.


Loading memory dumps on the Memory tab of the Chrome developer tools

After loading both snapshots, change the perspective to Comparison and click on the snapshot of the server that has been running for some time.


Start comparing snapshots

Here we can examine the Constructor column and look for objects that the garbage collector cannot remove. Most of them will be internal references used by Node itself. A useful trick is to sort the list by the Alloc. Size field, which quickly surfaces the objects that use the most memory. If you expand the (array) block and then (object elements), you can see the leaks array, which contains a huge number of entries that the garbage collector cannot delete.


Analysis of a suspicious array

This technique leads us to the leaks array and shows that the way it is handled is what causes the memory leak.

▍ Fixing the memory leak


Now that we know the "culprit" is the leaks array, we can look at the code and see that the problem is that the array is declared outside the request handler, so the reference to it never goes away. The fix is simple: move the declaration of the array into the handler:

app.get('/bloatMyServer', (req, res) => {
  const redundantObj = {
    memory: "leaked",
    joke: "meta"
  };

  // Declared inside the handler: the array becomes unreachable once the request is done.
  const leaks = [];

  [...Array(10000)].forEach(() => leaks.push(redundantObj));

  res.status(200).send({size: leaks.length});
});

To verify that the fix works, it is enough to repeat the steps above and compare the heap snapshots again.
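Moving the array into the handler works here because nothing needs the data after the response is sent. When data genuinely has to survive between requests, one possible alternative is to cap how much is kept. The sketch below illustrates that idea only; `createBoundedBuffer` and the limit of 10,000 are assumptions, not part of the server above.

```javascript
// Hypothetical helper: keeps at most `limit` of the most recent items.
function createBoundedBuffer(limit) {
  const items = [];
  return {
    push(item) {
      items.push(item);
      // Drop the oldest entries once the cap is exceeded.
      if (items.length > limit) items.splice(0, items.length - limit);
      return items.length;
    },
    size() {
      return items.length;
    },
  };
}

const recent = createBoundedBuffer(10000);
for (let i = 0; i < 25000; i++) recent.push({ id: i });
console.log(recent.size()); // 10000
```

However many requests arrive, the buffer never holds more than its limit, so the memory it uses stays bounded.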

Summary


Memory leaks happen in many languages, including those that use garbage collection, such as JavaScript. Fixing a leak is usually not difficult; the real difficulty lies in finding it.

In this article, you learned the basics of memory management and how it is organized in different languages. We also reproduced a realistic memory leak scenario and described a method for tracking it down.

Dear readers! Have you encountered memory leaks in your web projects?

