Suddenly, garbage collection alone is not enough

Here's a short story about mysterious server failures that I had to debug a year ago (the article is dated December 5, 2018; translator's note). The servers worked fine for a while and then, at some point, began to crash. After that, attempts to run almost any program on the servers failed with “No space left on device” errors, although the file system reported only a few gigabytes in use on ~20 GB disks.

It turned out that the problem was caused by the logging system. This was a Ruby application that took log files, sent the data to a remote server, and deleted the old files. The bug was that open log files were never explicitly closed. Instead, the application left it to Ruby's automatic garbage collector to clean up the File objects. The problem is that File objects use very little memory, so the logging system could, in theory, keep millions of log files open before a garbage collection was ever needed.
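The fix for this class of bug is to close the file at a known point instead of waiting for the collector. The original logger was written in Ruby and its code is not shown here, so the following is only a minimal Python sketch of the idea; the function name `ship_log` and the `send` callback are hypothetical.

```python
import os


def ship_log(path, send):
    """Send a log file's contents, then delete the file.

    `send` is a hypothetical callback standing in for the upload
    to the remote server.
    """
    # The with-block closes the descriptor as soon as the block exits,
    # even if send() raises. Nothing is left for the garbage collector.
    with open(path, "rb") as log:
        send(log.read())
    # The name is removed only after the descriptor is closed, so no
    # open handle keeps the deleted file's data alive on disk.
    os.remove(path)
    return log.closed
```

Ruby offers the same guarantee through the block form of `File.open`, which closes the file when the block returns.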

*nix file systems separate file names from file data. The data on disk can have several file names pointing to it (i.e. hard links), and the data is deleted only when the last link is removed. An open file descriptor counts as a link, so if a file is deleted while a program is reading it, the file name disappears from the directory, but the file data stays alive until the program closes it. This is what happened with the logger. The du (“disk usage”) command finds files through directory listings, so it did not see the gigabytes of file data belonging to the thousands of log files that were still open. These files were discovered only after running lsof (“list open files”).
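This behavior is easy to reproduce. The sketch below assumes a POSIX system and a local file system; the file name `app.log` and the function name are made up for illustration. It deletes a file while a handle to it is still open and shows that the directory is empty even though the data is still readable.

```python
import os
import tempfile


def deleted_but_open_demo(size=1024 * 1024):
    """Show that an unlinked file's data survives while it is open (POSIX only)."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "app.log")  # hypothetical log file name
    with open(path, "wb") as f:
        f.write(b"x" * size)

    log = open(path, "rb")          # the "forgotten" open file
    os.remove(path)                 # the name disappears from the directory
    listed = os.listdir(directory)  # directory-walking tools now see nothing
    info = os.fstat(log.fileno())   # but the open descriptor still reaches the data
    data_len = len(log.read())
    log.close()                     # only now can the file system free the blocks
    os.rmdir(directory)
    return listed, info.st_size, data_len
```

Between `os.remove` and `log.close()`, a tool that walks directories has no way to find these bytes, which is why they only showed up through the list of open files.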

Of course, the same kind of bug shows up in other situations. A couple of months ago, I ran into a Java application that broke down after a few days because it leaked network connections.

I used to write most of my code in C, and then in C++. In those days, I thought manual resource management was enough. How complicated could it be? Every malloc() needs a free(), and every open() needs a close(). Simple. Except that not all programs are simple, so manual resource management became a straitjacket over time. Then one day I discovered reference counting and garbage collection. I thought they solved all my problems, and I stopped caring about resource management altogether. Again, that was fine for simple programs, but not all programs are simple.

You cannot rely on garbage collection alone, because it only solves the problem of memory management, and complex programs have to deal with much more than memory. A popular rejoinder is that memory is 95% of your resource problems. You could just as well say that all resources are 0% of your problems, until you run out of one of them. Then that resource becomes 100% of your problems.

But that kind of thinking still treats resources as a special case. The deeper problem is that as programs become more complex, everything tends to become a resource. Take a calendar program, for example. A sophisticated calendar program lets multiple users manage multiple shared calendars, with events that can be shared across several calendars. Any piece of data will end up affecting several parts of the program, and it has to be up to date and correct. So all dynamic data needs an owner, and not just for the sake of memory management. As new features are added, more and more parts of the program will need to update data. If you are sane, you will allow data to be updated from only one part of the program at a time, so the right and responsibility to update data becomes a limited resource in itself. Modeling mutable data with immutable structures does not make these problems disappear; it only translates them into another paradigm.
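One way to picture "the right to update" as a resource is to make it something that has to be acquired and given back. This is only an illustrative sketch, not a design taken from the article: the class `SharedEvent` and its `editing()` method are hypothetical, and a lock is just one possible way to model the idea.

```python
import threading
from contextlib import contextmanager


class SharedEvent:
    """A hypothetical calendar event that several calendars refer to."""

    def __init__(self, title):
        self.title = title
        self._update_right = threading.Lock()  # the limited resource

    @contextmanager
    def editing(self):
        # Acquire the exclusive right to update. Fail fast instead of
        # waiting, so a second concurrent editor shows up as an error.
        if not self._update_right.acquire(blocking=False):
            raise RuntimeError("event is already being edited elsewhere")
        try:
            yield self
        finally:
            # Give the right back at a known point, not "eventually".
            self._update_right.release()
```

A caller writes `with event.editing() as e: e.title = "retro"`. The right to update has an explicit owner and an explicit lifetime, and a garbage collector has no way to supply either.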

Planning the ownership and lifetime of resources is an unavoidable part of designing complex software. It gets easier if you use a few common patterns. One pattern is interchangeable resources. An example is the immutable string “foo”, which is semantically the same as any other immutable “foo”. This kind of resource does not need a predetermined lifetime or owner. In fact, to keep the system as simple as possible, it is better not to give it a predetermined lifetime or owner (hi, Rust; translator's note). The other pattern is resources that are not interchangeable but have a deterministic lifetime. This includes network connections, as well as more abstract things such as the right to control a piece of data. The sanest approach is to make the lifetime of such things explicit in the code.

Note that automatic garbage collection is really good for implementing the first pattern, but not the second, while manual resource management techniques (such as RAII) are great for implementing the second pattern, but terrible for the first. These two approaches become complementary in complex programs.

Source: https://habr.com/ru/post/undefined/

