A little more about improper testing

One day I happened to come across some code that a user had written to measure RAM performance in his virtual machine. I won't reproduce it in full (it is a sprawling wall of code); here is only the essential part. So, let's see the code!

#include <sys/time.h>
#include <string.h>
#include <iostream>

#define CNT 1024
#define SIZE (1024*1024)

int main() {
	struct timeval start;
	struct timeval end;
	long millis;
	double gbs;
	char ** buffers;
	buffers = new char*[CNT];
	for (int i=0;i<CNT;i++) {
		buffers[i] = new char[SIZE];
	}
	gettimeofday(&start, NULL);
	for (int i=0;i<CNT;i++) {
		memset(buffers[i], 0, SIZE);
	}
	gettimeofday(&end, NULL);
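	// Elapsed wall-clock time in whole milliseconds.
	millis = (end.tv_sec - start.tv_sec) * 1000 +
		(end.tv_usec - start.tv_usec) / 1000;
	// CNT * SIZE is 1 GiB, so throughput is 1000 / milliseconds.
	gbs = 1000.0 / millis;
	std::cout << gbs << " GB/s\n";
	for (int i=0;i<CNT;i++) {
		delete[] buffers[i]; // allocated with new[], so release with delete[]
	}
	delete[] buffers;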
	return 0;
}

It is all very simple: we allocate memory and write one gigabyte into it. So what does this test show?

$ ./memtest
4.06504 GB/s


Approximately 4 GB/s.

What?!?!

How?!?!?

This is a Core i7 (albeit not the newest) with DDR4, and the CPU is almost idle. WHY?!?!

The answer, as always, is remarkably mundane.

The new operator (like the malloc function, by the way) does not actually allocate physical memory. On such a call, the allocator looks through its list of free blocks in the memory pool and, if none fits, asks the kernel for more address space (by calling sbrk() to enlarge the data segment or, for large requests in allocators such as glibc's, mmap()). It then returns to the program a pointer into the newly obtained region.

The problem is that this region is purely virtual. No physical memory pages have been allocated yet.

On the first access to each page of that region, the MMU raises a page fault, and the kernel then maps the accessed virtual page to a physical one.

So what we are actually testing is not the performance of the bus and the RAM modules but the performance of the MMU and the operating system's VMM. To test the real performance of RAM, we just need to initialize the allocated regions once before starting the timer. For example, like this:

#include <sys/time.h>
#include <string.h>
#include <iostream>

#define CNT 1024
#define SIZE (1024*1024)

int main() {
	struct timeval start;
	struct timeval end;
	long millis;
	double gbs;
	char ** buffers;
	buffers = new char*[CNT];
	for (int i=0;i<CNT;i++) {
		// FIXED HERE!!!
		buffers[i] = new char[SIZE](); // Add brackets, &$# !!!
	}
	gettimeofday(&start, NULL);
	for (int i=0;i<CNT;i++) {
		memset(buffers[i], 0, SIZE);
	}
	gettimeofday(&end, NULL);
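	// Elapsed wall-clock time in whole milliseconds.
	millis = (end.tv_sec - start.tv_sec) * 1000 +
		(end.tv_usec - start.tv_usec) / 1000;
	// CNT * SIZE is 1 GiB, so throughput is 1000 / milliseconds.
	gbs = 1000.0 / millis;
	std::cout << gbs << " GB/s\n";
	for (int i=0;i<CNT;i++) {
		delete[] buffers[i]; // allocated with new[], so release with delete[]
	}
	delete[] buffers;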
	return 0;
}

That is, we simply value-initialize the allocated buffers (every char becomes 0), which touches every page before the timer starts.

Checking:

$ ./memtest
28.5714 GB/s


Now that's more like it.

The moral: if you need large buffers to be fast from the very first access, do not forget to initialize them.

Source: https://habr.com/ru/post/undefined/

