Beyond Moore's Law

Rumors of the death of Moore's law go back as far as I can remember. I heard the argument that we are approaching the size of an atom and that the whole venture will soon become unprofitable 30 years ago, 20 years ago, and 10 years ago. And yet engineers refuted those predictions over and over again. It was engineering genius that made Moore's law one of the "self-fulfilling prophecies."

I'm not going to argue about whether the technology has reached its limit or not. Despite my education in radiophysics, my understanding of it is rather superficial. Those who want to dig deeper can consult a recent review. I will subscribe to the point of view of another highly respected thinker, Bob Colwell.


Meanwhile, chip makers continue to build (or at least announce) new fabs based on new process technologies. So it is still profitable. To me, "the patient is more alive than dead." Moore's-law expansion will stop when a server with two processors built on a new process node becomes more expensive than a server with four built on the old one. And we are far from that point. I have worked with four-socket and even eight-socket machines, but they are assembled to order and cost as much as a small airplane.

My task today is to talk about how process technology affects architecture and programming, and about what awaits us "on the other side of Moore's law." Many of the trends are already obvious. So, here we go.

Die area is worth its weight in gold. Transistors stop shrinking, and die size is limited. Accordingly, the number of elements has a ceiling. It gets harder and harder to squeeze new ideas onto a die; on the contrary, the price of compactness keeps growing. Designers become far more concerned with optimization than with innovation. Accordingly, we will see less and less innovation on CPU or GPGPU chips. Perhaps software will even have to be rewritten less often, although I don't really believe that.

Discreteness: Since the size, functionality, and power consumption of a single chip are limited, let's put in as many chips as possible. Different and specialized (the explosive growth of accelerators predicted by Colwell), identical (symmetric multiprocessing), or with reprogrammable logic (FPGA). Each of these scenarios has its own merits: the first gives maximum performance per watt for a specific task, the second is easy to program, the third is flexible. Which scenario wins out, time will tell. As I like to say, life will show everything and judge everyone. And we won't have to wait long.

Complication of NUMA: Monolithic dies are dying out, giving way to chiplets. This is how manufacturers increase yield. By the way, yield is the percentage of usable chips, and it is the most closely guarded secret of any chip maker, especially in the early stages of a new process. But "gluing" a chip together from pieces brings extra difficulties for programmers. Communication latency between cores inside a chiplet and across chiplets is different. And this is just one example of an increasingly complex NUMA (Non-Uniform Memory Access) structure. Another is the topology of the interconnect inside the chip. (And there is also High Bandwidth Memory. And also discreteness. And also...) All of this will have to be taken into account.
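To make "taking NUMA into account" a little more concrete, here is a minimal sketch (Linux with libnuma; the node number and buffer size are arbitrary choices for illustration) that pins the current thread to one NUMA node and allocates memory on that same node, so the cores touch local rather than remote memory:

```c
/* A minimal sketch of NUMA-aware allocation with libnuma (Linux).
   Assumes libnuma is installed; compile with: gcc numa_demo.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int node = 0;                       /* pick one NUMA node for the demo */
    size_t size = 64UL * 1024 * 1024;   /* 64 MiB buffer */

    /* Run the current thread on the chosen node and allocate memory there,
       so the thread works with "its own" memory instead of a remote one. */
    numa_run_on_node(node);
    char *buf = numa_alloc_onnode(size, node);
    if (!buf) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    memset(buf, 0, size);               /* first touch happens on the local node */
    printf("allocated %zu bytes on node %d (system has %d nodes)\n",
           size, node, numa_max_node() + 1);

    numa_free(buf, size);
    return 0;
}
```

The same idea scales up: the more levels the memory hierarchy has (chiplets, sockets, HBM), the more such placement decisions the programmer or runtime has to make.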

The increasing role of uncore: Since we are talking about intra-processor communications, I will mention another interesting trend. If you look closely at the M&A activity of the market leaders, it is easy to see that all the giants are doing the same thing. Intel invests in Silicon Photonics technology and buys Barefoot Networks. NVidia answers with the purchase of Mellanox, and not for the sake of InfiniBand alone. Everyone understands that the battlefield of the future is intra- and inter-processor connections, and that who becomes "king of the hill" will be determined not by instruction sets or some clever logic, but by buses and switches.

"Originality" (more precisely, non-repeatability): I sometimes have to work with large ensembles of chips. This happens when a new cluster for high-performance computing is built and brought up. And recently I noticed an interesting thing. Whereas chips with the same part number used to be almost indistinguishable, now each of them has its own "character" and "mood". The processor has a built-in power management mechanism, which depends on how many cores are currently running, which blocks are involved, on temperature, and so on. And it seems that how a processor consumes and dissipates energy also depends on the production conditions of a particular batch, on its position in the rack, and on a host of other uncontrolled factors. As a result, I have observed frequency (and performance) deviations of ~15%. Naturally, this leads to all kinds of imbalance (MPI, OpenMP), and how to deal with it is not yet very clear. Unless we make the distribution of work dynamic, as in the sketch below.
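Here is a minimal OpenMP sketch of what "dynamic distribution of work" means (the work kernel and chunk size are hypothetical, purely for illustration): with schedule(static) a chip running ~15% below nominal frequency drags the whole team down, while schedule(dynamic) lets threads on faster cores pick up the slack.

```c
/* A minimal sketch of dynamic work distribution with OpenMP.
   Compile with: gcc -fopenmp imbalance_demo.c -lm */
#include <math.h>
#include <omp.h>
#include <stdio.h>

#define N 100000

/* A stand-in for a real work item; the actual kernel is hypothetical. */
static double work(int i) {
    double s = 0.0;
    for (int k = 1; k <= 1000; k++)
        s += sin((double)i / k);
    return s;
}

int main(void) {
    double sum = 0.0;
    double t0 = omp_get_wtime();

    /* Chunks of 64 iterations are handed out on demand, so threads on
       faster cores automatically take on more work than slower ones. */
    #pragma omp parallel for schedule(dynamic, 64) reduction(+ : sum)
    for (int i = 0; i < N; i++)
        sum += work(i);

    printf("sum = %f, elapsed = %.3f s\n", sum, omp_get_wtime() - t0);
    return 0;
}
```

Dynamic scheduling costs a bit of overhead per chunk, but when the cores themselves run at noticeably different speeds, that overhead is usually cheaper than idle time at the barrier.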

And finally, frequency: There will definitely be no growth here, for many reasons, including power consumption, die size, and so on. I would even venture to suggest that frequency should be lowered, in the way least painful for single-thread performance (that is, by improving the architecture). Linpack, beloved by all marketers, will of course suffer. But the system will become more balanced, and the job of hardware developers will get easier. And in real applications, the fewer cycles the processor burns waiting for data from slow devices (memory, network, disk), the better.

This is how the computing world looks to me in the post-Moore era.
How do you see it?
