HiDC: a solution for building modern data center ICT infrastructure on Huawei Enterprise equipment

Having taken a bird's-eye view of the Huawei Enterprise solutions introduced in 2020, we now move on to more detailed stories about individual ideas and products that can underpin the digital transformation of large enterprises and government agencies alike. Today's topic: the concepts and technologies on which Huawei proposes to build data centers.



In the era of the “connected world”, data storage and processing require new approaches at every stage of the data center life cycle. Data centers must become simpler and “smarter” at the same time to cope with their role as central elements of the infrastructure of the global digital economy.

In 2018, humanity stored 33 zettabytes of information, and by 2025 the total volume is expected to grow more than fivefold. Three decades of experience in building ICT infrastructure have helped Huawei prepare for the growing “data tsunami” and offer its partners and customers the concept of an intelligent data center that covers every stage of construction, operation, and maintenance. The elements of this concept are grouped under the common name HiDC.



Digitalize it


A joke is making the rounds online: who accelerated your company's digital transformation the most: the CEO, the CTO, or the board of directors? The coronavirus epidemic! Everyone is now holding webinars, writing articles, and explaining what to do and how. But all of this is reactive. Some prepared in advance.

Not to brag, but as an objective example, consider our own company, where large-scale digital transformation began several years ago. Today we can move almost all employees to remote work without any loss of efficiency. The story of the hospital built in Wuhan in ten days is telling: there, digital transformation proved itself in that all the IT systems were deployed in three days. So the question of digital transformation is not “when” or “why” but “how.”



Architectural approach instead of spontaneous development


What are the main problems we face when we start building a system? To this day, our customers work by matching business tasks to application services and IT solutions one at a time. When a complex system has been built simply by bolting on blocks, it is hard to get an overall picture of how it works. To build a system as a single organism, an architectural approach is needed first of all. We embodied this approach in the design philosophy of our HiDC solution.



Maximum value and minimum value


The HiDC structure consists of two main layers. The first is the classic infrastructure you are used to seeing from Huawei. The elements of the second layer are best summed up by the term “smart data”.

Why is this needed? Today many companies accumulate enormous amounts of information, often scattered or accessible only through various intermediate layers. Take ordinary databases, for example. Ask your database administrators how these databases are interconnected and how their data can be used in BI systems to make business decisions. Surprisingly often, databases are only loosely connected and function as separate “islands”. So first of all, we considered which architectural approaches could eliminate this problem.



HiDC Architecture Design Principles


Let us look at the basic design principles of HiDC. This overview is useful primarily not to specialists in any single area but to solution architects, who need to see the whole picture.

The most commonly used are the converged network blocks and the data management blocks. Already at this point a concept arises that solution architects rarely think about: data lifecycle management. It has migrated from classic databases to many other systems, including cloud and edge computing.



Ideally, all six blocks of the HiDC structure are available. Often, though, customers work in an existing environment. Even then, adopting a single block from the diagram above can pay off, and adding a second, a third, and so on produces a synergistic effect. The combination of a network and a distributed storage system alone yields higher performance and lower latency. The block approach lets us grow not haphazardly, as often happens in the industry, but according to an integrated architecture, while the openness of the blocks themselves leaves freedom to choose the optimal solution.



The era of converged networks


Recently we have been actively promoting the concept of converged networks in both the global and the Russian markets. Our customers already use converged solutions based on RoCEv2 (RDMA over Converged Ethernet version 2) to build distributed software-defined storage. The main advantages of this approach are its openness and the fact that there is no need to build an ever-growing number of separate networks.

Why wasn't this done before? Recall that Ethernet dates back to 1973. Over nearly half a century it has accumulated many limitations, but Huawei has learned to overcome them. Thanks to a number of additional mechanisms, Ethernet can now be used for mission-critical and heavily loaded applications.



From DCN to DCI


The next important trend is the synergy gained by deploying DCI (Data Center Interconnect). In Russia, unlike China, something similar can so far be found only among telecom operators. When customers consider network solutions for a data center, they usually pay little attention to deep integration of optical networks and classic IP solutions within a single point of presence. They stick with familiar solutions that work at the IP layer, which is enough for them.

Why, then, is DCI needed? Imagine that the DWDM transport administrator and the IP network administrator work independently. A failure on either side can then seriously reduce your fault tolerance. With a synergistic approach, IP routing takes into account what is happening on the optical network. Such an intelligent service significantly increases the number of nines in the availability of the whole system.
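
The idea of IP routing that takes the optical layer into account can be sketched as a shortest-path computation that skips links whose optical channel is reported as degraded. This is a minimal illustration under stated assumptions, not Huawei's algorithm: the topology, the link costs, the use of OSNR (optical signal-to-noise ratio) as the health signal, and the 18 dB threshold are all hypothetical.

```python
import heapq

# Hypothetical topology: each IP link rides on an optical (DWDM) channel whose
# health the optical layer reports. Costs, OSNR values and threshold are illustrative.
LINKS = {
    ("DC1", "DC2"): {"cost": 10, "osnr_db": 14.0},
    ("DC1", "DC3"): {"cost": 15, "osnr_db": 22.0},
    ("DC3", "DC2"): {"cost": 10, "osnr_db": 21.5},
}
MIN_OSNR_DB = 18.0  # assumed threshold below which a wavelength counts as degraded


def build_graph(links, min_osnr):
    """Build an undirected adjacency map, skipping links whose optical channel is degraded."""
    graph = {}
    for (a, b), attrs in links.items():
        if attrs["osnr_db"] < min_osnr:
            continue  # the IP layer avoids links the optical layer flags as unhealthy
        graph.setdefault(a, []).append((b, attrs["cost"]))
        graph.setdefault(b, []).append((a, attrs["cost"]))
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra; returns (total_cost, path) or (None, []) if dst is unreachable."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None, []


if __name__ == "__main__":
    # Ignoring the optics, the direct DC1-DC2 link (cost 10) would win.
    print(shortest_path(build_graph(LINKS, 0.0), "DC1", "DC2"))
    # With optical health taken into account, traffic is steered via DC3 instead.
    print(shortest_path(build_graph(LINKS, MIN_OSNR_DB), "DC1", "DC2"))
```

The point of the design is that the degraded wavelength is avoided before it fails outright, rather than after the IP layer notices lost packets.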

Another major advantage of our DCI is its large capacity headroom. Combining the C and L bands gives about 220 wavelengths (lambdas). Even a large enterprise customer is unlikely to exhaust this reserve soon, given that our current solution carries up to 400 Gbit/s per wavelength. In the future, the same equipment will support 800 Gbit/s.
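
The headroom claim is easy to check with the figures from the text: about 220 wavelengths at 400 Gbit/s each, later 800 Gbit/s. The sketch assumes every wavelength runs at the same line rate and that the figures apply per fiber pair; real rates depend on modulation format and reach.

```python
# Aggregate capacity of a C+L-band DWDM system, using the figures quoted in the text.
# Assumption: capacity is per fiber pair and every wavelength runs at the same rate.
LAMBDAS_C_PLUS_L = 220


def aggregate_tbps(lambdas: int, gbps_per_lambda: int) -> float:
    """Total capacity in Tbit/s for `lambdas` wavelengths at `gbps_per_lambda` each."""
    return lambdas * gbps_per_lambda / 1000


print(aggregate_tbps(LAMBDAS_C_PLUS_L, 400))  # today's line rate
print(aggregate_tbps(LAMBDAS_C_PLUS_L, 800))  # future line rate on the same equipment
```

That works out to roughly 88 Tbit/s today and 176 Tbit/s once 800 Gbit/s per wavelength becomes available.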

Unified management through standard open interfaces adds further convenience. NETCONF manages not only the switches but also the optical multiplexing (DWDM) equipment, which achieves convergence at every level and lets you treat the system as an intelligent resource rather than a “set of boxes”.
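
NETCONF (RFC 6241) exchanges XML-encoded RPCs, so the same <get-config> request can be sent to a switch and to an optical shelf, provided both implement the protocol. The sketch below only builds the RPC with the Python standard library; the device names are hypothetical, and the transport (NETCONF over SSH, port 830, per RFC 6242) would normally be handled by a NETCONF client library.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"  # base namespace from RFC 6241


def get_config_rpc(message_id: int, datastore: str = "running") -> str:
    """Build a NETCONF <get-config> RPC that reads the given configuration datastore."""
    ET.register_namespace("", NETCONF_NS)  # serialize with NETCONF as the default namespace
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": str(message_id)})
    get_config = ET.SubElement(rpc, f"{{{NETCONF_NS}}}get-config")
    source = ET.SubElement(get_config, f"{{{NETCONF_NS}}}source")
    ET.SubElement(source, f"{{{NETCONF_NS}}}{datastore}")
    return ET.tostring(rpc, encoding="unicode")


# Hypothetical inventory: one IP switch and one DWDM shelf, managed the same way.
DEVICES = ["dc1-spine-01", "dc1-dwdm-01"]
for msg_id, device in enumerate(DEVICES, start=101):
    print(device, get_config_rpc(msg_id))
```

Only the data models inside the reply differ between device types; the request and the protocol stay the same, which is what makes the “single intelligent resource” view possible.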



Edge computing is gaining importance


Many have heard of edge computing. Those who work with cloud and classic data centers should keep in mind that we have recently seen a significant shift toward the edge.

What caused this? Let's look at common deployment models. There is a lot of talk now about “smart cities”, “smart homes”, and so on. These concepts let developers create added value and raise property prices. A “smart home” identifies its residents, lets them in and out, and provides them with various services. According to statistics, such services add about 10 to 15% to the price of an apartment and, more broadly, can drive new business models. Autonomous driving, which we discussed earlier, is another example. Soon, 5G and Wi-Fi 6 will provide extremely low latency for data exchange between smart homes, cars, the edge computing nodes that serve them, and the main data center. This means that far more operations involving serious data processing can be performed at the edge. For such tasks you can use, among other things, neural processors, which are already being shipped to Russia.
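
Why the edge matters for latency can be seen from propagation delay alone: light in optical fiber travels at roughly 200,000 km/s, or about 5 microseconds per kilometre each way. The distances below are hypothetical, and queuing, serialization, and processing delays are ignored, so real round-trip times will be higher.

```python
# Propagation delay only: ~200,000 km/s in fiber, i.e. about 5 microseconds per km.
# Queuing, serialization and processing delays are ignored here.
FIBER_US_PER_KM = 5.0


def propagation_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay in milliseconds over `distance_km` of fiber."""
    return 2 * distance_km * FIBER_US_PER_KM / 1000


# Hypothetical distances: an edge site in the same city vs. data centers farther away.
for label, km in [("edge site", 10), ("regional DC", 300), ("central DC", 1500)]:
    print(f"{label:12s} {km:5d} km  ->  {propagation_rtt_ms(km):.2f} ms RTT")
```

A data center 1,500 km away cannot answer in under 15 ms however fast its servers are, while an edge node 10 km away leaves almost the whole latency budget for processing.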

The prospects for this trend are undeniable. Imagine, for example, an intelligent urban transport management system that can switch traffic lights, balance traffic across specific streets, or even respond appropriately in emergencies.



Now let's turn to the resources with which we implement the HiDC concept.

Computing


When we need a standard computing system, x86 processors are of course the natural choice. But as soon as customization is required, it is time to consider more diverse options.

For example, ARM processors, with their large core counts, are excellent for highly parallel applications. Multithreading yields a performance gain of about 30%.
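
How much a high core count helps depends on how much of the workload can actually run in parallel. Amdahl's law, a standard model that the original text does not mention, makes this concrete; the parallel fractions and the 64- and 128-core counts below are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over a single core per Amdahl's law: 1 / ((1 - p) + p / n)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Hypothetical workloads on a 64-core vs. a 128-core CPU (core counts are illustrative).
for p in (0.50, 0.90, 0.99):
    print(f"p={p:.2f}: 64 cores -> {amdahl_speedup(p, 64):5.1f}x, "
          f"128 cores -> {amdahl_speedup(p, 128):5.1f}x")
```

With half the work serial, doubling from 64 to 128 cores barely changes anything (about 2x either way), while at 99% parallelism the speedup rises from roughly 39x to roughly 56x. That is why many-core ARM chips pay off mainly for highly parallel applications.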

If low latency is critical to us, FPGAs come to the fore.

Neural processors are needed primarily for machine learning tasks. If a particular deployment needs 16 racks of 8 NPU-equipped servers each, an equivalent x86-based solution would require about 128 racks, eight times as many. As you can see, the wide variety of computing workloads demands a careful choice of hardware platforms.



Data storage


For the second year now, Huawei has been urging partners, customers, and industry colleagues to build storage on an all-flash principle. Most of our customers now use spinning hard drives only in legacy systems or for rarely accessed archive data.

Flash technology is evolving too. Storage Class Memory (SCM) products such as Intel Optane are entering the market, and Chinese and Japanese manufacturers are showing interesting developments. SCM currently outperforms all other storage media; only its high cost keeps it from being used everywhere.

At the same time, storage must improve not only on the back end but also on the front end. In new deployments we currently offer and use remote direct memory access over Ethernet by default, but customers are asking for more, so toward the end of the year we will start using NVMe over Fabrics more often. Moreover, we will use it end to end to provide a common architecture, which must of course be high-performance and tolerant of controller failures.

OceanStor Dorado storage is one of our flagship products. Internal tests have shown performance of around 20 million IOPS, and the system stays operational even if seven of its eight controllers fail.
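
The availability gain from tolerating the loss of seven of eight controllers can be estimated with a simple binomial model, compared here with a hypothetical dual-controller baseline. The model assumes controllers fail independently, each with a hypothetical, deliberately pessimistic availability of 0.99; real systems also suffer correlated failures (shared power, firmware, software bugs), so treat the results as an idealized upper bound.

```python
from math import comb, log10


def unavailability(a: float, n: int, k: int) -> float:
    """Probability that fewer than k of n independent controllers are up,
    each with availability a (binomial model)."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(0, k))


def nines(u: float) -> float:
    """Express unavailability u as a 'number of nines' (e.g. 1e-4 -> 4.0)."""
    return -log10(u)


A = 0.99  # hypothetical per-controller availability, deliberately pessimistic
print(f"2 controllers, 1 must survive: {nines(unavailability(A, 2, 1)):.1f} nines")
print(f"8 controllers, 1 must survive: {nines(unavailability(A, 8, 1)):.1f} nines")
```

With k = 1 the formula reduces to (1 - a)^n, so under the independence assumption each additional controller that can fail adds as many nines as a single controller provides on its own.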

Why so much performance? Consider the current situation. For several months now, people in China have been spending much more time at home because of the lockdown. Internet traffic rose by an average of 30% over this period and even doubled in some provinces. Consumption of all kinds of online services grew, and at some point even banks came under heavy additional load that their storage systems were not prepared for.

Clearly, not everyone needs 20 million IOPS today. But what about tomorrow? Our intelligent systems make full use of neural processors for data compression, deduplication, optimization, and fast data recovery.
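
Deduplication in its simplest form splits data into chunks, fingerprints each chunk with a cryptographic hash, and stores every distinct chunk only once. The sketch uses fixed-size 4 KiB chunks and SHA-256; both are assumptions, the sketch does not use neural processors, and it is not how OceanStor Dorado implements the feature.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


class DedupStore:
    """Toy block store: identical chunks are stored once and referenced by SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes (the unique data actually kept)
        self.files = {}   # file name -> ordered list of chunk digests

    def write(self, name: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy of a chunk
            digests.append(digest)
        self.files[name] = digests

    def read(self, name: str) -> bytes:
        # Reassemble the file from its chunk references.
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
block = b"A" * CHUNK_SIZE
store.write("vm1.img", block * 3 + b"B" * CHUNK_SIZE)  # 4 chunks, 2 of them unique
store.write("vm2.img", block * 4)                      # 4 chunks, all already stored
logical = 8 * CHUNK_SIZE
print(f"logical {logical} B, physical {store.stored_bytes()} B")
```

Eight logical chunks are written but only two unique chunks (8 KiB) are stored, a 4:1 reduction. Production systems typically add variable-size chunking and compression on top of this basic idea.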

Core network


As we mentioned in the previous article, 2020 will be the year of core networks for us. Many customers, especially application service providers (ASPs) and banks, are already thinking about how their applications will communicate with data centers and between them. This is where a new core network helps. Take the largest Chinese banks: they have switched to simplified backbones that use not a dozen different protocols for inter-data-center communication but, roughly speaking, just two, OSPF and SRv6, while the organization still receives the same set of services.
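
With SRv6, the path through the network travels inside the packet as a list of segment identifiers (SIDs) in the Segment Routing Header (SRH), while the IGP (OSPF in the banks' case) distributes topology and SID reachability. A minimal sketch of how the SRH steers a packet, following the RFC 8754 conventions: the segment list is stored in reverse order, Segments Left points at the active segment, and the active segment is copied into the IPv6 destination address. The SIDs are hypothetical documentation addresses, and every other header field is omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SRv6Packet:
    """Minimal model of the SRv6 fields that matter for forwarding."""
    segment_list: List[str]  # stored in reverse order: [0] is the LAST segment
    segments_left: int
    destination: str = field(init=False)

    def __post_init__(self):
        # The IPv6 destination address is always the active segment.
        self.destination = self.segment_list[self.segments_left]


def encapsulate(path_sids: List[str]) -> SRv6Packet:
    """Build the SRH from SIDs listed in travel order (first hop first)."""
    return SRv6Packet(segment_list=list(reversed(path_sids)),
                      segments_left=len(path_sids) - 1)


def process_at_endpoint(pkt: SRv6Packet) -> None:
    """What a segment endpoint does: decrement Segments Left and update the destination."""
    if pkt.segments_left == 0:
        return  # final segment reached; nothing left to steer
    pkt.segments_left -= 1
    pkt.destination = pkt.segment_list[pkt.segments_left]


# Hypothetical SIDs (documentation prefix 2001:db8::/32): DC1 edge -> core -> DC2 edge.
pkt = encapsulate(["2001:db8:1::1", "2001:db8:100::1", "2001:db8:2::1"])
hops = [pkt.destination]
while pkt.segments_left > 0:
    process_at_endpoint(pkt)
    hops.append(pkt.destination)
print(hops)
```

Because the path lives in the packet, transit routers only need plain IPv6 forwarding, which is part of why the protocol stack can shrink.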



Intelligent resources


How should data be used? Until recently, companies had a fragmented landscape of disparate databases: Microsoft SQL Server, MySQL, Oracle, and so on. To work with them, big data tools were used to collect and combine the data and then process it. All this put a heavy load on resources.

At the same time, there was no mechanism for acting on data when an event occurred. The answer was to develop data lifecycle management (DLM) principles.
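
Event-driven data lifecycle management can be sketched as a set of rules evaluated whenever an event fires. The tier names, the 30- and 365-day thresholds, and the periodic-scan trigger are hypothetical; the sketch illustrates the principle rather than any Huawei product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class DataObject:
    name: str
    tier: str             # "hot", "warm" or "archive"
    last_access: datetime


@dataclass
class Rule:
    """A lifecycle rule: when `condition` holds for an object, move it to `target_tier`."""
    condition: Callable[[DataObject, datetime], bool]
    target_tier: str


# Hypothetical policy: demote data not read for 30 days, archive after 365 days.
RULES: List[Rule] = [
    Rule(lambda o, now: o.tier == "hot" and now - o.last_access > timedelta(days=30), "warm"),
    Rule(lambda o, now: o.tier == "warm" and now - o.last_access > timedelta(days=365), "archive"),
]


def on_event(obj: DataObject, now: datetime) -> str:
    """Evaluate the rules when an event fires (here, a periodic scan); apply the first match."""
    for rule in RULES:
        if rule.condition(obj, now):
            obj.tier = rule.target_tier
            break
    return obj.tier


now = datetime(2020, 4, 1)
report = DataObject("q4_report.parquet", "hot", now - timedelta(days=45))
print(on_event(report, now))
```

Each event moves an object at most one tier, so data ages gradually from hot to warm to archive rather than jumping straight to the cheapest storage.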

Everyone has heard of data lakes. With the shift from data management to data governance, data lakes have quickly been getting “smarter”, thanks in part to Huawei solutions. In future articles we will describe the entire software stack we used. For now, the key point is that “smart” data lifecycle management is what allowed us to simplify the use of our network and servers and to learn to build end-to-end architectures for a better understanding of how to work with data.



Data Center Engineering Infrastructure


We will publish separate articles on engineering infrastructure, but in the context of today's topic we would like to mention the changes related to the HiDC concept.

For a long time, lithium batteries were not permitted in data center emergency and backup power systems because of their high fire hazard. Any mechanical damage or breach of a battery's integrity could cause a fire with unpredictable consequences. As a result, backup power systems relied on outdated lead-acid batteries with low energy density and high weight.

Huawei's new emergency and backup power systems use lithium iron phosphate (LFP) batteries with intelligent proactive control. For the same capacity, they take up a third of the volume of lead-acid batteries. Their service life is 10 to 15 years, which among other things reduces their environmental impact. The patented control system in the SmartLi ecosystem supports hybrid configurations that mix arrays of old and new batteries, and the switching system allows “hot” changes to the backup power configuration while the backup function is preserved throughout.



Smart operation


A key principle of operating HiDC infrastructure is smart self-healing. In a previous article we mentioned the O&M 1-3-5 intelligent platform, which can not only detect and analyze an undesirable event in the system but also offer the administrator several options for resolving it fully automatically.

The introspection function detects problems in about one minute, analysis takes three minutes, and within five minutes the platform proposes changes to the system state.

Suppose an operator error creates a closed loop of processes that reduces the performance of a virtualization farm from 100% to 77%. The data center administrator receives a message on the dashboard with a complete visualization of the problem, including a network diagram of the resources affected by the unwanted process. The administrator can then fix the situation manually or apply one of several automatic recovery scenarios the system proposes.
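
Detecting a closed loop of processes, as in the example above, amounts to finding a cycle in a dependency graph, after which a matching recovery scenario can be proposed. A minimal sketch using depth-first search; the service names, the graph, and the playbook entries are hypothetical and do not represent the O&M 1-3-5 platform's actual logic.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # service -> services it calls or waits on


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one cycle as a list of nodes (first node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    stack: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the current path
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None


# Hypothetical mapping from a detected fault type to proposed remediation scenarios.
PLAYBOOK = {"process_loop": ["restart the looping service", "roll back the last configuration change"]}

calls = {"vm-sched": ["storage-agent"], "storage-agent": ["net-agent"],
         "net-agent": ["vm-sched"], "backup": []}
cycle = find_cycle(calls)
if cycle:
    print("loop detected:", " -> ".join(cycle))
    for option in PLAYBOOK["process_loop"]:
        print("  proposed:", option)
```

Returning the cycle itself, rather than just a yes/no answer, is what makes it possible to draw the affected resources on the dashboard.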


The system knows about 75 such scenarios, each of which can be carried out in under ten minutes, and together they cover 90% of the problems encountered in data centers. Meanwhile, the engineer can calmly answer calls from worried customers, confident that the service will be restored at any moment.



New Key Products at HiDC


Besides software products, this category includes key solutions that operate at the infrastructure level. First of all, these are the neural processors used in our Atlas AI clusters, as well as NPU- and GPU-based servers.

We also cannot help mentioning Dorado again, with its class-leading performance that will be sufficient for many years to come. This matters especially in the post-Soviet space, where, with rare exceptions, equipment is usually replaced only when it stops working altogether. That is why some storage systems stay in service for up to ten years. Dorado needs its enormous performance headroom to keep delivering high-quality service ten years from now.



Innovation in every element


When choosing specific infrastructure solutions, do not forget the architecture and how it will evolve. Disparate products from different vendors do not guarantee the synergy that solutions already optimized to work together provide.

Infrastructure must be built on the right technologies: among other things, open ones that deliver high throughput and run stably under heavy load. For data centers, for example, a good ratio of total energy consumption to IT load is important. Achieving all these goals requires choosing the right environment and components, which today also means ever wider use of artificial intelligence.
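
The ratio of total energy consumption to IT load mentioned above is commonly expressed as power usage effectiveness (PUE). A minimal sketch; the site figures are hypothetical, and PUE is normally computed from energy measured over a period (for example, a year) rather than from a single power reading as here.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT equipment power.
    1.0 is the theoretical ideal (all power reaches the IT load)."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


# Hypothetical site: 1,000 kW of IT load plus 400 kW for cooling, UPS losses and lighting.
print(round(pue(1400, 1000), 2))
```

Every kilowatt saved on cooling or power conversion lowers the PUE directly, which is where measures such as more efficient backup power systems show up in the numbers.
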
In our observation, fewer and fewer of Huawei's strategic customers still do without machine learning systems. Without ML, you simply cannot get the most value out of the data you have accumulated.
Monetization can take different forms: for banks, offering new targeted products; for telecom operators, personalized services and loyalty programs; for government customers, high-quality data lifecycle management and close interaction with other organizations. After all, data management has long since gone beyond configuring a firewall and ensuring the network visibility of databases.

From an idea to an existing data center


Building a standard data center takes a year and a half at best. Our production cycle lets us do it much faster thanks to a group of solutions known collectively as FusionDC 2.0. Design, high-level design development, and assembly of all IT load elements are carried out directly at the factory. The equipment is then quickly shipped from China to Russia in sea containers. As a result, a turnkey data center can be built in just four to five months.

The prefabricated cloud data center is also attractive because the data center can be developed in stages, adding functional blocks as needed. This approach is built into the HiDC concept itself.


To keep this overview from turning into a datasheet, we suggest visiting our website for more information on HiDC. There you will find descriptions and implementation examples of the approaches, products, and solutions discussed here. The higher your access level on the site, the more materials are available. If you have been granted “partner” status, you can download HiDC roadmaps, technical presentations, and videos.

We suspect that most readers of this article have network architecture expertise. They will certainly be interested in visiting our design zone, where we explain in detail how to build network infrastructure according to Huawei Validated Design (HVD) rules. The downloadable guidelines will help you thoroughly understand how the company's solutions work. Just keep in mind that fewer materials are available without logging in.

***


Our numerous webinars, held not only for Russian-speaking audiences but also internationally, will help you find your way. In them we share information about both our products and our business practices. In particular, we talk about how Huawei keeps delivering its products to different countries without interruption despite disruptions to many supply chains. Recently, for example, newly manufactured data center equipment reached a customer in Moscow in just three weeks.

The list of webinars for April is available here.
