Cisco UCS through the eyes of a cloud provider

Hello, Habr!

Being a cloud provider means constantly accumulating new knowledge and expertise. Over the years we have built up a sizeable set of practices that we follow to ensure the best possible level of service. One of them is the use of Cisco Unified Computing System solutions. Under the cut, I want to explain why, in our view, UCS is one of the best solutions for providers, and discuss some peculiarities of working with the system along with real use cases.

Almost 8 years have passed since Cisco UCS appeared on the market. That is long enough for the audience to have formed a complete picture of the technology: the manuals, reviews and training articles alone would fill a hefty volume, and the marketing articles on the topic would fill two. We will try to talk about Cisco UCS as objectively as possible: highlight the key features of the solution, discuss the resulting benefits for cloud service providers, and share real cases.

In the beginning was the word


The term “convergence” came into use among HP engineers about 10 years ago. HP was the first to release so-called converged modules for installation in the HP BladeSystem c7000 chassis. They made it possible, for example, to allocate a certain amount of bandwidth to a specific blade server. This was the first step towards convergence, but the solution lacked many of the essential features of converged systems.

Just in case, let’s define the term: a converged infrastructure is a single complex of hardware and software with a single entry point for managing all of the equipment in the complex, plus an orchestrator.

As for Cisco UCS, this solution already fully matches the definition of convergence in terms of hardware and part of the software stack.

Solution Architecture


Let’s take a careful look at the diagram above and briefly describe the elements of the complex from the top down.



Cisco UCS Manager

A single entry point for managing all of the hardware components shown in the diagram, and an orchestrator that lets you manage the components manually or through the REST API. It is a kind of "brain" of the complex and runs inside the Fabric Interconnects. All equipment configuration, without exception, is performed through the management interface (GUI or CLI) or the UCS Manager API.

Fabric Interconnect

A unified hardware switch based on the Cisco Nexus platform. It provides network connectivity for all components of the complex, as well as connectivity of the blade servers to external networks. The complex includes two Fabric Interconnects. The latest models, FI 6332 and FI 6454, allow connecting up to 20 of the 5108 chassis, bringing the total number of blade servers to:

  • B480 M5 - up to 80 servers;
  • B200 M5 - up to 160 servers.
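The blade counts above follow directly from the chassis geometry: a 5108 chassis has 8 half-width slots, a half-width blade (B200 M5) takes one slot and a full-width blade (B480 M5) takes two. A minimal sketch of that arithmetic:

```python
# Capacity math for one UCS domain: an FI pair with up to 20 chassis.
# A 5108 chassis has 8 half-width slots; a full-width blade occupies 2 slots.
MAX_CHASSIS = 20
SLOTS_PER_CHASSIS = 8

def max_blades(slots_per_blade: int, chassis: int = MAX_CHASSIS) -> int:
    """Maximum number of identical blades across all chassis in the domain."""
    return chassis * (SLOTS_PER_CHASSIS // slots_per_blade)

print(max_blades(slots_per_blade=1))  # half-width B200 M5 -> 160
print(max_blades(slots_per_blade=2))  # full-width B480 M5 -> 80
```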

Today it is nearly the only solution that offers this level of integration behind a single entry point and provides seamless network connectivity without additional ToR switches or extra modules installed in the chassis alongside the blade servers.

Chassis 5108

Compared to the FI, these are fairly simple devices. Their layout is standard for blade systems: PSUs, fans, and the key component of the chassis - the FEX modules, which provide connectivity between the blade servers and the FI. At the time of writing, the 4-port 40GbE module 2304 is supported, as well as the 8-port and 4-port 10GbE modules (2208 and 2204). Their distinctive feature is the ability to group ports, which increases the overall bandwidth.

VIC (Virtual Interface Card)

An intelligent adapter installed in the blade server. It allows virtual network resources to be allocated both to bare-metal servers and to virtual machines, and supports Ethernet and FC/FCoE data transfer protocols.

Now that the structure of the solution is more or less clear, let's talk about why, in our subjective view, Cisco UCS is one of the most convenient solutions on the market.

Why Cisco UCS


Now that we have a clear idea of what the solution consists of, let's talk about its advantages. How is the Cisco solution better than its "relatives" - for example, HP Synergy? Our colleagues ask this question often, although the answer (as it seems to us) lies on the surface. The point is this:

  • a universal solution that lives up to the term "unified" ⇒ lower OPEX;
  • a minimal amount of equipment covers the maximum number of use cases (more on them below), plus easy scaling ⇒ lower CAPEX;
  • excellent performance and load balancing, enterprise-level availability.

De facto, these three points concentrate all the main requirements for the solution, from both the business and the IT side. Still, without real cases these advantages look somewhat unfounded, so below we will "decipher" them with concrete examples.

Practical application


As promised at the beginning of this article, in this section we will look at Cisco UCS case studies. We start with a review of our experience and smoothly move on to specific situations.

Commissioning of equipment


In the time we have been using Cisco UCS solutions, we have had to commission and expand 8 systems online (a complex means a pair of Fabric Interconnects and at least one blade chassis) - 16 FIs or more in total. We put the very first complex into operation in 2014, with minimal practical experience at the time. The process took us three days, two of which were spent studying the documentation and understanding the logic of the equipment. Note that Cisco's documentation is on par with IBM's best RedBooks - those familiar with the latter will understand the comparison.

Having dealt with the logic and basic principles of configuration, we easily assembled and launched the equipment. We then updated the firmware of all components, set up server profile templates, and created profiles - all within a single business day.

Further implementation was carried out as part of standard ITIL change management procedures and took no more than four hours to deploy each pair of FIs and one or two chassis from the moment of power-on until the chassis was fully ready for use, including the creation and configuration of all necessary templates and policies.

Using the REST API and the PowerTool modules can speed up the installation even further. For example, copying a list of 500+ VLANs to a new installation takes just two simple steps with PowerTool:

  • fetching the VLAN list from the production infrastructure;
  • uploading the VLAN list to the new complex.
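The same two-step copy can also be scripted in Python (Cisco ships a Python SDK, ucsmsdk, alongside PowerTool). The sketch below keeps the transport out of the picture and only models the logic: `fetch_vlans` and `push_vlan` are illustrative stand-ins for the actual API calls, not real SDK functions.

```python
# Sketch of the two-step VLAN copy: read the VLAN list from the source
# domain, replay it into the target. The fetch/push callables are
# placeholders for real UCS API calls (XML API or the ucsmsdk SDK).
from typing import Callable, Iterable

def copy_vlans(
    fetch_vlans: Callable[[], Iterable[dict]],
    push_vlan: Callable[[dict], None],
) -> int:
    """Copy every VLAN definition from source to target; return the count."""
    count = 0
    for vlan in fetch_vlans():
        # Only the attributes needed to recreate the VLAN are carried over.
        push_vlan({"name": vlan["name"], "id": vlan["id"]})
        count += 1
    return count

# Demo with in-memory stand-ins for the two UCS domains.
source = [{"name": f"vlan{i}", "id": i} for i in range(100, 105)]
target: list = []
copied = copy_vlans(lambda: source, target.append)
print(copied)  # 5 VLANs replayed into the new complex
```

With 500+ VLANs the benefit is the same: one fetch, one loop, no manual clicking per VLAN.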

Infrastructure scaling is performed by connecting new chassis with blade servers to the installed FI pair (given the required number of free ports). The procedure is 100% online and can be performed from the Cisco UCS Manager interface. With the correct global settings, as soon as the FI ports the chassis is connected to are switched to the appropriate operating mode, these ports are automatically gathered into a Port-channel. Next, an automated acknowledge procedure starts, during which:

  • all chassis components are updated to the current FW version;
  • the Power Cap is set;
  • the Backplane ports on the FEX are mapped to the appropriate fabrics, and port-channels are assembled for those ports.

Once again: all of this happens without engineer intervention, based on the global policies set when the complex was commissioned.

This procedure takes about an hour and a half. Physically connecting a new chassis to the FIs comes down to cabling the FEX modules to FI ports, ideally with DAC cables - and they do not have to be original Cisco cables at all.

Operation


There is much one could say about day-to-day operation, and not all of it good - as they say, without a barrel of tar, a spoonful of honey would not taste as sweet. But seriously, all the routine procedures that take many minutes or hours in a typical infrastructure are performed automatically from the GUI. For example, to roll out a new VLAN to all the blade servers of the complex (and recall that 80 to 160 of them are supported), it is enough to add it to the vNIC template in the LAN Policy section - the new VLAN will automatically propagate to every blade server whose profile includes that vNIC template.
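The mechanics of an updating template are easy to model in a few lines. The sketch below is not the UCS API - the class names are illustrative - but it shows the key design choice: profiles hold a reference to the template rather than a copy, so one change at the template level is instantly visible everywhere.

```python
# Toy model of an updating vNIC template: service profiles reference the
# template, so a VLAN added to the template affects all of them at once.
class VnicTemplate:
    def __init__(self, name: str):
        self.name = name
        self.vlans = set()

    def add_vlan(self, vlan_id: int) -> None:
        self.vlans.add(vlan_id)

class ServiceProfile:
    def __init__(self, server: str, template: VnicTemplate):
        self.server = server
        self.template = template  # a reference, not a copy

    def effective_vlans(self) -> set:
        return set(self.template.vlans)

tmpl = VnicTemplate("prod-vnic")
profiles = [ServiceProfile(f"blade-{i}", tmpl) for i in range(1, 161)]

tmpl.add_vlan(842)  # one change at the template level...
print(all(842 in p.effective_vlans() for p in profiles))  # True
```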

Since we are on the subject of policies, it is worth saying that literally all settings are defined through them. You can, of course, avoid them, but that would be doing things the hard way. All network settings for blade servers - including MAC and IP addresses for KVM, Flow Control, LACP, CDP, VMQ - are configured through policies. BIOS settings, the FW version pushed to a blade server, Power Control, IPMI access settings and much more are defined the same way.
Here is another example of UCS's ability to automate routine operations: FC zoning.

In the Storage Connection Policy settings it is enough to select the desired zoning type, for example "single initiator single target". Then, when a blade server is bound to the profile template, a separate zone is created automatically. This zone includes the specified target WWN and the WWPN of the virtual HBA on the appropriate port in the appropriate fabric.
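The "single initiator single target" policy boils down to a simple pairing rule: one zone per (initiator, target) pair. A minimal sketch of that rule - the function and naming scheme are illustrative, not the UCS API:

```python
# One zone per (initiator WWPN, target WWN) pair, which is exactly what
# "single initiator single target" zoning produces.
from itertools import product

def single_initiator_single_target(initiators: list, targets: list) -> list:
    """Build a zone definition for every (initiator, target) pair."""
    return [
        {"name": f"zone_{i}_{t}", "members": [i, t]}
        for i, t in product(initiators, targets)
    ]

zones = single_initiator_single_target(
    ["20:00:00:25:b5:00:00:0a"],           # vHBA WWPN of one blade
    ["50:0a:09:81:00:00:00:01",            # storage target WWNs
     "50:0a:09:82:00:00:00:01"],
)
print(len(zones))  # 2 zones, each with exactly two members
```

UCS Manager applies the same rule per fabric automatically as servers are bound to the template, so the zoning grows with the infrastructure without manual edits on the FC switches.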

Policies are tied to server profile templates. From there everything is simple: bind the template to the desired blade server and initialize it. The output is a server ready for OS installation. Server initialization takes no more than 10-20 minutes and can run in parallel for any number of servers. In total, in just 25-35 minutes we get 80 to 160 servers completely ready for OS installation. The installation process itself can also be automated - the Cisco UCS API can help with that - but that is a topic for another article.

Bottom line: to deploy a complex of an FI pair, 20 blade chassis and 160 B200 M5 blade servers from scratch to OS-install readiness, one engineer will spend no more than 8 hours, and most of that time - about 3 hours - goes into creating policies and profile templates. The rest of the time can be devoted to more important matters while the chassis and blade servers initialize after the templates are bound. This deployment time fits nicely into the OPEX-reduction paradigm mentioned above.

Unified system


Universality, universality and, once again, universality - that could well be the motto of the complex. Let's illustrate this with a list of Cisco UCS features that keep it unique on the market even after 8 years, which is a very long time by today's standards.

  • unified ports: 10GbE / 16Gbit FC and 40GbE (with 4x10GbE breakout);
  • the FI supports both Fibre Channel and Ethernet/FCoE operation;
  • the FI can act as a full FC fabric or run in NPV mode behind an existing fabric, e.g. Brocade;
  • rack servers can be connected to the FI through Cisco fabric extenders and managed by UCS Manager;
  • rack servers can also be connected to the FI directly at L2;
  • FI ports can be reconfigured between Eth and FC.

Of course, managing third-party equipment is not part of UCS Manager - Cisco has other tools for that - but the capabilities listed above are already impressive. Here are some examples where the unified features served us well.

Temporary replacement for Cisco Nexus switches

The delivery of new Cisco Nexus switches was significantly delayed. A new NetApp storage system arrived before them and could have sat idle for several months: there were not enough 10GbE ports for a fully fault-tolerant connection. The solution: connect the storage via FI ports configured in Appliance mode, set up a port-channel with LACP support, and put the storage into operation several months before the switches arrived. The equipment is running and generating revenue, and the CAPEX starts paying off sooner.

Migration to a new storage system

Our customer needed to migrate data from an old EMC storage system to NetApp storage with minimal losses. There were no free ports in his old FC fabric, and no way to connect the FI to the shared fabric. But there were free ports on the customer's storage, so we connected them directly to the FI and brought up an FC VSAN on them. We then started migrating virtual machines via Storage vMotion to the NetApp connected over NFS. Everything went smoothly, everyone was happy, and the migration completed successfully.

Cisco UCS and Virtualization


One cannot fail to mention a number of advantages the UCS architecture provides, for example, to a virtual infrastructure running VMware. The VIC adapters we described among the components are physically connected through the chassis midplane to the IO modules via the Backplane ports. Each VIC can receive 2 to 4 ports, which are automatically configured into EtherChannels at the UCS Manager level. This yields the following network-connectivity benefits:

  • at the physical level we get a fault-tolerant connection between the blade server and the IO modules, and thus the FIs: at least one Backplane port comes from Fabric A and one from Fabric B;
  • thanks to EtherChannel on the Backplane ports, no NIC teaming is needed at the OS level, and traffic is balanced between the FIs in active-active mode;
  • up to 256 virtual PCIe devices (vNIC / vHBA) can be carved out of a single VIC; they are described in the Service Profile Template, so the required vNICs and vHBAs are created automatically;
  • VM-FEX support, which allows Pass Through Switching between a VM and the FI using VMware DirectPath I/O technology.

As you can see, the Cisco UCS complex has truly proven itself across a range of tasks and cases. On the one hand, it is a well-documented, time-tested solution; on the other, it remains relevant and, within its domain, covers virtually all the tasks of a cloud provider. If you have additions to the article or want to share your own experience, we look forward to seeing you in the comments.
