Clustering in Proxmox VE



In previous articles we covered what Proxmox VE is and how it works. Today we will look at how to use its clustering feature and what advantages it provides.

What is a cluster and why is it needed? A cluster is a group of servers connected by high-speed communication channels that work together and appear to the user as a single system. There are several basic scenarios for using a cluster:

  • High availability (fault tolerance).
  • Load balancing.
  • High-performance computing.
  • Distributed computing.

Each scenario places its own requirements on the cluster components. For example, for a cluster performing distributed computing, the main requirements are high floating-point performance and low network latency. Such clusters are often used for research purposes.

Since we have touched on distributed computing, it is worth noting that there is also such a thing as a grid system. Despite the general similarity, do not confuse a grid system with a cluster. A grid is not a cluster in the usual sense: unlike a cluster, the nodes in a grid are usually heterogeneous and have low availability. This approach simplifies solving distributed computing problems, but it does not turn the nodes into a single whole.
A striking example of a grid system is the popular computing platform BOINC (Berkeley Open Infrastructure for Network Computing). This platform was originally created for the SETI@home project (Search for Extra-Terrestrial Intelligence at home), which searches for extraterrestrial intelligence by analyzing radio signals.
How it works (illustration of the SETI@home workflow)

Now that we have a clear understanding of what a cluster is, let's look at how one can be created and deployed. We will use the open-source virtualization system Proxmox VE.

Before creating a cluster, it is important to clearly understand the limitations and system requirements of Proxmox (quick ways to check some of them are sketched after the list), namely:

  • the maximum number of nodes in a cluster is 32;
  • all nodes must run the same version of Proxmox (there are exceptions, but they are not recommended for production);
  • if you plan to use the High Availability functionality later, the cluster must have at least 3 nodes;
  • for the nodes to communicate with each other, ports UDP/5404 and UDP/5405 for corosync and TCP/22 for SSH must be open;
  • the network latency between nodes should not exceed 2 ms.
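
A couple of these requirements are easy to verify from the shell of each node. A minimal sketch (the peer address 192.168.88.12 is just an example):

# check the installed Proxmox VE version on every node
pveversion
# measure round-trip latency to another node; it should stay well under 2 ms
ping -c 20 -q 192.168.88.12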

Cluster Creation


Important! The configuration below is for testing purposes. Be sure to consult the official Proxmox VE documentation.

To build the test cluster, we took three servers with the Proxmox hypervisor installed, all of the same configuration (2 cores, 2 GB of RAM).
If you want to know how to install Proxmox, we recommend reading our previous article, The Magic of Virtualization: An Introductory Course in Proxmox VE.
Right after the OS is installed, each server runs in standalone mode.


Create a cluster by clicking the Create Cluster button in the corresponding section.


We name the future cluster and select the active network connection.


Click the Create button. The server will generate a 2048-bit key and write it together with the parameters of the new cluster to the configuration files.


The TASK OK message indicates that the operation completed successfully. Looking at the general system information, we can see that the server has switched to cluster mode. So far the cluster consists of only one node, so it does not yet provide the capabilities a cluster is built for.
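
The same operation can be performed from the command line instead of the GUI. A minimal sketch, assuming the cluster is named cluster1 (the name is arbitrary):

# create the cluster on the first node
pvecm create cluster1
# show the cluster state and quorum information
pvecm status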


Joining a cluster


Before connecting a node to the newly created cluster, we need to obtain the information required to complete the connection. To do this, go to the Cluster section and click the Join Information button.


In the window that opens, we are interested in the contents of the Join Information field. It needs to be copied.


All the necessary connection parameters are encoded here: the server address and the fingerprint. Now we switch to the server that needs to be added to the cluster, click the Join Cluster button, and paste the copied content into the window that opens.


The Peer Address and Fingerprint fields will be filled in automatically. Enter the root password of node 1, select the network connection, and click the Join button.


When a node joins the cluster, the web GUI page may stop updating. This is normal; just reload the page. We add one more node in exactly the same way and end up with a full-fledged cluster of three working nodes.
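
If you prefer the command line, a node can also be joined from its shell. A minimal sketch, assuming the first cluster node is reachable at 192.168.88.11 (a placeholder address):

# run on the node being added; it will ask for the root password of the existing node
pvecm add 192.168.88.11
# list the cluster members to confirm the join
pvecm nodes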


Now we can control all cluster nodes from one GUI.


Organizing High Availability


Out of the box, Proxmox supports HA for both virtual machines and LXC containers. The ha-manager utility detects and handles errors and failures, performing a failover from a failed node to a working one. For the mechanism to work correctly, the virtual machines and containers must reside on shared storage.

After the High Availability functionality is activated, the ha-manager software stack will begin to continuously monitor the status of the virtual machine or container and asynchronously interact with other nodes in the cluster.

Attaching shared storage


For this example, we deployed a small NFS file storage at 192.168.88.18. To make it available to all cluster nodes, you need to do the following.

In the web interface menu, select Datacenter - Storage - Add - NFS.


Fill in the ID and Server fields. In the Export drop-down list, select the desired directory from those available, and in the Content list, the required data types. After clicking the Add button, the storage will be attached to all nodes of the cluster.


When creating virtual machines and containers on any of the nodes, we specify this shared storage as their storage location.
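
The same storage can be attached from the command line of any cluster node. A minimal sketch, assuming the NFS export path is /srv/nfs and the storage ID is nfs-shared (both are placeholder values; the server address is the one from our example):

# attach the NFS export cluster-wide for VM disk images and container data
pvesm add nfs nfs-shared --server 192.168.88.18 --export /srv/nfs --content images,rootdir
# verify that the new storage is active
pvesm status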

Configuring HA


As an example, let's create a container with Ubuntu 18.04 and configure High Availability for it. After creating and launching the container, go to Datacenter - HA - Add. In the dialog that opens, specify the ID of the virtual machine or container and the maximum number of restart and relocation attempts between nodes.
If this number is exceeded, the hypervisor marks the VM as failed and puts it into the Error state, after which it stops performing any actions on it.

After clicking the Add button, the ha-manager utility notifies all cluster nodes that the VM with the specified ID is now being monitored, and that if it fails, it must be restarted on another node.
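
The same example can also be reproduced from the command line. A minimal sketch, assuming an Ubuntu 18.04 template already downloaded to local storage, container ID 101, the nfs-shared storage from above, and three restart/relocation attempts (all names, IDs, and limits are example values):

# create and start a small Ubuntu 18.04 container on the shared NFS storage
pct create 101 local:vztmpl/ubuntu-18.04-standard_18.04.1-1_amd64.tar.gz --hostname ha-test --memory 512 --rootfs nfs-shared:8 --net0 name=eth0,bridge=vmbr0,ip=dhcp
pct start 101
# put the container under HA management with restart/relocation limits
ha-manager add ct:101 --max_restart 3 --max_relocate 3 --state started
# show managed resources and their current state
ha-manager status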


Simulating a failure


To see how the failover mechanism works, let's abruptly cut the power to node1. Then, from another node, we watch what happens to the cluster and see that the system has registered the failure.

The HA mechanism does not mean that the VM keeps running without interruption. As soon as the node goes down, the VM is temporarily stopped until it is automatically restarted on another node.
And here the "magic" begins: the cluster automatically reassigned a node to run our VM, and within 120 seconds its operation was restored.


Next, we cut the power to node2 as well. Let's see whether the cluster can withstand this and whether the VM will return to a working state automatically.


Alas, as we can see, we ran into a problem: the only surviving node no longer has quorum, which automatically disables HA. In the console, we issue a command to force the expected number of votes.

pvecm expected 1
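
After forcing the expected vote count, the quorum state can be rechecked with a quick status query:

pvecm status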


About two minutes later, the HA mechanism kicked in correctly: not finding node2, it launched our VM on node3.


As soon as we powered node1 and node2 back on, the cluster was fully restored. Note that the VM does not automatically migrate back to node1, but this can be done manually.

To summarize


We have covered how the clustering mechanism in Proxmox works and shown how HA is configured for virtual machines and containers. Proper use of clustering and HA greatly improves the reliability of your infrastructure and also helps with disaster recovery.

Before creating a cluster, plan in advance what it will be used for and how much it will need to scale in the future. You should also make sure the network infrastructure provides connectivity with minimal latency so that the future cluster operates without failures.

Tell us: do you use the clustering capabilities of Proxmox? See you in the comments.
