Finding the optimal storage is a complicated process: everything has its pros and cons. The leader in this category is, of course, Ceph, but it is a rather complex system, albeit one with very rich functionality. For us such a system was overkill, since all we needed was simple replicated storage in master-master mode for a couple of terabytes. After studying a lot of material, we decided to test GlusterFS, the most fashionable product on the market, for the setup we were interested in. Since no ready-made solution of this kind could be found, I would like to share my experience and describe the problems we ran into during deployment.

Goals
What did we expect from the new storage:
- Ability to work with an even number of nodes for replication
- Easy installation, setup, support
- The system must be mature, time-tested, and widely used
- Ability to expand storage space without downtime
- Storage must be compatible with Kubernetes
- There should be an automatic failover when one of the nodes crashes
It is the last point that raised the most questions for us.

Deployment
For deployment, two virtual machines were created on CentOS 8, each with an additional disk attached for storage.

Preliminary preparation
For GlusterFS you need to allocate a separate disk formatted with XFS so that it does not affect the system in any way.

Create the partition:

$ fdisk /dev/sdb
Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-16777215, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-16777215, default 16777215):
Created a new partition 1 of type 'Linux' and of size 8 GiB.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table. Syncing disks.
Format it as XFS and mount:

$ mkfs.xfs /dev/sdb1
$ mkdir /gluster
$ mount /dev/sdb1 /gluster
Finally, add an entry to /etc/fstab so the directory is mounted automatically at system startup:

/dev/sdb1 /gluster xfs defaults 0 0
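The fstab step above can be scripted so that re-running provisioning is safe. A minimal sketch; the FSTAB variable defaults to a scratch file here so the snippet can be run anywhere, on a real node you would set FSTAB=/etc/fstab:

```shell
#!/bin/sh
# Idempotently add the brick mount to fstab: append the line only
# when it is not already present, so repeated runs never duplicate it.
# FSTAB defaults to a scratch file for safe experimentation.
FSTAB="${FSTAB:-$(mktemp)}"
ENTRY="/dev/sdb1 /gluster xfs defaults 0 0"

grep -qxF "$ENTRY" "$FSTAB" || echo "$ENTRY" >> "$FSTAB"
```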
Installation
Much has been written about the installation, so we will not go deep into the process and will only cover the points worth paying attention to.

On both nodes, install and start the latest version of glusterfs:

$ wget -P /etc/yum.repos.d https://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-rhel8.repo
$ yum -y install yum-utils
$ yum-config-manager --enable PowerTools
$ yum install -y glusterfs-server
$ systemctl start glusterd
Next, you need to tell gluster where its neighboring node is. This is done from only one node. An important point: if you have a domain network, specify the server name with the domain, otherwise you will have to redo everything later.

$ gluster peer probe gluster-02.example.com
If it was successful, check the connection from both servers:

$ gluster peer status
Number of Peers: 1
Hostname: gluster-02.example.com
Uuid: a6de3b23-ee31-4394-8bff-0bd97bd54f46
State: Peer in Cluster (Connected)
Other names:
10.10.6.72
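For monitoring or automation it can be handy to check peer health from a script. A sketch of parsing the `gluster peer status` output; a captured sample is used here so the snippet is self-contained, on a real node you would pipe the live command output instead:

```shell
#!/bin/sh
# Count peers reported as "Peer in Cluster (Connected)".
# The sample mirrors the output shown above; replace it with
# `gluster peer status` on a live node.
sample='Number of Peers: 1

Hostname: gluster-02.example.com
Uuid: a6de3b23-ee31-4394-8bff-0bd97bd54f46
State: Peer in Cluster (Connected)'

connected=$(printf '%s\n' "$sample" | grep -c 'Peer in Cluster (Connected)')
echo "connected peers: $connected"
```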
Now you can create the Volume we will write to:

gluster volume create main replica 2 gluster-01.example.com:/gluster/main gluster-02.example.com:/gluster/main force
Where:
- main - the Volume name
- replica - the Volume type (details can be found in the official documentation)
- 2 - number of replicas
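The pieces explained in the list above can be assembled programmatically when you manage more than two nodes. A sketch with a hypothetical helper name, assuming one brick per server at the same brick path:

```shell
#!/bin/sh
# build_create_cmd (hypothetical helper): compose the `gluster volume
# create` command for a replicated volume, one brick per server.
build_create_cmd() {
  vol=$1; brick_path=$2; shift 2
  # replica count equals the number of servers passed in
  cmd="gluster volume create $vol replica $#"
  for host in "$@"; do
    cmd="$cmd $host:$brick_path"
  done
  # `force` is needed when the bricks sit directly on the mount root
  echo "$cmd force"
}

build_create_cmd main /gluster/main gluster-01.example.com gluster-02.example.com
```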
Start the Volume and check that it is up:

gluster volume start main
gluster volume status main
For a replicated Volume, it is recommended to set the following parameters:

$ gluster volume set main network.ping-timeout 5
$ gluster volume set main cluster.quorum-type fixed
$ gluster volume set main cluster.quorum-count 1
$ gluster volume set main performance.quick-read on
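The four settings above can be applied in one loop. A dry-run sketch that only prints the `gluster volume set` commands; drop the echo on a real node to execute them:

```shell
#!/bin/sh
# Dry-run: generate the recommended `gluster volume set` commands
# from a simple option/value list instead of executing them.
VOL=main
cmds=$(
  while read -r opt val; do
    echo "gluster volume set $VOL $opt $val"
  done <<'EOF'
network.ping-timeout 5
cluster.quorum-type fixed
cluster.quorum-count 1
performance.quick-read on
EOF
)
echo "$cmds"
```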
With these simple steps, we have built a GlusterFS cluster. It remains to connect to it and check that it works. The client machine runs Ubuntu; to mount the volume you need to install the client:

$ add-apt-repository ppa:gluster/glusterfs-7
$ apt install glusterfs-client
$ mkdir /gluster
$ mount.glusterfs gluster-01.example.com:/main /gluster
When connecting to one of the nodes, gluster hands out the addresses of all nodes and automatically connects to all of them. If the client has already connected, the failure of one node will not cause a halt. But if the first node is unavailable, reconnecting after a session break will fail. To handle this, you can pass the backupvolfile-server parameter when mounting, pointing at the second node:

mount.glusterfs gluster-01.example.com:/main /gluster -o backupvolfile-server=gluster-02.example.com
An important point: gluster synchronizes files between nodes only if they were changed through the mounted volume. If you change files directly on the nodes, they will go out of sync.

Connecting to Kubernetes
At this stage the questions began: "How do we connect it?". There are several options; let's consider them.

Heketi
The most popular and recommended option is to use an external service, heketi. heketi is a layer between Kubernetes and gluster that lets you manage and work with the storage over HTTP. But heketi itself becomes a single point of failure, because the service is not clustered. A second instance of the service cannot run independently, because all changes are stored in a local database. Running the service inside Kubernetes does not work either, because it needs a static disk to store its database on. For these reasons, this option turned out to be the least suitable for us.

Endpoint in Kubernetes
If your Kubernetes nodes run systems with package managers, this is a very convenient option. The idea is that a common Endpoints object is created in Kubernetes for all GlusterFS servers, a Service is attached to it, and we mount via that Service. For this option to work, glusterfs-client must be installed on every Kubernetes node and you must make sure it can mount the volume. In Kubernetes, apply the following config:

apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
  - addresses:
      - ip: 10.10.6.71
    ports:
      - port: 1
  - addresses:
      - ip: 10.10.6.72
    ports:
      - port: 1
---
apiVersion: v1
kind: Service
metadata:
  name: glusterfs-cluster
spec:
  ports:
    - port: 1
Now we can create a simple test deployment and check how mounting works:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gluster-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gluster-test
  template:
    metadata:
      labels:
        app: gluster-test
    spec:
      volumes:
        - name: gluster
          glusterfs:
            endpoints: glusterfs-cluster
            path: main
      containers:
        - name: gluster-test
          image: nginx
          volumeMounts:
            - name: gluster
              mountPath: /gluster
This option did not suit us either, because all our Kubernetes nodes run Container Linux. It has no package manager, so we could not install glusterfs-client for mounting. That led to a third option, which is the one we decided to use.

GlusterFS + NFS + keepalived
Until recently GlusterFS shipped its own NFS server, but now NFS is provided by the external nfs-ganesha service. Rather little has been written about it, so let's figure out how to configure it.

The repository must be registered manually. To do this, add to the file /etc/yum.repos.d/nfs-ganesha.repo:

[nfs-ganesha]
name=nfs-ganesha
baseurl=https://download.nfs-ganesha.org/2.8/2.8.0/RHEL/el-8/$basearch/
enabled=1
gpgcheck=1
[nfs-ganesha-noarch]
name=nfs-ganesha-noarch
baseurl=https://download.nfs-ganesha.org/2.8/2.8.0/RHEL/el-8/noarch/
enabled=1
gpgcheck=1
And install:

yum -y install nfs-ganesha-gluster --nogpgcheck
After installation, do the basic configuration in /etc/ganesha/ganesha.conf:

NFS_CORE_PARAM {
    # allow mounting the NFSv4 pseudo path with NFSv3
    mount_path_pseudo = true;
    # NFS protocol versions
    Protocols = 3,4;
}
EXPORT_DEFAULTS {
    # default access mode
    Access_Type = RW;
}
EXPORT {
    # unique export ID
    Export_Id = 101;
    # mount path of the Gluster Volume
    Path = "/gluster/main";
    FSAL {
        # FSAL plugin name
        name = GLUSTER;
        # hostname or IP address of this node
        hostname = "gluster-01.example.com";
        # Gluster volume name
        volume = "main";
    }
    # root squash config
    Squash = "No_root_squash";
    # NFSv4 pseudo path
    Pseudo = "/main";
    # allowed security options
    SecType = "sys";
}
LOG {
    # default log level
    Default_Log_Level = WARN;
}
Start the service, enable NFS for our volume, and check that it is on:

$ systemctl start nfs-ganesha
$ systemctl enable nfs-ganesha
$ gluster volume set main nfs.disable off
$ gluster volume status main
The status output should now show that the NFS server has started for our volume. Mount it and check:

mkdir /gluster-nfs
mount.nfs gluster-01.example.com:/main /gluster-nfs
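After mounting it is worth running a quick read/write smoke test through the NFS path. A sketch; MNT defaults to a scratch directory here so the snippet can be run anywhere, on a real client set MNT=/gluster-nfs:

```shell
#!/bin/sh
# Smoke-test the mount: write a small file and read it back.
# MNT defaults to a scratch directory for safe experimentation;
# point it at /gluster-nfs on a real client.
MNT="${MNT:-$(mktemp -d)}"
echo "hello gluster" > "$MNT/ping.txt"
cat "$MNT/ping.txt"
```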
But this setup is not fault-tolerant by itself, so we need a VIP address that will move between our two nodes and switch the traffic over if one of them goes down.

On CentOS, keepalived is installed straight from the package manager:

$ yum install -y keepalived
Configure the service in /etc/keepalived/keepalived.conf:

global_defs {
    notification_email {
        admin@example.com
    }
    notification_email_from alarm@example.com
    smtp_server mail.example.com
    smtp_connect_timeout 30
    vrrp_garp_interval 10
    vrrp_garp_master_refresh 30
}
# Script that checks that glusterd is alive; if it is not, the VIP moves to the other node
vrrp_script chk_gluster {
    script "pgrep glusterd"
    interval 2
}
vrrp_instance gluster {
    interface ens192
    state MASTER              # BACKUP on the second node
    priority 200              # 100 on the second node
    virtual_router_id 1
    virtual_ipaddress {
        10.10.6.70/24
    }
    unicast_peer {
        10.10.6.72            # address of the neighboring node
    }
    track_script {
        chk_gluster
    }
}
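The chk_gluster track script above boils down to an exit code: keepalived keeps the VIP while `pgrep glusterd` succeeds and releases it once the check fails. A self-contained sketch of that logic, using a background `sleep` as a stand-in for glusterd so it can be run anywhere:

```shell
#!/bin/sh
# Mimic the keepalived track script: exit 0 while the watched process
# is alive, non-zero once it dies, which is what makes keepalived
# drop priority and move the VIP.
check_proc() {
  pgrep "$1" > /dev/null
}

sleep 5 &                       # stand-in for glusterd in this sketch
if check_proc sleep; then
  echo "process up, keep VIP"
else
  echo "process down, release VIP"
fi
```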
Now start the service and check that the VIP appears on the node:

$ systemctl start keepalived
$ systemctl enable keepalived
$ ip addr
1: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:97:55:eb brd ff:ff:ff:ff:ff:ff
inet 10.10.6.72/24 brd 10.10.6.255 scope global noprefixroute ens192
valid_lft forever preferred_lft forever
inet 10.10.6.70/24 scope global secondary ens192
valid_lft forever preferred_lft forever
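Checking which node currently holds the VIP can also be scripted. A sketch that parses the `ip addr` output; a captured sample is used so the snippet is self-contained, on a real node you would substitute `ip -4 addr show ens192`:

```shell
#!/bin/sh
# Detect whether the VIP is assigned to this node by grepping the
# interface addresses. The sample mirrors the `ip addr` output above.
VIP="10.10.6.70"
sample='inet 10.10.6.72/24 brd 10.10.6.255 scope global noprefixroute ens192
inet 10.10.6.70/24 scope global secondary ens192'

if printf '%s\n' "$sample" | grep -q "inet $VIP/"; then
  echo "VIP is on this node"
else
  echo "VIP is elsewhere"
fi
```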
If everything worked, it remains to add a PersistentVolume to Kubernetes and create a test service to verify operation:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-nfs
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.10.6.70
    path: /main
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  volumeName: "gluster-nfs"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gluster-test
  labels:
    app: gluster-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gluster-test
  template:
    metadata:
      labels:
        app: gluster-test
    spec:
      volumes:
        - name: gluster
          persistentVolumeClaim:
            claimName: gluster-nfs
      containers:
        - name: gluster-test
          image: nginx
          volumeMounts:
            - name: gluster
              mountPath: /gluster
With this configuration, if the main node fails, there will be about a minute of idle time until the mount times out and switches over. A minute of downtime is acceptable for this storage: it is not a routine situation and we will rarely run into it, and in that case the system switches over automatically and keeps working, so we can investigate and recover without worrying about the outage.

Summary
In this article we examined three possible options for connecting GlusterFS to Kubernetes. In our setup it would also be possible to add a provisioner to Kubernetes, but we do not need it yet. It remains to add the results of performance tests between NFS and Gluster on the same nodes.

1MB files:

sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
Gluster: 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.63496 s, 407 MB/s
NFS: 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.4527 s, 197 MB/s
1KB files:

sync; dd if=/dev/zero of=tempfile bs=1K count=1048576; sync
Gluster: 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 70.0508 s, 15.3 MB/s
NFS: 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.95208 s, 154 MB/s
NFS performs roughly the same regardless of file size, while GlusterFS degrades badly on small files. At the same time, on large files NFS shows throughput 2-3 times lower than Gluster.
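The MB/s figures dd reports can be reproduced from the raw numbers: dd uses decimal megabytes, i.e. bytes / seconds / 10^6. A sketch with a hypothetical helper name:

```shell
#!/bin/sh
# mbps (hypothetical helper): recompute dd's MB/s figure from the
# byte count and elapsed seconds, using decimal megabytes as dd does.
mbps() {
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s / 1000000 }'
}

mbps 1073741824 2.63496   # Gluster, 1MB blocks
mbps 1073741824 6.95208   # NFS, 1KB blocks
```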