Finding the optimal storage is a complicated process: everything has its pros and cons. The leader in this category is, of course, Ceph, but it is a rather complex system, albeit one with very rich functionality. For us such a system was overkill, since all we needed was simple replicated storage in master-master mode for a couple of terabytes. After studying a lot of material, we decided to test GlusterFS, the most fashionable product on the market, for the setup we were interested in. Since no ready-made solution of this kind could be found, I would like to share my experience and describe the problems we ran into during deployment.

Goals
What did we expect from the new storage:
- Ability to work with an even number of nodes for replication
- Easy installation, setup, support
- The system must be mature, time-tested, and widely used
- Ability to expand storage space without downtime
- Storage must be compatible with Kubernetes
- There should be an automatic failover when one of the nodes crashes
It is the last point that raised the most questions for us.

Deployment
For deployment, two virtual machines were created on CentOS 8, each with an additional disk attached for storage.

Preliminary preparation
For GlusterFS you need to allocate a separate disk formatted with XFS so that it does not affect the system in any way.

Create the partition:

$ fdisk /dev/sdb
Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-16777215, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-16777215, default 16777215):
Created a new partition 1 of type 'Linux' and of size 8 GiB.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table. Syncing disks.
Format it as XFS and mount:

$ mkfs.xfs /dev/sdb1
$ mkdir /gluster
$ mount /dev/sdb1 /gluster
Finally, add an entry to /etc/fstab so the directory is mounted automatically at system startup:

/dev/sdb1 /gluster xfs defaults 0 0
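The fstab step above can be scripted so that re-running provisioning is safe. A minimal sketch; the FSTAB variable defaults to a scratch file here so the snippet can be run anywhere, on a real node you would set FSTAB=/etc/fstab:

```shell
#!/bin/sh
# Idempotently add the brick mount to fstab: append the line only
# when it is not already present, so repeated runs never duplicate it.
# FSTAB defaults to a scratch file for safe experimentation.
FSTAB="${FSTAB:-$(mktemp)}"
ENTRY="/dev/sdb1 /gluster xfs defaults 0 0"

grep -qxF "$ENTRY" "$FSTAB" || echo "$ENTRY" >> "$FSTAB"
```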
Installation
Much has been written about the installation, so we will not go deep into the process and will only cover the points worth paying attention to.

On both nodes, install and start the latest version of glusterfs:

$ wget -P /etc/yum.repos.d https://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-rhel8.repo
$ yum -y install yum-utils
$ yum-config-manager --enable PowerTools
$ yum install -y glusterfs-server
$ systemctl start glusterd
Next, you need to tell gluster where its neighboring node is. This is done from only one node. An important point: if you have a domain network, specify the server name with the domain, otherwise you will have to redo everything later.

$ gluster peer probe gluster-02.example.com
If it was successful, check the connection from both servers:

$ gluster peer status
Number of Peers: 1
Hostname: gluster-02.example.com
Uuid: a6de3b23-ee31-4394-8bff-0bd97bd54f46
State: Peer in Cluster (Connected)
Other names:
10.10.6.72
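For monitoring or automation it can be handy to check peer health from a script. A sketch of parsing the `gluster peer status` output; a captured sample is used here so the snippet is self-contained, on a real node you would pipe the live command output instead:

```shell
#!/bin/sh
# Count peers reported as "Peer in Cluster (Connected)".
# The sample mirrors the output shown above; replace it with
# `gluster peer status` on a live node.
sample='Number of Peers: 1

Hostname: gluster-02.example.com
Uuid: a6de3b23-ee31-4394-8bff-0bd97bd54f46
State: Peer in Cluster (Connected)'

connected=$(printf '%s\n' "$sample" | grep -c 'Peer in Cluster (Connected)')
echo "connected peers: $connected"
```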
Now you can create the Volume we will write to:

gluster volume create main replica 2 gluster-01.example.com:/gluster/main gluster-02.example.com:/gluster/main force
Where:
- main - the Volume name
- replica - the Volume type (details can be found in the official documentation)
- 2 - number of replicas
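The pieces explained in the list above can be assembled programmatically when you manage more than two nodes. A sketch with a hypothetical helper name, assuming one brick per server at the same brick path:

```shell
#!/bin/sh
# build_create_cmd (hypothetical helper): compose the `gluster volume
# create` command for a replicated volume, one brick per server.
build_create_cmd() {
  vol=$1; brick_path=$2; shift 2
  # replica count equals the number of servers passed in
  cmd="gluster volume create $vol replica $#"
  for host in "$@"; do
    cmd="$cmd $host:$brick_path"
  done
  # `force` is needed when the bricks sit directly on the mount root
  echo "$cmd force"
}

build_create_cmd main /gluster/main gluster-01.example.com gluster-02.example.com
```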
Start the Volume and check that it is up:

gluster volume start main
gluster volume status main
For a replicated Volume, it is recommended to set the following parameters:

$ gluster volume set main network.ping-timeout 5
$ gluster volume set main cluster.quorum-type fixed
$ gluster volume set main cluster.quorum-count 1
$ gluster volume set main performance.quick-read on
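The four settings above can be applied in one loop. A dry-run sketch that only prints the `gluster volume set` commands; drop the echo on a real node to execute them:

```shell
#!/bin/sh
# Dry-run: generate the recommended `gluster volume set` commands
# from a simple option/value list instead of executing them.
VOL=main
cmds=$(
  while read -r opt val; do
    echo "gluster volume set $VOL $opt $val"
  done <<'EOF'
network.ping-timeout 5
cluster.quorum-type fixed
cluster.quorum-count 1
performance.quick-read on
EOF
)
echo "$cmds"
```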
With these simple steps, we have built a GlusterFS cluster. It remains to connect to it and check that it works. The client machine runs Ubuntu; to mount the volume you need to install the client:

$ add-apt-repository ppa:gluster/glusterfs-7
$ apt install glusterfs-client
$ mkdir /gluster
$ mount.glusterfs gluster-01.example.com:/main /gluster
When connecting to one of the nodes, gluster hands out the addresses of all nodes and automatically connects to all of them. If the client has already connected, the failure of one node will not cause a halt. But if the first node is unavailable, reconnecting after a session break will fail. To handle this, you can pass the backupvolfile-server parameter when mounting, pointing at the second node:

mount.glusterfs gluster-01.example.com:/main /gluster -o backupvolfile-server=gluster-02.example.com
An important point: gluster synchronizes files between nodes only if they were changed through the mounted volume. If you change files directly on the nodes, they will go out of sync.

Connecting to Kubernetes
At this stage the questions began: "How do we connect it?". There are several options; let's consider them.

Heketi
The most popular and recommended option is to use an external service, heketi. heketi is a layer between Kubernetes and gluster that lets you manage and work with the storage over HTTP. But heketi itself becomes a single point of failure, because the service is not clustered. A second instance of the service cannot run independently, because all changes are stored in a local database. Running the service inside Kubernetes does not work either, because it needs a static disk to store its database on. For these reasons, this option turned out to be the least suitable for us.

Endpoint in Kubernetes
If your Kubernetes nodes run systems with package managers, this is a very convenient option. The idea is that a common Endpoints object is created in Kubernetes for all GlusterFS servers, a Service is attached to it, and we mount via that Service. For this option to work, glusterfs-client must be installed on every Kubernetes node and you must make sure it can mount the volume. In Kubernetes, apply the following config:

apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
  - addresses:
      - ip: 10.10.6.71
    ports:
      - port: 1
  - addresses:
      - ip: 10.10.6.72
    ports:
      - port: 1
---
apiVersion: v1
kind: Service
metadata:
  name: glusterfs-cluster
spec:
  ports:
    - port: 1
Now we can create a simple test deployment and check how mounting works:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gluster-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gluster-test
  template:
    metadata:
      labels:
        app: gluster-test
    spec:
      volumes:
        - name: gluster
          glusterfs:
            endpoints: glusterfs-cluster
            path: main
      containers:
        - name: gluster-test
          image: nginx
          volumeMounts:
            - name: gluster
              mountPath: /gluster
This option did not suit us either, because all our Kubernetes nodes run Container Linux. It has no package manager, so we could not install glusterfs-client for mounting. That led to a third option, which is the one we decided to use.

GlusterFS + NFS + keepalived
Until recently GlusterFS shipped its own NFS server, but now NFS is provided by the external nfs-ganesha service. Rather little has been written about it, so let's figure out how to configure it.

The repository must be registered manually. To do this, add to the file /etc/yum.repos.d/nfs-ganesha.repo:

[nfs-ganesha]
name=nfs-ganesha
baseurl=https://download.nfs-ganesha.org/2.8/2.8.0/RHEL/el-8/$basearch/
enabled=1
gpgcheck=1
[nfs-ganesha-noarch]
name=nfs-ganesha-noarch
baseurl=https://download.nfs-ganesha.org/2.8/2.8.0/RHEL/el-8/noarch/
enabled=1
gpgcheck=1
And install:

yum -y install nfs-ganesha-gluster --nogpgcheck
After installation, do the basic configuration in /etc/ganesha/ganesha.conf:

NFS_CORE_PARAM {
    # allow mounting the NFSv4 pseudo path with NFSv3
    mount_path_pseudo = true;
    # NFS protocol versions
    Protocols = 3,4;
}
EXPORT_DEFAULTS {
    # default access mode
    Access_Type = RW;
}
EXPORT {
    # unique export ID
    Export_Id = 101;
    # mount path of the Gluster Volume
    Path = "/gluster/main";
    FSAL {
        # FSAL plugin name
        name = GLUSTER;
        # hostname or IP address of this node
        hostname = "gluster-01.example.com";
        # Gluster volume name
        volume = "main";
    }
    # root squash config
    Squash = "No_root_squash";
    # NFSv4 pseudo path
    Pseudo = "/main";
    # allowed security options
    SecType = "sys";
}
LOG {
    # default log level
    Default_Log_Level = WARN;
}
Start the service, enable NFS for our volume, and check that it is on:

$ systemctl start nfs-ganesha
$ systemctl enable nfs-ganesha
$ gluster volume set main nfs.disable off
$ gluster volume status main
The status output should now show that the NFS server has started for our volume. Mount it and check:

mkdir /gluster-nfs
mount.nfs gluster-01.example.com:/main /gluster-nfs
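After mounting it is worth running a quick read/write smoke test through the NFS path. A sketch; MNT defaults to a scratch directory here so the snippet can be run anywhere, on a real client set MNT=/gluster-nfs:

```shell
#!/bin/sh
# Smoke-test the mount: write a small file and read it back.
# MNT defaults to a scratch directory for safe experimentation;
# point it at /gluster-nfs on a real client.
MNT="${MNT:-$(mktemp -d)}"
echo "hello gluster" > "$MNT/ping.txt"
cat "$MNT/ping.txt"
```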
But this setup is not fault-tolerant by itself, so we need a VIP address that will move between our two nodes and switch the traffic over if one of them goes down.

On CentOS, keepalived is installed straight from the package manager:

$ yum install -y keepalived
Configure the service in /etc/keepalived/keepalived.conf:

global_defs {
    notification_email {
        admin@example.com
    }
    notification_email_from alarm@example.com
    smtp_server mail.example.com
    smtp_connect_timeout 30
    vrrp_garp_interval 10
    vrrp_garp_master_refresh 30
}
# Script that checks that glusterd is alive; if it is not, the VIP moves to the other node
vrrp_script chk_gluster {
    script "pgrep glusterd"
    interval 2
}
vrrp_instance gluster {
    interface ens192
    state MASTER              # BACKUP on the second node
    priority 200              # 100 on the second node
    virtual_router_id 1
    virtual_ipaddress {
        10.10.6.70/24
    }
    unicast_peer {
        10.10.6.72            # address of the neighboring node
    }
    track_script {
        chk_gluster
    }
}
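The chk_gluster track script above boils down to an exit code: keepalived keeps the VIP while `pgrep glusterd` succeeds and releases it once the check fails. A self-contained sketch of that logic, using a background `sleep` as a stand-in for glusterd so it can be run anywhere:

```shell
#!/bin/sh
# Mimic the keepalived track script: exit 0 while the watched process
# is alive, non-zero once it dies, which is what makes keepalived
# drop priority and move the VIP.
check_proc() {
  pgrep "$1" > /dev/null
}

sleep 5 &                       # stand-in for glusterd in this sketch
if check_proc sleep; then
  echo "process up, keep VIP"
else
  echo "process down, release VIP"
fi
```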
Now start the service and check that the VIP appears on the node:

$ systemctl start keepalived
$ systemctl enable keepalived
$ ip addr
1: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:97:55:eb brd ff:ff:ff:ff:ff:ff
inet 10.10.6.72/24 brd 10.10.6.255 scope global noprefixroute ens192
valid_lft forever preferred_lft forever
inet 10.10.6.70/24 scope global secondary ens192
valid_lft forever preferred_lft forever
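Checking which node currently holds the VIP can also be scripted. A sketch that parses the `ip addr` output; a captured sample is used so the snippet is self-contained, on a real node you would substitute `ip -4 addr show ens192`:

```shell
#!/bin/sh
# Detect whether the VIP is assigned to this node by grepping the
# interface addresses. The sample mirrors the `ip addr` output above.
VIP="10.10.6.70"
sample='inet 10.10.6.72/24 brd 10.10.6.255 scope global noprefixroute ens192
inet 10.10.6.70/24 scope global secondary ens192'

if printf '%s\n' "$sample" | grep -q "inet $VIP/"; then
  echo "VIP is on this node"
else
  echo "VIP is elsewhere"
fi
```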
If everything worked, it remains to add a PersistentVolume to Kubernetes and create a test service to verify operation:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-nfs
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.10.6.70
    path: /main
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  volumeName: "gluster-nfs"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gluster-test
  labels:
    app: gluster-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gluster-test
  template:
    metadata:
      labels:
        app: gluster-test
    spec:
      volumes:
        - name: gluster
          persistentVolumeClaim:
            claimName: gluster-nfs
      containers:
        - name: gluster-test
          image: nginx
          volumeMounts:
            - name: gluster
              mountPath: /gluster
With this configuration, if the main node fails, there will be about a minute of idle time until the mount times out and switches over. A minute of downtime is acceptable for this storage: it is not a routine situation and we will rarely run into it, and in that case the system switches over automatically and keeps working, so we can investigate and recover without worrying about the outage.

Summary
In this article we examined three possible options for connecting GlusterFS to Kubernetes. In our setup it would also be possible to add a provisioner to Kubernetes, but we do not need it yet. It remains to add the results of performance tests between NFS and Gluster on the same nodes.

1MB files:

sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
Gluster: 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.63496 s, 407 MB/s
NFS: 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.4527 s, 197 MB/s
1KB files:

sync; dd if=/dev/zero of=tempfile bs=1K count=1048576; sync
Gluster: 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 70.0508 s, 15.3 MB/s
NFS: 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.95208 s, 154 MB/s
NFS performs roughly the same regardless of file size, while GlusterFS degrades badly on small files. At the same time, on large files NFS shows throughput 2-3 times lower than Gluster.
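The MB/s figures dd reports can be reproduced from the raw numbers: dd uses decimal megabytes, i.e. bytes / seconds / 10^6. A sketch with a hypothetical helper name:

```shell
#!/bin/sh
# mbps (hypothetical helper): recompute dd's MB/s figure from the
# byte count and elapsed seconds, using decimal megabytes as dd does.
mbps() {
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s / 1000000 }'
}

mbps 1073741824 2.63496   # Gluster, 1MB blocks
mbps 1073741824 6.95208   # NFS, 1KB blocks
```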