Our experience working with data in a Kubernetes cluster's etcd directly (without the K8s API)

More and more often, customers ask us to provide access to a Kubernetes cluster so they can reach services inside it: to connect directly to some database or service, or to link a local application with applications inside the cluster...



For example, a client needs to connect from their local machine to the service memcached.staging.svc.cluster.local. We provide this with a VPN into the cluster to which the client connects: we announce the pod and service subnets and push the cluster DNS to the client. Then, when the client tries to reach memcached.staging.svc.cluster.local, the request goes to the cluster's DNS, which responds with the address of this service from the cluster's service network, or with a pod address.

We configure K8s clusters with kubeadm, using 192.168.0.0/16 as the default service subnet and 10.244.0.0/16 as the pod network. Usually everything works well, but there are a couple of caveats:

  • The 192.168.*.* subnet is often used in clients' office networks, and even more often in developers' home networks. Then we get conflicts: home routers work on this subnet, while the VPN pushes the same subnets from the cluster to the client.
  • We have several clusters (production, stage, and/or several dev clusters). By default, all of them end up with the same pod and service subnets, which makes it very difficult to work with services in several clusters at the same time.

For quite some time now we have adopted the practice of using different subnets for services and pods within one project, so that, in general, every cluster has its own networks. However, there are many clusters already in operation that we would not want to redeploy from scratch, since they run numerous services, stateful applications, and so on.

And then we asked ourselves: how do we change the subnet in an existing cluster?

Searching for a solution


The most common practice is to recreate all services of the ClusterIP type. As an alternative, you may also be advised something like this:

The following process has a problem: after everything is configured, the pods come up with the old IP as a DNS nameserver in /etc/resolv.conf.
Since I still did not find the solution, I had to reset the entire cluster with kubeadm reset and init it again.

But this does not suit everyone... Here are the more detailed initial conditions for our case:

  • Flannel is used;
  • there are clusters both in clouds and on bare metal;
  • we would like to avoid redeploying all services in the cluster;
  • everything needs to be done with a minimum of problems;
  • the Kubernetes version is 1.16.6 (though the steps are similar for other versions);
  • the main task is to replace the 192.168.0.0/16 service subnet with 172.24.0.0/16 in a cluster deployed using kubeadm.

And it just so happened that we had long been curious to see what Kubernetes stores in etcd and how, and what can be done with it at all... So we thought: "Why not just update the data in etcd, replacing the old subnet's IP addresses with new ones?"

Looking for ready-made tools for working with data in etcd, we did not find anything that completely solved the task. (By the way, if you know of any utilities for working with data directly in etcd, we would be grateful for the links.) However, OpenShift's etcdhelper became a good starting point (thanks to its authors!).

This utility can connect to etcd using certificates and read data from it with the ls, get, and dump commands.
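To give a feel for how this works, here is a minimal sketch in Go of connecting to etcd with the cluster certificates and decoding the stored objects. This is only an illustration written against clientv3 and the kubectl scheme package, not the actual etcdhelper code; the certificate paths and the /registry/services/specs/ prefix reflect a standard kubeadm setup.

package main

import (
	"context"
	"fmt"
	"time"

	"go.etcd.io/etcd/clientv3"
	"go.etcd.io/etcd/pkg/transport"
	"k8s.io/kubectl/pkg/scheme"
)

func main() {
	// Build a TLS config from the same certificates etcd itself uses.
	tlsInfo := transport.TLSInfo{
		CertFile:      "/etc/kubernetes/pki/etcd/server.crt",
		KeyFile:       "/etc/kubernetes/pki/etcd/server.key",
		TrustedCAFile: "/etc/kubernetes/pki/etcd/ca.crt",
	}
	tlsConfig, err := tlsInfo.ClientConfig()
	if err != nil {
		panic(err)
	}

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://127.0.0.1:2379"},
		TLS:         tlsConfig,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Kubernetes stores Service objects under /registry/services/specs/ in a
	// binary (protobuf) form; the universal deserializer can decode it.
	resp, err := cli.Get(context.Background(), "/registry/services/specs/", clientv3.WithPrefix())
	if err != nil {
		panic(err)
	}
	decoder := scheme.Codecs.UniversalDeserializer()
	for _, kv := range resp.Kvs {
		obj, gvk, err := decoder.Decode(kv.Value, nil, nil)
		if err != nil {
			fmt.Printf("%s: cannot decode: %v\n", kv.Key, err)
			continue
		}
		fmt.Printf("%s: %s %T\n", kv.Key, gvk.Kind, obj)
	}
}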

Extending etcdhelper


The next thought is logical: "What prevents us from extending this utility by adding the ability to write data to etcd?"

The result was a modified version of etcdhelper with two new functions: changeServiceCIDR and changePodCIDR. You can see its code here.

What do the new functions do? The changeServiceCIDR algorithm (a sketch in Go follows the list below):

  • create a deserializer;
  • compile a regular expression to replace the CIDR;
  • iterate over all services of the ClusterIP type in the cluster:

    • decode the value from etcd into a Go object;
    • using the regular expression, replace the first two octets of the address;
    • assign the service an IP address from the new subnet;
    • create a serializer, convert the Go object to protobuf, and write the new data to etcd.
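Put together, a minimal sketch of this loop in Go could look as follows. It is an illustration under the assumptions above, not the exact code of the patched etcdhelper, and it reuses an etcd client created as in the earlier sketch.

package main

import (
	"bytes"
	"context"
	"fmt"
	"regexp"
	"strings"

	"go.etcd.io/etcd/clientv3"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime/serializer/protobuf"
	"k8s.io/kubectl/pkg/scheme"
)

// changeServiceCIDR is an illustrative sketch: it rewrites the ClusterIP of every
// Service stored in etcd so that its first two octets match newCIDR
// (e.g. "172.24.0.0/16" gives the prefix "172.24").
func changeServiceCIDR(cli *clientv3.Client, newCIDR string) error {
	newPrefix := strings.Join(strings.Split(newCIDR, ".")[:2], ".")
	re := regexp.MustCompile(`^\d+\.\d+\.`)

	decoder := scheme.Codecs.UniversalDeserializer()
	// Objects are stored in etcd in protobuf form, so we encode them back the same way.
	encoder := protobuf.NewSerializer(scheme.Scheme, scheme.Scheme)

	resp, err := cli.Get(context.Background(), "/registry/services/specs/", clientv3.WithPrefix())
	if err != nil {
		return err
	}

	for _, kv := range resp.Kvs {
		obj, _, err := decoder.Decode(kv.Value, nil, nil)
		if err != nil {
			return err
		}
		svc, ok := obj.(*corev1.Service)
		if !ok || svc.Spec.ClusterIP == "" || svc.Spec.ClusterIP == "None" {
			continue // skip anything that is not a Service with a ClusterIP (e.g. headless services)
		}

		// Replace the first two octets, i.e. assign the service an IP from the new subnet.
		svc.Spec.ClusterIP = re.ReplaceAllString(svc.Spec.ClusterIP, newPrefix+".")

		var buf bytes.Buffer
		if err := encoder.Encode(svc, &buf); err != nil {
			return err
		}
		if _, err := cli.Put(context.Background(), string(kv.Key), buf.String()); err != nil {
			return err
		}
		fmt.Printf("%s -> %s\n", kv.Key, svc.Spec.ClusterIP)
	}
	return nil
}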

The changePodCIDR function is essentially the same as changeServiceCIDR; only instead of editing the service specification, we do it for the node and change its .spec.podCIDR to the new subnet.
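For illustration, the per-node rewrite could be expressed as a small helper like the one below (the name is hypothetical; Node objects live under the /registry/minions/ prefix in etcd, and the rest of the loop looks like the sketch above):

package main

import (
	"regexp"

	corev1 "k8s.io/api/core/v1"
)

// rewriteNodePodCIDR is an illustrative helper: for a decoded Node object it swaps
// the first two octets of .spec.podCIDR and .spec.podCIDRs for the prefix of the
// new pod subnet (e.g. "10.55"), so 10.244.1.0/24 becomes 10.55.1.0/24.
func rewriteNodePodCIDR(node *corev1.Node, newPrefix string) {
	re := regexp.MustCompile(`^\d+\.\d+\.`)
	node.Spec.PodCIDR = re.ReplaceAllString(node.Spec.PodCIDR, newPrefix+".")
	for i := range node.Spec.PodCIDRs {
		node.Spec.PodCIDRs[i] = re.ReplaceAllString(node.Spec.PodCIDRs[i], newPrefix+".")
	}
}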

Practice


Changing serviceCIDR


The plan for carrying out the task is very simple, but it involves downtime while all pods in the cluster are recreated. After describing the main steps, we will also share our thoughts on how, in theory, this downtime can be minimized.

Preparatory actions:

  • installing the necessary software and building the patched etcdhelper;
  • backing up etcd and /etc/kubernetes.

Short action plan for changing serviceCIDR:

  • modifying the apiserver and controller-manager manifests;
  • reissuing certificates;
  • changing the ClusterIP services in etcd;
  • restarting all pods in the cluster.

The following is a complete sequence of actions in detail.

1. Install etcd-client for dumping the data:

apt install etcd-client

2. Build etcdhelper:

  • Install golang:

    GOPATH=/root/golang
    mkdir -p $GOPATH/local
    curl -sSL https://dl.google.com/go/go1.14.1.linux-amd64.tar.gz | tar -xzvC $GOPATH/local
    echo "export GOPATH=\"$GOPATH\"" >> ~/.bashrc
    echo 'export GOROOT="$GOPATH/local/go"' >> ~/.bashrc
    echo 'export PATH="$PATH:$GOPATH/local/go/bin"' >> ~/.bashrc
  • Download etcdhelper.go, fetch the dependencies, and build:

    wget https://raw.githubusercontent.com/flant/examples/master/2020/04-etcdhelper/etcdhelper.go
    go get go.etcd.io/etcd/clientv3 k8s.io/kubectl/pkg/scheme k8s.io/apimachinery/pkg/runtime
    go build -o etcdhelper etcdhelper.go

3. Back up etcd and /etc/kubernetes:

backup_dir=/root/backup
mkdir ${backup_dir}
cp -rL /etc/kubernetes ${backup_dir}
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/server.key --cert=/etc/kubernetes/pki/etcd/server.crt --endpoints https://192.168.199.100:2379 snapshot save ${backup_dir}/etcd.snapshot

4. Change the service subnet in the Kubernetes control plane manifests. In the files /etc/kubernetes/manifests/kube-apiserver.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml, change the --service-cluster-ip-range parameter to the new subnet: 172.24.0.0/16 instead of 192.168.0.0/16.

5. Since we are changing the service subnet, whose addresses are included (among others) in the certificate kubeadm issues for the apiserver, the certificate must be reissued:

  1. Let's see for which domains and IP addresses the current certificate is issued:

    openssl x509 -noout -ext subjectAltName </etc/kubernetes/pki/apiserver.crt
    X509v3 Subject Alternative Name:
        DNS:dev-1-master, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:apiserver, IP Address:192.168.0.1, IP Address:10.0.0.163, IP Address:192.168.199.100
  2. Prepare a minimal config for kubeadm:

    cat kubeadm-config.yaml
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterConfiguration
    networking:
      podSubnet: "10.244.0.0/16"
      serviceSubnet: "172.24.0.0/16"
    apiServer:
      certSANs:
      - "192.168.199.100" # IP-  
  3. Delete the old crt and key, since otherwise the new certificate will not be issued:

    rm /etc/kubernetes/pki/apiserver.{key,crt}
  4. Re-issue certificates for the API server:

    kubeadm init phase certs apiserver --config=kubeadm-config.yaml
  5. Check that the certificate has been reissued for the new subnet:

    openssl x509 -noout -ext subjectAltName </etc/kubernetes/pki/apiserver.crt
    X509v3 Subject Alternative Name:
        DNS:kube-2-master, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:172.24.0.1, IP Address:10.0.0.163, IP Address:192.168.199.100
  6. Restart the API server container:

    docker ps | grep k8s_kube-apiserver | awk '{print $1}' | xargs docker restart
  7. Renew the certificate in admin.conf:

    kubeadm alpha certs renew admin.conf
  8. Edit the data in etcd:

    ./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 change-service-cidr 172.24.0.0/16 

    Attention! At this moment, name resolution stops working in the cluster: existing pods still have the old CoreDNS (kube-dns) address in /etc/resolv.conf, while kube-proxy has already rewritten its iptables rules for the new subnet instead of the old one. Possible ways to minimize this downtime are described below.
  9. Edit the ConfigMaps in the kube-system namespace:

    kubectl -n kube-system edit cm kubelet-config-1.16

    Here, replace clusterDNS with the new IP address of the kube-dns service: kubectl -n kube-system get svc kube-dns.

    kubectl -n kube-system edit cm kubeadm-config

    Here, change data.ClusterConfiguration.networking.serviceSubnet to the new subnet.
  10. Since the kube-dns address has changed, the kubelet config must be updated on all nodes:

    kubeadm upgrade node phase kubelet-config && systemctl restart kubelet
  11. It remains to restart all the pods in the cluster:

    kubectl get pods --no-headers=true --all-namespaces |sed -r 's/(\S+)\s+(\S+).*/kubectl --namespace \1 delete pod \2/e'

Minimizing downtime


Thoughts on how to minimize downtime:

  1. After changing the control plane manifests, create a new kube-dns service, for example, named kube-dns-tmp, with a new address 172.24.0.10.
  2. Add an if to etcdhelper so that it does not modify the kube-dns service (a sketch of this check follows below).
  3. Replace the clusterDNS address on all kubelets with the new one; the old service will keep working alongside the new one.
  4. Wait until the application pods roll over either by themselves for natural reasons or at an agreed time.
  5. Delete the kube-dns-tmp service and change the serviceSubnetCIDR for the kube-dns service.

This plan reduces downtime to about a minute: the time needed to delete the kube-dns-tmp service and switch the subnet for the kube-dns service.
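The check from step 2 can be as simple as a guard in the changeServiceCIDR loop from the sketch above; the helper below is hypothetical and only shows the idea.

package main

import corev1 "k8s.io/api/core/v1"

// keepOldClusterIP marks services whose ClusterIP should not be rewritten yet:
// kube-dns keeps its old address until the temporary kube-dns-tmp service takes over.
func keepOldClusterIP(svc *corev1.Service) bool {
	return svc.Namespace == "kube-system" && svc.Name == "kube-dns"
}

Inside the loop, a service for which keepOldClusterIP returns true would simply be skipped with continue.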

Changing podNetwork


At the same time, we decided to look at how podNetwork can be modified using the resulting etcdhelper. The sequence of actions is as follows:

  • fix the configs in kube-system;
  • fix the kube-controller-manager manifest;
  • change podCIDR directly in etcd;
  • reboot all nodes of the cluster.

Now more about these actions:

1. Modify the ConfigMaps in the kube-system namespace:

kubectl -n kube-system edit cm kubeadm-config

- here, change data.ClusterConfiguration.networking.podSubnet to the new subnet 10.55.0.0/16.

kubectl -n kube-system edit cm kube-proxy

- here, set data.config.conf.clusterCIDR to 10.55.0.0/16.

2. Modify the controller-manager's manifest:

vim /etc/kubernetes/manifests/kube-controller-manager.yaml

- here, set --cluster-cidr=10.55.0.0/16.

3. Look at the current values of .spec.podCIDR, .spec.podCIDRs, and the InternalIP from .status.addresses for all nodes in the cluster:

kubectl get no -o json | jq '[.items[] | {"name": .metadata.name, "podCIDR": .spec.podCIDR, "podCIDRs": .spec.podCIDRs, "InternalIP": (.status.addresses[] | select(.type == "InternalIP") | .address)}]'

[
  {
    "name": "kube-2-master",
    "podCIDR": "10.244.0.0/24",
    "podCIDRs": [
      "10.244.0.0/24"
    ],
    "InternalIP": "192.168.199.2"
  },
  {
    "name": "kube-2-master",
    "podCIDR": "10.244.0.0/24",
    "podCIDRs": [
      "10.244.0.0/24"
    ],
    "InternalIP": "10.0.1.239"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.244.1.0/24",
    "podCIDRs": [
      "10.244.1.0/24"
    ],
    "InternalIP": "192.168.199.222"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.244.1.0/24",
    "podCIDRs": [
      "10.244.1.0/24"
    ],
    "InternalIP": "10.0.4.73"
  }
]

4. Replace podCIDR by making changes directly to etcd:

./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 change-pod-cidr 10.55.0.0/16

5. Check that podCIDR has really changed:

kubectl get no -o json | jq '[.items[] | {"name": .metadata.name, "podCIDR": .spec.podCIDR, "podCIDRs": .spec.podCIDRs, "InternalIP": (.status.addresses[] | select(.type == "InternalIP") | .address)}]'

[
  {
    "name": "kube-2-master",
    "podCIDR": "10.55.0.0/24",
    "podCIDRs": [
      "10.55.0.0/24"
    ],
    "InternalIP": "192.168.199.2"
  },
  {
    "name": "kube-2-master",
    "podCIDR": "10.55.0.0/24",
    "podCIDRs": [
      "10.55.0.0/24"
    ],
    "InternalIP": "10.0.1.239"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.55.1.0/24",
    "podCIDRs": [
      "10.55.1.0/24"
    ],
    "InternalIP": "192.168.199.222"
  },
  {
    "name": "kube-2-worker-01f438cf-579f9fd987-5l657",
    "podCIDR": "10.55.1.0/24",
    "podCIDRs": [
      "10.55.1.0/24"
    ],
    "InternalIP": "10.0.4.73"
  }
]

6. Finally, reboot all nodes of the cluster one at a time.

7. If at least one node keeps the old podCIDR, kube-controller-manager will not be able to start, and pods in the cluster will not be scheduled.

In fact, changing podCIDR can be done more simply (for example, like this). But we wanted to learn how to work with etcd directly, because there are cases when editing Kubernetes objects in etcd is the only possible option. (For example, you can't change a Service's spec.clusterIP field without downtime.)

Summary


This article has looked at the possibility of working with data in etcd directly, i.e., bypassing the Kubernetes API. Sometimes this approach allows you to do "tricky things." The operations described above were tested on real K8s clusters; however, their readiness for widespread use is at the PoC (proof of concept) level. Therefore, if you want to use the modified etcdhelper utility on your clusters, do so at your own risk.
