Chiranga Alwis:
Hi folks,
I made an attempt to back up and restore etcd by executing the client commands from a worker node.
As we know, the backup step is not difficult, since we can point to the target etcd endpoint (the master node) via the --endpoints arg.
But in the case of restoration I noticed that we are required to log in to the master node, as the etcdctl snapshot restore command’s --data-dir arg creates the data directory on the local file system (even when we provide an external host endpoint) rather than on the endpoint provided. Given that we have the required certificates on the worker node (or any host external to etcd’s host), does that mean it is mandatory to log in to etcd’s host during a restoration?
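For context, the backup from the worker node looked roughly like this (the endpoint address and the cert paths below are just placeholders for the controlplane’s address and wherever the copied etcd certs live):
ETCDCTL_API=3 etcdctl --endpoints=https://<controlplane-ip>:2379 \
  --cacert=<certs-dir>/ca.crt --cert=<certs-dir>/server.crt --key=<certs-dir>/server.key \
  snapshot save /opt/snapshot-pre-boot.db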
Chiranga Alwis:
@Mumshad Mannambeth @unnivkn appreciate your feedback on this.
Aneek Bera:
This is how it is done.
Taking backup:
ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" snapshot save /opt/snapshot-pre-boot.db
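(Optional) Before restoring, you can sanity-check the snapshot file; this prints its hash, revision, total keys and size:
ETCDCTL_API=3 etcdctl snapshot status /opt/snapshot-pre-boot.db -w table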
Restoring from backup:
ETCDCTL_API=3 etcdctl snapshot restore /opt/snapshot-pre-boot.db --data-dir=/var/lib/etcd-from-backup
root@controlplane:~# ls /var/lib/etcd
member
root@controlplane:~# ls /var/lib/etcd-from-backup
member
root@controlplane:~#
Note: 'etcd-from-backup' is the directory the snapshot has been restored to.
But now the etcd cluster is still configured to use /var/lib/etcd. So we need to change this to the new directory in the etcd.yaml file.
Here are the steps:
root@controlplane:~# cd /etc/kubernetes/manifests/
root@controlplane:/etc/kubernetes/manifests# ll
total 28
drwxr-xr-x 1 root root 4096 Sep 20 21:22 ./
drwxr-xr-x 1 root root 4096 Sep 20 20:33 ../
-rw------- 1 root root 2183 Sep 20 20:33 etcd.yaml
-rw------- 1 root root 3807 Sep 20 20:33 kube-apiserver.yaml
-rw------- 1 root root 3314 Sep 20 20:33 kube-controller-manager.yaml
-rw------- 1 root root 1384 Sep 20 20:33 kube-scheduler.yaml
root@controlplane:/etc/kubernetes/manifests# vim etcd.yaml
Go to the bottom of the YAML file and change the hostPath:
From:
- hostPath:
    path: /var/lib/etcd
    type: DirectoryOrCreate
  name: etcd-data
To:
- hostPath:
    path: /var/lib/etcd-from-backup
    type: DirectoryOrCreate
  name: etcd-data
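Once etcd.yaml is saved, the kubelet recreates the etcd static pod from the updated manifest. A quick way to confirm it came back up against the new directory (assuming kubectl is configured here and the pod is named etcd-controlplane, as it would be on this node):
kubectl -n kube-system get pod etcd-controlplane
kubectl -n kube-system get pod etcd-controlplane -o jsonpath='{.spec.volumes[*].hostPath.path}'
The second command should list /var/lib/etcd-from-backup among the hostPath volumes.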
Chiranga Alwis:
@Aneek Bera thanks for your answer first of all.
Please correct me if I am wrong, but this works only if we perform all the steps on the same host, or more precisely on the host of the etcd service or static Pod (very often a master node).
My question is how to perform the restoration while on an external host.
Say you have the backup snapshot and the etcd certs on a worker Node (where the etcd Linux service or static Pod isn’t running). How can we restore the etcd running on a master node to its previous state (stored within the backup snapshot)?
AFAIU, this isn’t possible, because during the restoration you provide the data directory the backup is restored to, and this has to be within etcd’s host. Running the restore command from an external host creates the data directory within that external host (even if we set the endpoint to point to the controlplane Node) and not within the controlplane Node.
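For example, this is roughly what I tried on the worker (the endpoint below is just a placeholder for the controlplane’s address):
ETCDCTL_API=3 etcdctl --endpoints=https://<controlplane-ip>:2379 \
  snapshot restore /opt/snapshot-pre-boot.db --data-dir=/var/lib/etcd-from-backup
ls /var/lib/etcd-from-backup
The member directory appears on the worker’s own filesystem, not on the controlplane.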
Does that mean that during a restoration we always need to be on the same host where the etcd service or static Pod is running?
Chiranga Alwis:
Please correct me if I have missed anything @Aneek Bera @Mumshad Mannambeth @unnivkn
Chiranga Alwis:
Folks, any update on this?
My question is simply whether, and how, we can perform an etcd restoration from an external host.
Eyal Solomon:
@unnivkn @Mumshad Mannambeth @Tej_Singh_Rana
Mumshad Mannambeth:
@Chiranga Alwis Good question. The backup and restore procedure for ETCD is like any backup and restore procedure. So we need to understand the following:
• Where is the data stored? - This depends on the ETCD implementation. In the above case ETCD is running as a pod and the etcd data is stored at /var/lib/etcd within the pod, which is in fact mapped to /var/lib/etcd on the controlplane host as a hostPath. Note that this need not always be the case; it just happens to be how it is in this particular case. It could be on a PV, in an S3 bucket in the cloud, or elsewhere.
• How do you take a backup of it? - Any way you can create a consistent copy of /var/lib/etcd works. You could shut down the etcd instances and manually make a copy of it, but etcd provides a command just for that, so we use it - etcdctl snapshot save. This takes a backup of the data within /var/lib/etcd to a file, /opt/snapshot-pre-boot.db in the above example.
• Where do you restore it to? - We either restore it to the same location, /var/lib/etcd, by first shutting down etcd and restoring the data into the same directory (see the sketch below), or we restore it to another location, /var/lib/etcd-from-backup, and then configure that as the data directory for etcd.
So in this case, since etcd is running on the control plane with the data directory as a hostPath on the controlplane, yes, you need to be on the control plane. But if the data directory were configured on clustered storage - S3, Ceph or something else - then you could perform this operation from any system that has access to that storage. Conclusion: restore etcd from a system that has access to the storage that stores the etcd data.
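For completeness, restoring into the original /var/lib/etcd would look roughly like this on the controlplane - a sketch only; moving the manifest out of /etc/kubernetes/manifests is one way to stop a static pod, and /tmp/etcd.yaml and /var/lib/etcd.bak are just arbitrary parking spots:
# stop the etcd static pod by moving its manifest out of the manifests directory
mv /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml
# keep the old data directory around, then restore the snapshot into its place
mv /var/lib/etcd /var/lib/etcd.bak
ETCDCTL_API=3 etcdctl snapshot restore /opt/snapshot-pre-boot.db --data-dir=/var/lib/etcd
# bring the static pod back; the kubelet starts etcd against the restored data
mv /tmp/etcd.yaml /etc/kubernetes/manifests/etcd.yaml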
Hope that’s clear.
Chiranga Alwis:
@Mumshad Mannambeth got it, thanks for the perfect answer as always.