Etcd backup & Restore

kodekloud · August 27, 2022, 12:38pm

mohd aamir mir:
For the CKA exam I did the backup for etcd successfully, and I restored the backup too using the --data-dir=/var/lib/etcd-from-backup option, but then i couldnt find the etcd file in /etcd/kubernetes/manisfests/ folder to change the path… Can anyone tell me what should have been done here?

kodekloud · August 27, 2022, 12:38pm

Manickam Krishnan:
The same thing happened to me, and I could not restore either.
How did you take the backup from the exam node?

kodekloud · August 27, 2022, 12:38pm

Manickam Krishnan:
@Mumshad Mannambeth pls suggest

kodekloud · August 27, 2022, 12:38pm

Srikanth:
Same in my case as well, I was not able to do this. I failed my first attempt.

kodekloud · August 27, 2022, 12:38pm

Srikanth:
@Mumshad Mannambeth @Alistair Mackay could you please share some insights here?

kodekloud · August 27, 2022, 12:38pm

Srikanth:
Also, one more thing: the etcd backup & restore task had to be done from the base node (not from the master/controller node), so I was confused. In the practice labs we did the backup from the master node itself. But in the exam, when I checked the etcd process details, I could see 3 etcd processes running on the base node, so I was not sure how to identify the right config file in which to change the data directory location, or which process to stop.

This question really shook my confidence, and I ended up failing the CKA exam as well.

kodekloud · August 27, 2022, 12:38pm

Manickam Krishnan:
Yes, same here… I lost 40 minutes on this.

kodekloud · August 27, 2022, 12:39pm

Manickam Krishnan:
@unnivkn @Mumshad Mannambeth pls respond

kodekloud · August 27, 2022, 12:39pm

Alistair Mackay:
Hello @mohd aamir mir

In your post, you said

i couldnt find the etcd file in /etcd/kubernetes/manisfests/ folder

The manifest path is

/etc/kubernetes/manifests/

and this applies when etcd is running as a pod within a cluster.

However, if you find etcd is not running as a pod in the cluster then you have to dig a bit further. The suggestion from @Srikanth was that there are multiple etcd processes on the base node so you will need to identify the correct one for the cluster indicated by the question.

1. Open the API server manifest on the correct control node for the cluster and find the --etcd-servers argument. This identifies which host is running etcd and, most importantly, which port. The host should be the base node.
2. Next, find the systemd unit files for the etcd servers running on the base node with sudo systemctl status. Identify the one with the matching port (you’ll need to edit this file later). These files are most probably in /etc/systemd/system.
3. Do the backup using the --endpoints https://127.0.0.1:port switch, where port is the port number you have identified, so that you target the correct etcd process. Use the cert files the question tells you to use.
4. Restore the backup to a new directory.
5. Edit the unit file you found in step 2 and adjust the --data-dir argument to point to where you restored the backup.
6. Restart the service:

sudo systemctl daemon-reload
sudo systemctl restart XXX.service

where XXX.service is the filename of the unit file you edited.
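
The steps above could be sketched roughly as the helper functions below. This is only an illustration under assumptions: the function names are made up for this post, the snapshot path, data directory and the CACERT, CERT and KEY variables are placeholders for whatever the question gives you, and the manifest is assumed to contain a single --etcd-servers endpoint.

```shell
#!/usr/bin/env bash
# Sketch only. All function names are hypothetical helpers, and every path,
# port and certificate location must come from the exam question itself.

# Step 1: print the value of --etcd-servers from an API server manifest.
# $1: manifest path, typically /etc/kubernetes/manifests/kube-apiserver.yaml
etcd_servers_from_manifest() {
  grep -- '--etcd-servers=' "$1" | sed -e 's/.*--etcd-servers=//' -e 's/[[:space:]]*$//'
}

# Step 1 (continued): print the port of an endpoint such as https://10.0.0.5:2379
port_of() {
  printf '%s\n' "${1##*:}"
}

# Step 2: list the unit files in a directory that mention the given port.
# $1: directory (most probably /etc/systemd/system), $2: port
units_for_port() {
  grep -lE -- ":$2([^0-9]|\$)" "$1"/*.service
}

# Step 3: take the snapshot from the etcd process listening on the given port.
# $1: port, $2: snapshot file; CACERT, CERT and KEY must already be set.
backup_etcd() {
  ETCDCTL_API=3 etcdctl snapshot save "$2" \
    --endpoints "https://127.0.0.1:$1" \
    --cacert "$CACERT" --cert "$CERT" --key "$KEY"
}

# Step 4: restore the snapshot into a new data directory.
# $1: snapshot file, $2: new data directory
restore_etcd() {
  ETCDCTL_API=3 etcdctl snapshot restore "$1" --data-dir "$2"
}
```

Nothing runs until you call the functions. After restore_etcd, you would still edit --data-dir in the unit file by hand (step 5) and then run the two systemctl commands shown above.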

Either way you are strongly advised to skip the backup/restore question and come back to it at the end if you have time. If you waste a long time on this question, you might miss out on easier questions further along.

kodekloud · August 27, 2022, 12:39pm

Manickam Krishnan:
@Alistair Mackay can we take the snapshot on the exam node, move the files to the master node and restore there?

kodekloud · August 27, 2022, 12:39pm

Manickam Krishnan:
Because the question clearly says to do the task on the exam node, and to go to the master node only for troubleshooting and then come back.

kodekloud · August 27, 2022, 12:39pm

Alistair Mackay:
If the etcd process is running on the exam node, then its data directory is also on the exam node. No moving of files to other nodes would be required.

kodekloud · August 27, 2022, 12:39pm

Alistair Mackay:
To verify whether or not etcd is running as a pod in the cluster indicated by the question, run the following after doing the use-context indicated by the question

kubectl get pods -n kube-system

If there’s no etcd pod then it’s definitely running as a service elsewhere, e.g. the exam node.
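
If you would rather not scan the pod list by eye, the check could be wrapped in a small helper like the one below. It is a sketch, not part of any official tooling: has_etcd_pod is a made-up name, and it assumes etcd pod names start with "etcd", as they normally do on kubeadm clusters.

```shell
# Reads the output of `kubectl get pods -n kube-system` on stdin and
# succeeds (exit status 0) if any line starts with "etcd".
# Hypothetical helper name; assumes etcd pods are named etcd-<node name>.
has_etcd_pod() {
  grep -q '^etcd'
}

# Usage, after switching to the context given in the question:
#   kubectl get pods -n kube-system | has_etcd_pod \
#     && echo "etcd runs as a pod" \
#     || echo "etcd runs elsewhere"
```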

kodekloud · August 27, 2022, 12:39pm

Manickam Krishnan:
Yes, in my case it was not showing any etcd pods in kube-system, and there was no manifest file on the exam node.

kodekloud · August 27, 2022, 12:39pm

Manickam Krishnan:
When I executed get nodes after switching to the given context,
it showed one master node and one worker node.

kodekloud · August 27, 2022, 12:39pm

Manickam Krishnan:
I switched to the master node and tried to execute etcdctl snapshot save -h

kodekloud · August 27, 2022, 12:39pm

Manickam Krishnan:
It was saying command not found

kodekloud · August 27, 2022, 12:39pm

Manickam Krishnan:
I even tried after exporting ETCDCTL_API=3

kodekloud · August 27, 2022, 12:39pm

Manickam Krishnan:
But if I executed the same command on the exam node, it worked

kodekloud · August 27, 2022, 12:39pm

Alistair Mackay:
From what you describe, the system must have looked like this
• Exam node - one or more etcd processes running as operating system services, plus the etcdctl command.
• Control node - no etcd service or pod running.
This means it’s a “remote etcd” setup. I have just updated the FAQ to cover what should be done in this case. It’s quite involved which is why the question should be left till last.

https://github.com/kodekloudhub/community-faq/blob/main/etcd-faq.md