Etcd Backup & Restore

mohd aamir mir:
for the cka exam i did the backup for etcd successfully and i restored the backup too using the --data-dir=/var/lib/etcd-from-backup option, but then i couldnt find the etcd file in /etcd/kubernetes/manisfests/ folder to change the path… can anyone tell me what should have been done here?

Manickam Krishnan:
The same thing happened to me, and I couldn't complete the restore.
How did you take the backup from the exam node?

Manickam Krishnan:
@Mumshad Mannambeth pls suggest

Srikanth:
Same in my case as well, I was not able to do this. I failed my first attempt.

Srikanth:
@Mumshad Mannambeth @Alistair Mackay could you please share some insights here

Srikanth:
Also, one more thing: the etcd backup & restore task had to be done from the base node (not from the master/controller node), so I was confused. In the practice labs we did the backup from the master node itself. But in the exam, when I checked the etcd process details, I could see 3 etcd processes running on the base node, so I was not sure how to identify the right config file to edit the data directory location, or which process to stop.

This question really shook my confidence, and I ended up failing the CKA exam as well.

Manickam Krishnan:
Yes, same here… I lost 40 minutes on this.

Manickam Krishnan:
@unnivkn @Mumshad Mannambeth pls respond

Alistair Mackay:
Hello @mohd aamir mir

In your post, you said

"i couldnt find the etcd file in /etcd/kubernetes/manisfests/ folder"

The manifest path is

/etc/kubernetes/manifests/

and this applies when etcd is running as a pod within a cluster.

However, if you find etcd is not running as a pod in the cluster then you have to dig a bit further. The suggestion from @Srikanth was that there are multiple etcd processes on the base node so you will need to identify the correct one for the cluster indicated by the question.

  1. Open the API server manifest on the correct control node for the cluster and find the --etcd-servers argument. This identifies which host is running etcd and, most importantly, which port. The host should be the base node.
  2. Next, find the systemd unit files for the etcd services running on the base node with sudo systemctl status. Identify the one with the matching port (you’ll need to edit this file later). These files are most probably in /etc/systemd/system.
  3. Do the backup using the --endpoints https://127.0.0.1:port switch, where port is the port number you have identified, so that the snapshot is taken from the correct etcd process. Use the cert files the question tells you to use.
  4. Restore the backup to a new directory.
  5. Edit the systemd unit file you found in step 2 and adjust the --data-dir argument to point to where you restored the backup.
  6. Restart the service:
sudo systemctl daemon-reload
sudo systemctl restart XXX.service

where XXX.service is the filename of the unit file you edited.
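
Steps 1, 3 and 4 above could be sketched in shell roughly as follows. This is only a sketch under assumptions: the default manifest path is the usual kubeadm location, the function names are hypothetical helpers, and CACERT, CERT and KEY are placeholder variables for whatever certificate files the question gives you.

```shell
# Step 1: print the first etcd endpoint found in an API server manifest.
# $1 defaults to the usual kubeadm path (an assumption; check your cluster).
etcd_endpoint_from_manifest() {
    manifest="${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
    grep -o -- '--etcd-servers=[^ ,"]*' "$manifest" | head -n 1 | cut -d= -f2
}

# Print only the port of an endpoint such as https://10.0.0.5:2379.
endpoint_port() {
    echo "${1##*:}"
}

# Step 3: back up the etcd instance listening on port $1 to file $2.
# CACERT, CERT and KEY are placeholders for the files named in the question.
etcd_backup() {
    ETCDCTL_API=3 etcdctl \
        --endpoints="https://127.0.0.1:$1" \
        --cacert="$CACERT" --cert="$CERT" --key="$KEY" \
        snapshot save "$2"
}

# Step 4: restore snapshot file $1 into a new data directory $2.
etcd_restore() {
    ETCDCTL_API=3 etcdctl snapshot restore "$1" --data-dir="$2"
}
```

On the control node you would run etcd_endpoint_from_manifest to learn the port, then on the node where etcd runs something like etcd_backup 2379 /tmp/snapshot.db followed by etcd_restore /tmp/snapshot.db /var/lib/etcd-from-backup. The port and snapshot path here are only example values.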

Either way you are strongly advised to skip the backup/restore question and come back to it at the end if you have time. If you waste a long time on this question, you might miss out on easier questions further along.

Manickam Krishnan:
@Alistair Mackay can we take the snapshot on the exam node, move the files to the master node and restore there?

Manickam Krishnan:
Because the question clearly says to do the task on the exam node, and to go to the master node only for troubleshooting and then come back.

Alistair Mackay:
If the etcd process is running on the exam node, then its data directory is also on the exam node. No files need to be moved to other nodes.
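
To see which data directory a given etcd service uses, and to point it at the restored one, something like the following could work. It is a rough sketch that assumes the unit file passes the directory as --data-dir=PATH or --data-dir PATH on the etcd command line; both function names are hypothetical helpers.

```shell
# Print the --data-dir value from an etcd unit file ($1).
unit_data_dir() {
    grep -o -- '--data-dir[= ][^ ]*' "$1" | head -n 1 | sed 's/^--data-dir[= ]//'
}

# Rewrite --data-dir in unit file $1 to directory $2.
# The original file is kept next to it as "$1.bak".
set_unit_data_dir() {
    cp "$1" "$1.bak" &&
    sed "s|--data-dir\([= ]\)[^ ]*|--data-dir\1$2|" "$1.bak" > "$1"
}
```

Editing files under /etc/systemd/system normally needs root, so in practice you would run this in a root shell or simply make the same change by hand in an editor. If the service runs as a non-root user, the restored directory may also need its ownership changed to that user.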

Alistair Mackay:
To verify whether or not etcd is running as a pod in the cluster indicated by the question, run the following after doing the use-context indicated by the question

kubectl get pods -n kube-system

If there’s no etcd pod then it’s definitely running as a service elsewhere, e.g. the exam node.
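
That check could be wrapped up as below. This is a small sketch that assumes kubeadm-style naming, where the etcd static pod is called etcd-<node name>; etcd_location is a hypothetical helper name.

```shell
# Reads the output of `kubectl get pods -n kube-system` on stdin and
# prints "pod" if any line starts with "etcd-", otherwise "external".
etcd_location() {
    if grep -q '^etcd-'; then
        echo "pod"
    else
        echo "external"
    fi
}
```

Usage would be: kubectl get pods -n kube-system | etcd_location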

Manickam Krishnan:
Yes, in my case it was not showing any etcd pod in kube-system and there was no manifest file on the exam node.

Manickam Krishnan:
When I executed get nodes after switching to the given context,
it showed one master node and one worker node.

Manickam Krishnan:
I switched to the master node and tried to execute etcdctl snapshot save -h

Manickam Krishnan:
It said command not found

Manickam Krishnan:
I even tried after exporting ETCDCTL_API=3

Manickam Krishnan:
But the same command worked when I executed it on the exam node

Alistair Mackay:
From what you describe, the system must have looked like this
• Exam node - one or more etcd processes running as operating system services, plus the etcdctl command
• Control node - no etcd service or pod running.
This means it’s a “remote etcd” setup. I have just updated the FAQ to cover what should be done in this case. It’s quite involved which is why the question should be left till last.

https://github.com/kodekloudhub/community-faq/blob/main/etcd-faq.md
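
For the remote etcd layout described above, with several etcd services on the exam node, one way to pick out the right unit file is to search for the port taken from --etcd-servers. This is a hedged sketch: it assumes the unit files end in .service, live in /etc/systemd/system, and mention the client port as ":PORT". find_etcd_unit_by_port is a hypothetical helper name.

```shell
# List unit files in directory $2 (default /etc/systemd/system)
# that mention port $1 as ":PORT" (not followed by another digit).
find_etcd_unit_by_port() {
    grep -l -E -- ":$1([^0-9]|\$)" "${2:-/etc/systemd/system}"/*.service 2>/dev/null
}
```

For example, find_etcd_unit_by_port 2379 prints the path of each unit file that refers to port 2379, which is then the file whose --data-dir you edit.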