Exam failed: blocking questions on etcd restore and cluster upgrade

Thank you very much, it is very clear.
I hope to succeed in this task.
Please, one last question:
Is the cluster upgrade done on the master node or the edge node?

The cluster upgrade is done on the master node only. You need to SSH into the master node and perform the upgrade there.
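At a high level, that flow looks something like this (<master-node> and the target version are placeholders, since the exam question gives you the actual values):

ssh <master-node>
# upgrade the kubeadm package first, then check and apply the control-plane upgrade
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply <target-version>
# finish by upgrading kubelet/kubectl and restarting kubelet, as per the kubeadm upgrade docs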

All the best for your exam.

The --data-dir is restored on the student node (base node). Do we need to copy this directory to the master node, since the etcd service is running on the master node and not on the base node? Also, what about the cluster token details?

Thank you Ramalrg. That is all for me.

In the exam you do not need to copy the --data-dir to the master node. Restart the etcd server after the restore on the student node. I believe that is enough.
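A rough sketch of what that restore-and-restart flow can look like on the student node (the snapshot path and new data-dir below are assumptions, not the exam's actual values):

sudo ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
  --data-dir /var/lib/etcd-from-backup                 # restore into a fresh directory
sudo chown -R etcd:etcd /var/lib/etcd-from-backup      # assuming the service runs as the etcd user
# point --data-dir in /etc/systemd/system/etcd.service at the new directory, then:
sudo systemctl daemon-reload
sudo systemctl restart etcd
sudo systemctl status etcd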

Thanks for the clarification.

Was just thinking: the etcd service/process is running on the master node, right? So how does the process get the restored data? Is NFS configured between the student node and the master node? Please clarify. Thanks.

I also think that in the exam it will be necessary to copy the directory from the base node to the master node, or to run the restore on the master, so that the directory ends up on the master.

At the same time, I also doubt that etcd is running on the base node.

Ramalrg, are you sure that etcd is running as a service on the base node?

Hmmm. The question clearly says that etcd is running at 127.0.0.1:2379. Also, systemctl status etcd on the base node reports it is active and running, while the same command on the master node reports no etcd running. So it is evident that etcd is running on the base node. I couldn't get a chance to check for NFS between the master and base node due to time limitations in the exam.

The steps provided in my previous post were tried on my 3-node cluster.

I know this is a little bit confusing, but this is the solution I got from various blogs and forums, and after trying it out on our local cluster.
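If you want to verify for yourself where etcd is running, a couple of quick checks on each node (assuming the client port is 2379, as the question states):

systemctl status etcd            # is the etcd unit present and active on this node?
sudo ss -lntp | grep 2379        # is anything listening on the etcd client port?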

OK, thank you, I understand. I agree, since that is what the question specifies and you have performed the tests.

Best regards

@ramalrg During the etcd restore on the edge/worker node, after the restore command is executed, do we need to update the etcd.yaml static pod file with the new data-dir value? If yes, where can we find this YAML file on the worker node? Or is it not needed at all, since etcd is running as a service?

Step 1: check the etcd server status:

cloud@edge-node:~$ systemctl status etcd
● etcd.service - etcd
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-01-29 21:58:40 CST; 2 days ago
Docs: https://github.com/coreos
Main PID: 1299767 (etcd)
Tasks: 10 (limit: 4615)
Memory: 8.3M
CGroup: /system.slice/etcd.service
└─1299767 /usr/bin/etcd --name master-1 --cert-file=/etc/etcd/etcd.crt --key-file=/etc/etcd/etcd.key --peer-cert-file=/etc/etcd/etcd.cr>

Step 2: cat /etc/systemd/system/etcd.service

[Unit]
Description=etcd
Documentation=https://github.com/coreos

[Service]
ExecStart=/usr/bin/etcd \
  --name master-1 \
  --cert-file=/etc/etcd/etcd.crt \
  --key-file=/etc/etcd/etcd.key \
  --peer-cert-file=/etc/etcd/etcd.crt \
  --peer-key-file=/etc/etcd/etcd.key \
  --trusted-ca-file=/etc/etcd/ca.crt \
  --peer-trusted-ca-file=/etc/etcd/ca.crt \
  --peer-client-cert-auth \
  --client-cert-auth \
  --initial-advertise-peer-urls https://11.0.0.79:2380 \
  --listen-peer-urls https://11.0.0.79:2380 \
  --listen-client-urls https://11.0.0.79:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://11.0.0.79:2379 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-cluster master-1=https://11.0.0.79:2380 \
  --initial-cluster-state new \
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Step 3: You will see the data-dir path in this service file on the edge node.

You can delete the existing etcd folder in the data-dir before attempting the restore. That way you don't need to edit the service/YAML file.
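A sketch of that restore-in-place approach (the snapshot path is only an example, and /var/lib/etcd is assumed to be the existing data-dir, as in the service file above):

sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd                        # clear the old data directory
sudo ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db --data-dir /var/lib/etcd
sudo chown -R etcd:etcd /var/lib/etcd            # assuming the service runs as the etcd user
sudo systemctl start etcd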

Hope this clears it up.


Hey @ramalrg

Thank you so much for your detailed explanation. This helped me clear my doubt as well. I was able to upgrade the cluster but was facing issues restoring etcd. As per the course, I restored the etcd backup into another directory, but then I couldn't find the manifest path and the YAML file. Then I tried to look for the kubelet service so that I could find out the manifest folder path, in case it was different on this student node. But to my surprise, there was no kubelet service running. I mean, service kubelet status and systemctl status kubelet didn't return any output, so I was unable to find the config file to look for the manifest folder. Little did I know that etcd was in fact running as a service.

But do you think it was possible that there was no kubelet service running on that node?
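For reference, when kubelet is running, the static pod manifest folder can normally be located like this (the paths are kubeadm defaults and may differ on the exam node):

systemctl status kubelet                            # shows the unit drop-in and the --config flag
grep staticPodPath /var/lib/kubelet/config.yaml     # manifest directory, usually /etc/kubernetes/manifests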

Guys, can anybody explain the above question with a sample for the master node or student node, following the mentioned steps, as well as how to edit the data-dir in the service file and so on? I would appreciate a prompt response. As far as I know, etcd will be running on the master node, so should we take the backup on the master and then restore it on the student/edge node? More clarification on the steps would be highly appreciated.


@ramalrg ,

For example, if I restored the backup to /var/lib/etcd/backup,
in this case, instead of editing the etcd config data-dir=,
can I copy the restored files to /var/lib/etcd/? Does that work?

<<For example, if the new data-dir is /var/lib/etcd-from-back-up, issue the below command to change the permissions:
chown -R etcd.etcd /var/lib/etcd-from-back-up>>

Is this "etcd.etcd" the username and group that the etcd service runs as? How can we find it?

Thanks in advance …
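One way to check which user and group the etcd service runs as, and who owns the current data directory (assuming etcd runs as a systemd service, as shown earlier in this thread):

ps -o user= -C etcd                                           # user the etcd process runs as
grep -E '^(User|Group)=' /etc/systemd/system/etcd.service     # User=/Group= in the unit file, if set
ls -ld /var/lib/etcd                                          # ownership of the existing data directory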

This is a million-dollar answer. I did a very complicated solution for this task, which accidentally broke the etcd of the ok8s cluster that is NOT used in this task.

When I started the task that uses the ok8s cluster, I got "the connection to the server xx.xx.xx.xx 6443 was refused - did you specify the right host or port". Therefore, I could not do it and lost 7%.

And then I found that another cluster, mk8s, was totally down; 'kubectl get node' returned nothing. That cluster is shared with the etcd task as well. I lost another 7%.

In total I lost 21% because of this failed task. I don't think it is fair, because the other two tasks shouldn't have been impacted.

This guy made a good point.

Hi,
Can you please elaborate on the solution, because I am not getting where exactly the change is required. I got the following error in the exam when I tried to upgrade the master node as per the Kubernetes documentation (Upgrading kubeadm clusters | Kubernetes).

Error: Unable to open the lock /var/<sub_folder>/<sub_folder>/<file_name> : permission denied

// sorry, I have forgotten the complete path and file name in the error. The file was owned by the root user.

Curious to know what I am missing here. Thanks!


Hi @Sreenivasulu,

Can you please give us more detail on the command you ran to get this error? In any case, "permission denied" means you don't have the necessary permission on the file. You can add sudo to the command to run it as root.

Regards

Hi, I ran the below command to upgrade. I hadn't tried the command with 'sudo'; I thought it wasn't needed, but I should have tried it at least once. The commands in the k8s documentation do include 'sudo' and I used to follow them in the practice labs, where I never came across such an error. Will make a note of it, thanks for the reply.

apt-mark unhold kubeadm && \
 apt-get update && apt-get install -y kubeadm=1.26.0-00 && \
 apt-mark hold kubeadm

Hi @Sreenivasulu,

These commands need root access because you are installing software on the system. Please try the following commands:

sudo apt-mark unhold kubeadm && \
sudo apt-get update && sudo apt-get install -y kubeadm=1.26.0-00 && \
sudo apt-mark hold kubeadm

or

sudo -i

apt-mark unhold kubeadm && \
 apt-get update && apt-get install -y kubeadm=1.26.0-00 && \
 apt-mark hold kubeadm

exit
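For completeness, installing the new kubeadm package is only the first step. The remaining control-plane steps from the Kubernetes documentation look roughly like this (<node-name> is a placeholder, and the versions simply follow the example above):

sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.26.0
kubectl drain <node-name> --ignore-daemonsets
sudo apt-mark unhold kubelet kubectl && \
 sudo apt-get update && sudo apt-get install -y kubelet=1.26.0-00 kubectl=1.26.0-00 && \
 sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload && sudo systemctl restart kubelet
kubectl uncordon <node-name>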