Questions on ETCD

shashikaushal2006123 · December 26, 2024, 1:48pm

I understand that etcd is a distributed, reliable key-value store. In Kubernetes it stores information about nodes, Pods, configs, secrets, accounts, roles, and bindings. Everything you see from a kubectl get command comes from the etcd server. All the information shown by any kubectl command comes from the etcd server. Every deployment that you make is written to the etcd server.

1) When you specifically talk about a backup of the etcd database, how is this going to help in case of failures?

2) When we talk about Pods that are created on the worker nodes, etcd will only have the information about which Pod is running on which node. Will it also record whether the Pod was created from a YAML file (the declarative way) or the imperative way?

3) Say I take a backup of the etcd database and there is a failure. What failures are we expecting, for example the failure of the control plane node itself? What will be the impact on the Pods running on the worker nodes if the control plane node goes down?

I ran the command kubectl -n kube-system describe pods etcd-controlplane and would like to understand the output below:

What are the URLs or endpoints listed here?
--advertise-client-urls=https://192.168.28.36:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--experimental-initial-corrupt-check=true
--experimental-watch-progress-notify-interval=5s
--initial-advertise-peer-urls=https://192.168.28.36:2380
--initial-cluster=controlplane=https://192.168.28.36:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379,https://192.168.28.36:2379
--listen-metrics-urls=http://127.0.0.1:2381
--listen-peer-urls=https://192.168.28.36:2380
--name=controlplane
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000

1) What is the importance of the etcd server certificate?
--cert-file=/etc/kubernetes/pki/etcd/server.crt

2) What is the importance of the client URLs?
--listen-client-urls=https://127.0.0.1:2379,https://192.168.28.36:2379

3) What is the importance of the CA cert file?
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

Alistair_KodeKloud · December 27, 2024, 7:59am

etcd contains the state of the cluster at a given point in time in terms of, as you rightly say, all the resources that are currently deployed.
It does not matter how the resource was created. It is the running state that is saved.
What you are protecting against is corruption of the etcd database itself. It may have been corrupted by a server crash with an incomplete write to the data store, in which case the etcd process would be crash looping and the logs would likely indicate the reason.
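
To make the backup step concrete, here is a minimal Python sketch that assembles an etcdctl snapshot save command. The endpoint and certificate paths are the ones that appear in your describe output; the helper names and the backup path /opt/etcd-backup.db are made up for illustration, and reusing the server certificate as the client certificate is an assumption that usually holds on kubeadm-style clusters but may not on yours.

```python
import subprocess


def snapshot_save_command(
    backup_path,
    endpoint="https://127.0.0.1:2379",
    cacert="/etc/kubernetes/pki/etcd/ca.crt",
    cert="/etc/kubernetes/pki/etcd/server.crt",
    key="/etc/kubernetes/pki/etcd/server.key",
):
    """Return the argv list for taking an etcd snapshot with etcdctl.

    The default paths are assumptions copied from the flags in the question.
    """
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",  # CA used to verify the etcd server
        f"--cert={cert}",      # client certificate presented to etcd
        f"--key={key}",        # private key for that certificate
        "snapshot",
        "save",
        backup_path,
    ]


def take_snapshot(backup_path):
    """Run the snapshot command; only works where etcdctl and the certs exist."""
    subprocess.run(snapshot_save_command(backup_path), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe anywhere.
    print(" ".join(snapshot_save_command("/opt/etcd-backup.db")))
```

Calling take_snapshot on the control plane node would write the snapshot file, which is the thing you restore from if the live database becomes unreadable.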

All the etcd command line flags are documented here.

Certificates are required for authentication of clients (like API server, or etcdctl) with the server, i.e. it’s how you log into the server. The system used is called Mutual TLS (mTLS).

listen-client-urls are the addresses/ports that the server listens for connections on. Generally there are 2 - localhost (127.0.0.1) and the IP of the primary network interface of the node the service is running on.
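
If it helps to see that flag broken down, here is a small Python sketch that splits the value from your output into scheme, host, and port. The function name parse_listen_urls is hypothetical; it only parses the string and does not talk to etcd.

```python
from urllib.parse import urlsplit


def parse_listen_urls(flag_value):
    """Split a comma-separated etcd URL list into (scheme, host, port) tuples."""
    endpoints = []
    for raw in flag_value.split(","):
        parts = urlsplit(raw.strip())
        endpoints.append((parts.scheme, parts.hostname, parts.port))
    return endpoints


if __name__ == "__main__":
    # Value of --listen-client-urls taken from the question.
    value = "https://127.0.0.1:2379,https://192.168.28.36:2379"
    for scheme, host, port in parse_listen_urls(value):
        kind = "localhost only" if host == "127.0.0.1" else "node network interface"
        print(f"{scheme}://{host}:{port} ({kind})")
```

Both entries use port 2379, the client port; the 2380 addresses in the peer flags are for etcd members talking to each other.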

shashikaushal2006123 · February 2, 2025, 4:12pm

@Alistair_KodeKloud … I have a question about backing up etcd. What we are protecting against here is corruption of the etcd database itself. Now, if there is corruption in the etcd database itself:

1) What can we expect not to work?
2) Will Pods on the nodes still work?
3) How can we tell that the corruption is in the database itself?
4) Would we be able to create new Pods?

etcd is a key-value store. My understanding is that if there is corruption in the database itself, the Pods on the nodes will still work; it is just that any requests to the kube-apiserver are not going to work. Neither

Alistair_KodeKloud · February 2, 2025, 7:27pm

It depends!

If some data has been accidentally deleted, e.g. by somebody messing directly with the database using etcdctl, then those resources will disappear, but the cluster will still operate.

If it’s binary corruption which renders the database unreadable by etcd, then etcd will crashloop and when that happens, apiserver will also crashloop resulting in a completely unusable cluster. Pods may still be running, but kubectl will not function at all.
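
This second case is where the backup earns its keep. As a rough sketch only, the Python below builds the restore command for a snapshot. The paths are made up, and the choice of etcdutl is an assumption about your etcd version (older releases do the same thing with etcdctl snapshot restore). After restoring, etcd still has to be pointed at the new data directory, which on a kubeadm-style cluster normally means editing the etcd static pod manifest.

```python
def snapshot_restore_command(backup_path, new_data_dir, tool="etcdutl"):
    """Return the argv list that restores a snapshot into a fresh data directory.

    The restore works on the snapshot file itself, so no endpoint or
    certificate flags are needed and etcd does not have to be running.
    """
    return [
        tool,
        "snapshot",
        "restore",
        backup_path,
        f"--data-dir={new_data_dir}",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = snapshot_restore_command("/opt/etcd-backup.db", "/var/lib/etcd-restored")
    print(" ".join(cmd))
```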