I’ve backed up my etcd and, after restoring it, I can’t create/update/delete anything in my cluster!
Here are my steps:
Backing up etcd
- Save the snapshot
sudo ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup-new.db \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key
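For reference, the same certs can be used to health-check the endpoint first (this assumes etcd listens on the default 127.0.0.1:2379):
$ sudo ETCDCTL_API=3 etcdctl endpoint health \
--endpoints 127.0.0.1:2379 \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key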
- Check the status
$ sudo ETCDCTL_API=3 etcdctl snapshot status /tmp/etcd-backup-new.db --write-out=table
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| d8d0da24 | 7220348 | 874 | 1.9 MB |
+----------+----------+------------+------------+
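For comparison, the revision of the live endpoint can be checked the same way (same certs as above, default 127.0.0.1:2379 endpoint assumed):
$ sudo ETCDCTL_API=3 etcdctl endpoint status --write-out=table \
--endpoints 127.0.0.1:2379 \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key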
Restoring etcd
- Create a restore point from the backup
sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup-new.db \
--data-dir /var/lib/etcd-backup
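The restore is expected to create a member/ directory under the new data dir; a quick way to check (path as used above):
$ sudo ls -l /var/lib/etcd-backup/member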
- Tell etcd to use the new location
sudo vim /etc/kubernetes/manifests/etcd.yaml
  - hostPath:
      path: /var/lib/etcd-backup   # Changed this ONLY!
      type: DirectoryOrCreate
    name: etcd-data
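To be clear, everything else in the manifest was left as is. In a stock kubeadm layout (my assumption of what the relevant parts look like), the container still mounts the host path at /var/lib/etcd, so --data-dir should not need changing:
    - --data-dir=/var/lib/etcd     # path inside the container, unchanged
    volumeMounts:
    - mountPath: /var/lib/etcd     # the new hostPath above is mounted here
      name: etcd-data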
As far as I know, the kubelet restarts static Pods automatically, so after a while everything seems fine:
$ k get all -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-6d4b75cb6d-6cmtm 1/1 Running 1 (7d23h ago) 72d
kube-system pod/coredns-6d4b75cb6d-wchss 1/1 Running 1 (7d23h ago) 72d
kube-system pod/etcd-master 1/1 Running 2 (7d23h ago) 72d
kube-system pod/kube-apiserver-master 1/1 Running 1 (7d23h ago) 39d
kube-system pod/kube-controller-manager-master 1/1 Running 4 (7d23h ago) 72d
kube-system pod/kube-proxy-mqzbd 1/1 Running 1 (7d23h ago) 72d
kube-system pod/kube-scheduler-master 1/1 Running 4 (7d23h ago) 72d
kube-system pod/weave-net-4xtwz 2/2 Running 3 (7d23h ago) 49d
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 72d
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 72d
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 72d
kube-system daemonset.apps/weave-net 1 1 1 1 1 <none> 49d
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 2/2 2 2 72d
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-6d4b75cb6d 2 2 2 72d
The Problem
So it seems everything is fine, but it’s not. For example:
$ k run test --image nginx
Error from server: etcdserver: request timed out
or
$ k rollout restart daemonset.apps/kube-proxy -n kube-system
error: failed to patch: etcdserver: request timed out
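If logs would help, I can pull them from the etcd pod, or via crictl if the API server is unresponsive (the container ID below is a placeholder):
$ k -n kube-system logs etcd-master --tail 20
$ sudo crictl ps -a | grep etcd
$ sudo crictl logs <etcd-container-id>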
What is my mistake?
P.S.: Kubernetes version: v1.27.4