Hi, I have a cluster with 3 master and 6 worker nodes, and I am testing everything for production. Today I upgraded my k8s cluster from 1.30.11 to 1.31.8. After the upgrade of master-node-3 I saw errors in the etcd cluster: the etcd member on master-node-3 was not able to join, because its requests were rejected by master-node-1 and master-node-2.
I have no idea what the issue was.
Finally I decided to drain master-node-3, clean it with kubeadm reset, and rejoin it; the rough sequence I ran is below.
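(Commands roughly as I remember them; node names as in my cluster.)

# from a healthy master: evict workloads and remove the node object
kubectl drain k8s-master-3 --ignore-daemonsets --delete-emptydir-data
kubectl delete node k8s-master-3

# on master-node-3 itself: wipe the kubeadm-generated state
kubeadm reset -f

Now, when rejoining, I face this issue: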
kubeadm join 11.111.111.11:6443 --token czryde.wjsshdjshdjs0ex --discovery-token-ca-cert-hash sha256:f758f5e307kdjhckdhkjehkrjhucsbcknskchf9b689f5831b95 --control-plane --certificate-key 5356710a533f4dc3191faf07e7d2jhfhjgcgfdghfxdfdsd5dfef7e1a02655706348c
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[WARNING FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[WARNING FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[WARNING FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action beforehand using 'kubeadm config images pull'
W0426 15:50:13.531549 2189292 checks.go:846] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.k8s.io/pause:3.10" as the CRI sandbox image.
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[download-certs] Saving the certificates to the folder: "/etc/kubernetes/pki"
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using the existing "etcd/server" certificate and key
[certs] Using the existing "etcd/peer" certificate and key
[certs] Using the existing "apiserver-etcd-client" certificate and key
[certs] Using the existing "etcd/healthcheck-client" certificate and key
[certs] Using the existing "apiserver-kubelet-client" certificate and key
[certs] Using the existing "apiserver" certificate and key
[certs] Using the existing "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: error syncing endpoints with etcd: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
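To dig into the check-etcd timeout, I believe the member list can be inspected from a healthy master like this (the pod name follows the usual etcd-<node-name> pattern and the cert paths are the kubeadm stacked-etcd defaults, so adjust if yours differ):

kubectl -n kube-system exec etcd-k8s-master-1 -- etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  member list

My understanding is that if a stale member entry for master-node-3 is still listed there, it has to be removed with 'etcdctl member remove <member-id>' before the node can rejoin, but I have not confirmed that this is my case.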
I exported the certificates from the working master-node-1, and one thing I noticed is this:
root@k8s-master-3:/# ls -l /etc/kubernetes/manifests/
total 20
-rw------- 1 root root 4588 Απρ 26 15:50 kube-apiserver.yaml
-rw------- 1 root root 4100 Απρ 26 15:50 kube-controller-manager.yaml
-rw------- 1 root root 2169 Απρ 26 15:50 kube-scheduler.yaml
root@k8s-master-3:/# crictl ps
WARN[0000] runtime connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
WARN[0000] image connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
root@k8s-master-3:/# crictl ps -a
WARN[0000] runtime connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
WARN[0000] image connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
root@k8s-master-3:/#
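Side note: I assume the crictl endpoint warnings above are harmless noise; as far as I know they go away with an /etc/crictl.yaml that pins the socket (containerd in my case):

# /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock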
There is no etcd.yaml file. I even copied one from a working master and changed its IP addresses (an excerpt of the fields I touched is below), and I still face the same issue. I am not able to find the core problem here. ChatGPT failed, Copilot failed as well, and I need help with this. Please help me out.
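These are roughly the fields I edited in the copied /etc/kubernetes/manifests/etcd.yaml (the 10.0.0.x addresses below are placeholders, not my real IPs):

    - --advertise-client-urls=https://10.0.0.13:2379
    - --initial-advertise-peer-urls=https://10.0.0.13:2380
    - --initial-cluster=k8s-master-1=https://10.0.0.11:2380,k8s-master-2=https://10.0.0.12:2380,k8s-master-3=https://10.0.0.13:2380
    - --initial-cluster-state=existing
    - --listen-client-urls=https://127.0.0.1:2379,https://10.0.0.13:2379
    - --listen-peer-urls=https://10.0.0.13:2380
    - --name=k8s-master-3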
Regards,
Tauqeer.A