I have a Kubernetes (k8s) cluster with Calico set up as the network plugin. I’m experiencing a strange issue with pod deployments across my worker nodes:
Initially, worker node 5 had issues with pod deployment. When I checked, the Calico pod was running, the node was marked as ready, and no errors were shown. However, each new pod deployment on that node displayed the following error when describing the pod:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "...":
plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
To resolve this, I:
Cordoned the node.
Used OpenSSL to check the certificates (which were valid): openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
Deleted the Calico pod on that node so it restarted.
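For anyone who wants to reproduce the workaround, the steps above can be sketched roughly as the shell function below. This is only a sketch under assumptions: the node name worker-5 is a placeholder, the label k8s-app=calico-node assumes a standard manifest-based Calico install, the run helper and DRY_RUN variable are illustrative names of my own, and the final uncordon is assumed (the steps above only mention cordoning).

```shell
#!/usr/bin/env bash
# Sketch of the workaround: cordon, restart calico-node on one node, check, uncordon.
# Assumptions: Calico runs in kube-system with label k8s-app=calico-node.
# The certificate check is run on the node itself, not through kubectl:
#   openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates

DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands (hypothetical safety switch)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recover_node() {
  local node="$1"
  # Stop new pods from being scheduled onto the affected node.
  run kubectl cordon "$node"
  # Delete only the calico-node pod on that node; the DaemonSet recreates it.
  run kubectl delete pod -n kube-system -l k8s-app=calico-node \
    --field-selector "spec.nodeName=$node"
  # Show the replacement pod so its status can be checked by eye.
  run kubectl get pods -n kube-system -l k8s-app=calico-node \
    --field-selector "spec.nodeName=$node" -o wide
  # Assumed final step: allow scheduling again once the pod is Running.
  run kubectl uncordon "$node"
}
```

After sourcing the file, recover_node worker-5 prints the commands, and DRY_RUN=0 recover_node worker-5 actually runs them.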
After following these steps, pod deployments resumed, and the new pods reached the running state.
Where the real problem started:
The problem is that right after fixing worker node 5, I faced the same issue with worker node 3. After resolving node 3, the issue occurred on worker node 2.
Key Observations:
Pods already running on the affected nodes continue to run without any issues.
The problem only affects new pod deployments when the node faces the issue.
Question: Has anyone encountered a similar issue, or does anyone know the root cause? What could be causing this behaviour across multiple nodes, and how can I prevent it from recurring?
I really appreciate your help on this matter.
Thanks to the whole team.
The Calico GitHub issues discuss this. Some users resolved it by restarting the Calico pods in the kube-system namespace, and they point to resource constraints on the Calico pods as the root cause.
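If resource constraints are the suspicion, one way to look for evidence is to check restart counts, the last termination reason, and the configured requests and limits of the Calico pods. The function below is a rough sketch, not a confirmed diagnosis procedure: it assumes the DaemonSet is named calico-node with label k8s-app=calico-node (the standard manifest install), and the KUBECTL override variable is an illustrative name of my own.

```shell
#!/usr/bin/env bash
# Sketch: inspect calico-node pods for signs of resource pressure.
# Assumptions: DaemonSet "calico-node" in kube-system, label k8s-app=calico-node.

calico_node_health() {
  local kubectl="${KUBECTL:-kubectl}"   # override is only for testing/illustration
  # Per-pod restart count and the reason the first container last terminated
  # (for example OOMKilled).
  "$kubectl" get pods -n kube-system -l k8s-app=calico-node \
    -o 'custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,RESTARTS:.status.containerStatuses[0].restartCount,LAST_REASON:.status.containerStatuses[0].lastState.terminated.reason'
  # Requests and limits configured on the first container of the DaemonSet.
  "$kubectl" get daemonset calico-node -n kube-system \
    -o 'jsonpath={.spec.template.spec.containers[0].resources}'
}
```

A high restart count or a LAST_REASON of OOMKilled on the affected nodes would be consistent with the resource-constraint explanation; a clean result would suggest looking elsewhere.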
My Calico is calico/cni:v3.28.0, and Kubernetes is:
Client Version: v1.30.3
Kustomize Version: v5.0.4-0.202306011
Server Version: v1.30.7
And yes, my Calico pods are in the kube-system namespace.