Calico Networking Issue with New Pod Deployments on Worker Nodes in Kubernetes Cluster

Tauqeer-Ahamd · April 17, 2025, 9:14am

I have a Kubernetes (k8s) cluster with Calico set up as the network plugin. I’m experiencing a strange issue with pod deployments across my worker nodes:

Initially, worker node 5 had issues with pod deployment. When I checked, the Calico pod was running, the node was marked as ready, and no errors were shown. However, each new pod deployment on that node displayed the following error when describing the pod:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "...":
plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized

To resolve this, I did the following:

Cordoned the node.
Used OpenSSL to check the kubelet client certificate (it was valid):
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
Deleted the following files and directories:

sudo rm -rf /var/lib/cni/*
sudo rm -rf /var/run/calico
sudo rm -rf /etc/cni/net.d/*.lock

Rejoined the node to the cluster.
Deleted the Calico pod on that node so it would restart.

After following these steps, pod deployments resumed, and the new pods reached the Running state.

Where the real problem started:
The problem is that right after fixing worker node 5, I faced the same issue with worker node 3. After resolving node 3, the issue occurred on worker node 2.

Key Observations:

Pods already running on the affected nodes continue to run without any issues.
The problem only affects new pod deployments when the node faces the issue.

Question: Has anyone encountered a similar issue, or does anyone know the root cause? What could be causing this behaviour across multiple nodes, and how can I prevent it from recurring?

I really appreciate your help on this matter.
Thanks to the whole team.

Santosh_KodeKloud · April 17, 2025, 10:42am

What’s your cluster and Calico version?

Several GitHub issues on Calico discuss this error. Some users resolved it by restarting the Calico pods in the kube-system namespace, and they reported resource constraints on the Calico pods as the root cause.
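
The restart mentioned above could be done as a DaemonSet rollout rather than by deleting pods one at a time. This is only a sketch under assumptions: the DaemonSet name calico-node is inferred from the pod names shown in this thread, and the run helper with its DRY_RUN variable is invented for this example so the commands can be previewed before they touch the cluster.

```shell
#!/usr/bin/env bash
# Sketch: restart every calico-node pod through a DaemonSet rollout.
# Assumptions (not confirmed by the thread): the DaemonSet is named
# "calico-node" and lives in "kube-system". DRY_RUN is a variable made
# up for this example; it is not a kubectl feature.

run() {
  # Print the command instead of executing it when DRY_RUN=1.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

restart_calico_node() {
  local ns="${1:-kube-system}" ds="${2:-calico-node}"
  # Trigger a rolling restart of the DaemonSet's pods.
  run kubectl -n "$ns" rollout restart "daemonset/$ds"
  # Wait for the rollout to finish (or give up after five minutes).
  run kubectl -n "$ns" rollout status "daemonset/$ds" --timeout=5m
}
```

Running `DRY_RUN=1; restart_calico_node` prints the two kubectl commands without executing them. Calling `restart_calico_node` with DRY_RUN unset runs them against the current kubectl context.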

Tauqeer-Ahamd · April 17, 2025, 11:07am

My Calico is calico/cni:v3.28.0, and my Kubernetes versions are:

Client Version: v1.30.3
Kustomize Version: v5.0.4-0.202306011
Server Version: v1.30.7

And yes, my Calico pods are in the kube-system namespace.

Santosh_KodeKloud · April 17, 2025, 11:18am

Also, check the age of your Calico pods.

Tauqeer-Ahamd · April 17, 2025, 11:39am

Does the age matter here? Do I have to reinstall Calico regularly?

root@k8s-master-2:~# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS       AGE
calico-kube-controllers-564985c589-b45v5   1/1     Running   1 (24d ago)    142d
calico-node-4trjl                          1/1     Running   0              4h25m
calico-node-74mkh                          1/1     Running   0              3h59m
calico-node-gltmk                          1/1     Running   0              103d
calico-node-hnbj4                          1/1     Running   3 (23d ago)    98d
calico-node-k75kt                          1/1     Running   0              3h47m
calico-node-qp24n                          1/1     Running   1 (24d ago)    103d
calico-node-qrktn                          1/1     Running   0              13d
calico-node-rf5sr                          1/1     Running   0              2d1h
calico-node-trh6k                          1/1     Running   1 (24d ago)    142d

Neeljy · April 21, 2025, 5:31pm

Hi! It looks like Calico is losing access to the Kubernetes API on the affected nodes. The Unauthorized error suggests a service account or token issue.

This might be due to:

Expired or missing service account tokens in Calico pods
RBAC issues or missing ClusterRoleBindings
Kubelet or network stack glitches on specific nodes

Suggestions:

Check Calico pod’s service account token on affected nodes
Review RBAC permissions for Calico
Recreate the Calico DaemonSet across all nodes
Check kube-apiserver logs for auth errors
Consider updating Kubernetes and Calico if not already on latest versions
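
To make the token check above concrete, here is a minimal sketch that decodes the expiry of a service account token. It assumes the token is a JWT carrying an exp claim, and that the Calico CNI plugin reads its credentials from /etc/cni/net.d/calico-kubeconfig (the usual location in manifest-based installs, but an assumption for this cluster). The function names jwt_exp and token_is_expired are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: read the "exp" claim (Unix seconds) from a JWT and compare it
# with a given time. jwt_exp and token_is_expired are hypothetical
# helper names, not part of Calico or kubectl.

jwt_exp() {
  local token="$1" payload pad
  # The payload is the second dot-separated field, base64url-encoded.
  payload="$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')"
  # base64url drops the trailing "=" padding; restore it before decoding.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  # Decode and pull out the numeric exp value (prints nothing if absent).
  printf '%s' "$payload" | base64 -d 2>/dev/null |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

token_is_expired() {
  # $1 = exp claim in Unix seconds, $2 = time to compare against.
  # Succeeds only when exp is present and not later than $2.
  [ -n "$1" ] && [ "$1" -le "$2" ]
}
```

On an affected node, the token could be extracted with something like `sudo awk '/token:/ {print $2}' /etc/cni/net.d/calico-kubeconfig`, passed to jwt_exp, and the result compared against `date +%s` with token_is_expired. An expiry in the past on the failing nodes, but not on healthy ones, would point to calico-node not refreshing the CNI token on those nodes.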

You’ve done great troubleshooting. This looks like a systemic token/auth issue that surfaces node by node. Hope this helps!