Toleration behaviour in k8s pods for node taints

amaan · April 8, 2025, 3:00pm

I’ve observed something really cool in Kubernetes.

When we submit a workload (like a Pod) to the API server, admission controllers often mutate it, meaning they add or change certain fields automatically. One of those fields is tolerations.

For example, I noticed Kubernetes adds a default toleration like this:

key: "node.kubernetes.io/unreachable"
effect: "NoExecute"
tolerationSeconds: 300

This means: if the node becomes unreachable, the pod can tolerate it for 300 seconds before being evicted.
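
Here is a rough sketch of that mutation step, just to make the idea concrete. It is a simplified Python model, not the real admission plugin code: the function name `add_default_tolerations` and the dict-based pod spec are made up for illustration, and it assumes the defaults are 300-second `NoExecute` tolerations for `node.kubernetes.io/not-ready` and `node.kubernetes.io/unreachable`, added only when the Pod does not already tolerate that taint.

```python
import copy

# Assumed defaults for this sketch (the cluster admin can configure other values).
DEFAULT_TOLERATION_SECONDS = 300
DEFAULT_KEYS = (
    "node.kubernetes.io/not-ready",
    "node.kubernetes.io/unreachable",
)


def add_default_tolerations(pod_spec):
    """Return a copy of pod_spec with default NoExecute tolerations filled in.

    Hypothetical helper: pod_spec is a plain dict shaped like a Pod's spec.
    """
    spec = copy.deepcopy(pod_spec)
    tolerations = spec.setdefault("tolerations", [])
    for key in DEFAULT_KEYS:
        # Skip the default if the user already set a toleration for this taint.
        already_set = any(
            t.get("key") == key and t.get("effect") in (None, "", "NoExecute")
            for t in tolerations
        )
        if not already_set:
            tolerations.append({
                "key": key,
                "operator": "Exists",
                "effect": "NoExecute",
                "tolerationSeconds": DEFAULT_TOLERATION_SECONDS,
            })
    return spec
```

So a Pod that sets its own `tolerationSeconds: 30` for the unreachable taint keeps that value, and only the missing not-ready toleration gets the default.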

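The node lifecycle controller in the controller-manager constantly watches the live health of nodes (via the API server) and taints a node that stops reporting. If the node recovers within that time, the taint is removed and the pod stays. But if not, the toleration expires and the pod is evicted (deleted) from that node. If the pod is managed by something like a Deployment, a replacement pod is then created and scheduled onto a healthy node.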
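
The timing logic can be sketched like this. It is a toy Python model under my own assumptions, not how the controller is actually implemented: `eviction_deadline` and `pod_outcome` are hypothetical names, and the model only looks at when the taint was added, the pod's `tolerationSeconds`, and when (if ever) the node recovered.

```python
from datetime import datetime, timedelta


def eviction_deadline(taint_added_at, toleration_seconds):
    """Time at which a pod stops tolerating a NoExecute taint.

    toleration_seconds=None models a toleration with no tolerationSeconds,
    meaning the pod tolerates the taint indefinitely.
    """
    if toleration_seconds is None:
        return None
    return taint_added_at + timedelta(seconds=toleration_seconds)


def pod_outcome(taint_added_at, toleration_seconds, node_recovered_at=None):
    """Return "stays" or "evicted" for a pod running on a tainted node."""
    deadline = eviction_deadline(taint_added_at, toleration_seconds)
    if deadline is None:
        return "stays"
    # The node came back (taint removed) before the toleration ran out.
    if node_recovered_at is not None and node_recovered_at < deadline:
        return "stays"
    return "evicted"


t0 = datetime(2025, 4, 8, 15, 0, 0)  # moment the node was tainted
print(pod_outcome(t0, 300, node_recovered_at=t0 + timedelta(seconds=120)))  # stays
print(pod_outcome(t0, 300))  # evicted
```

With the default of 300 seconds, a node that comes back after two minutes keeps its pods, while a node that never comes back loses them five minutes after it was tainted.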

This is one of the default tolerations Kubernetes applies based on node taints and conditions, and it’s super powerful for automated fault recovery! It is one of the best uses of taints and tolerations.

raymond.baoly · April 9, 2025, 10:04am

Hi,

Thanks for sharing, that was really helpful. I ran into the same issue: when a node goes down, the pods on it don’t stop right away and seem to stay in their current state. I thought the master node wasn’t acting fast enough, but now I understand it better. Really appreciate it!

amaan · April 10, 2025, 3:20pm

Oh, that sounds great! Keep exploring k8s, it’s a jungle worth exploring. Read my latest post too, it is also about the toleration behaviour of the control plane.