I’ve observed something really cool in Kubernetes.
When we submit a workload (like a Pod) to the API server, admission controllers often mutate it before it is stored, meaning they add or change certain fields automatically. One of those fields is tolerations.
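For example, I noticed Kubernetes (through the DefaultTolerationSeconds admission plugin) adds a default toleration like this, along with a matching one for "node.kubernetes.io/not-ready":
key: "node.kubernetes.io/unreachable"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 300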
This means: when the node becomes unreachable and gets tainted with node.kubernetes.io/unreachable, the pod can stay bound to it for 300 seconds (5 minutes) before being evicted.
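To make the key/operator/effect fields a bit more concrete, here is a small Python sketch of how a toleration gets matched against a node taint. This is a simplified model based on the documented matching rules, not the actual Kubernetes code, and the names Taint, Toleration and tolerates are my own for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass
class Toleration:
    key: str = ""
    operator: str = "Equal"   # "Equal" or "Exists"
    value: str = ""
    effect: str = ""          # empty means "any effect"
    toleration_seconds: Optional[int] = None


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified matching check (hypothetical helper, not the real implementation)."""
    # An empty effect on the toleration matches every taint effect.
    if tol.effect and tol.effect != taint.effect:
        return False
    # An empty key (used together with Exists) matches every taint key.
    if tol.key and tol.key != taint.key:
        return False
    # Exists ignores the value; Equal requires the values to match.
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


default_toleration = Toleration(
    key="node.kubernetes.io/unreachable",
    operator="Exists",
    effect="NoExecute",
    toleration_seconds=300,
)
unreachable = Taint(key="node.kubernetes.io/unreachable", effect="NoExecute")
print(tolerates(default_toleration, unreachable))
```

With operator Exists the value is ignored, so the default toleration matches the unreachable taint whatever value the taint carries.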
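The node lifecycle controller in the kube-controller-manager constantly watches the live health of nodes (via the API server) and taints a node that stops reporting. If the node recovers within that time, the taint is removed and the pod stays. But if not, the toleration expires and the pod is deleted from that node. The pod itself is not moved: if it belongs to a Deployment, StatefulSet, or similar, the owning controller creates a replacement and the scheduler places it on a healthy node. A bare Pod with no owner is simply gone.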
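The timing decision can be sketched in a few lines of Python. This is a toy model under my own assumptions (the function name should_evict and its parameters are made up for illustration); the real controller works with timers and the timestamp on the taint rather than a single function call.

```python
from typing import Optional


def should_evict(
    taint_age_seconds: float,
    toleration_seconds: Optional[int],
    taint_present: bool = True,
) -> bool:
    """Toy model of the NoExecute eviction decision for a pod that tolerates the taint."""
    # Node recovered: the taint was removed, so nothing is evicted.
    if not taint_present:
        return False
    # No tolerationSeconds set: the pod tolerates the taint indefinitely.
    if toleration_seconds is None:
        return False
    # Otherwise the pod goes once the taint has been on the node long enough.
    return taint_age_seconds >= toleration_seconds


# Node unreachable for 2 minutes: still inside the 300-second window.
print(should_evict(120, 300))
# Node unreachable for 6 minutes: the toleration has expired.
print(should_evict(360, 300))
# Node came back before the window closed.
print(should_evict(120, 300, taint_present=False))
```

Setting a smaller tolerationSeconds on your own Pod spec shortens that window, and leaving it unset on a toleration means the pod stays bound to the tainted node indefinitely.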
This is one of several default tolerations Kubernetes applies based on node taints and conditions, and it’s super powerful for automated fault recovery! To me, this is one of the best uses of taints & tolerations.