Hi,
I have a Kubernetes HA cluster with three control-plane nodes. However, when I take down master1, the whole cluster experiences issues and downtime. master2 and master3 do not continue operating as expected.
The cluster was set up using kubeadm, with a load balancer in front of the three control-plane nodes, and it uses the stacked etcd architecture.
Please, I need guidance from experienced Kubernetes engineers. Thanks.
When master1 dies, you don't just lose an etcd member; you also lose one of the API servers, so the traffic pattern changes. The surviving two nodes must absorb both the etcd and the API server load increases at the same time.
- The kube-apiserver on the dead node vanishes.
- The API servers on the remaining nodes now:
  - retry connections to the missing etcd member,
  - block on timeouts,
  - see increased request latency,
  - hold open gRPC retries.
- This borderline DDoSes the remaining etcd members with retry loops (see the probe sketch after this list).
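If you want to see the stall for yourself, probing each etcd member with a short deadline makes it obvious which endpoint the survivors keep timing out against. This is only a rough sketch using the Go etcd v3 client: the master1/2/3 endpoint names are placeholders for your stacked members, and the client TLS certs that kubeadm's stacked etcd actually requires are omitted for brevity.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Placeholder endpoints for the three stacked etcd members. A real stacked
	// etcd needs client TLS certs (kubeadm keeps them under
	// /etc/kubernetes/pki/etcd), omitted here to keep the sketch short.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints: []string{
			"https://master1:2379",
			"https://master2:2379",
			"https://master3:2379",
		},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Probe every member with a short deadline: the dead member's probe hangs
	// until the deadline fires, while the healthy members answer quickly.
	// API servers that keep retrying against the dead member see the same stall.
	for _, ep := range cli.Endpoints() {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		if _, err := cli.Status(ctx, ep); err != nil {
			fmt.Printf("%s: unreachable or unhealthy: %v\n", ep, err)
		} else {
			fmt.Printf("%s: healthy\n", ep)
		}
		cancel()
	}
}
```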
Control-plane components (the scheduler, the controller manager, and any custom controllers you may have installed) perform leader election through the API server.
If API server responsiveness drops (because etcd is overloaded with retries):
- Elections take longer
- Leaders can time out
- Controllers become “flappy”
This manifests as:
- delayed pod scheduling
- delayed node heartbeats
- delayed controller actions
Even though etcd is technically alive, the control plane stalls.
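To make the timing concrete: the scheduler and controller manager hold their leadership by renewing a Lease through the API server, with defaults of a 15s lease duration, 10s renew deadline, and 2s retry period. The sketch below is not their code, just client-go's leader-election helper wired up with those same defaults for a hypothetical custom controller named example-controller; it shows where the "flap" comes from when API calls slow past the renew deadline.

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Hypothetical Lease name/namespace for an in-cluster custom controller.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "example-controller", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: os.Getenv("POD_NAME")},
	}

	// Same timing defaults the scheduler and controller manager use. If API
	// calls slow down and a renewal misses RenewDeadline, the leader steps
	// down and OnStoppedLeading fires: that is the "flapping" described above.
	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		ReleaseOnCancel: true,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) { log.Println("started leading") },
			OnStoppedLeading: func() { log.Println("stopped leading; another replica may take over") },
		},
	})
}
```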
Final note
Having three control-plane nodes is not a full guarantee of HA. What it does give you in your setup is load balancing across multiple API servers, so one of them does not have to do all the work, which helps for clusters with many nodes and workloads. Better resilience can be obtained with an external etcd topology, running the etcd members on different hosts from the API servers.
Either way, losing an etcd member is something that needs to be fixed quickly. If etcd is separate from the control plane, then it is only etcd you have to fix, not necessarily an entire control-plane node.
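And to spell out the quorum arithmetic behind that advice: etcd needs a majority of members to keep accepting writes, so with three members you can lose exactly one, and losing a second stops the control plane from committing anything. A tiny worked example:

```go
package main

import "fmt"

// etcd needs a majority (floor(n/2)+1) of members up to commit writes, so a
// cluster of n members tolerates n - (floor(n/2)+1) member failures.
func main() {
	for _, n := range []int{1, 3, 5} {
		quorum := n/2 + 1
		fmt.Printf("%d member(s): quorum %d, tolerates %d failure(s)\n", n, quorum, n-quorum)
	}
}
```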