Worker node network failure monitoring

devsuv · May 5, 2025, 7:47am

Hi,

We’re looking for an efficient tool to monitor network connectivity between worker nodes, specifically to detect when a process on one worker node cannot communicate with a process on another worker node. The tool should not be tied to any specific CNI (Container Network Interface). Are there any recommended tools or best practices for this kind of inter-node network monitoring? We found the tools below suggested online; can you suggest a better one?

Tool	Real-Time Traffic	Visualization	Metrics	Policy Awareness
Cilium + Hubble
Istio + Kiali
Weave Scope
Pixie
Prometheus + Grafana	(depends)

Regards,
Debasis

Alistair_KodeKloud · May 5, 2025, 8:42am

Cilium or Istio are probably your best bet, as both sit in the traffic path of every pod (Cilium through its eBPF datapath, Istio through its Envoy proxies), which is what enables real-time traffic monitoring. Weave Scope is a dead project (Weaveworks ran out of funding). Prometheus isn’t designed for traffic monitoring, but you should still use it to scrape metrics from whichever solution you choose so you can monitor that solution’s health.

What you’re after here is “tracing”, one of the pillars of observability, which are:

Logging (Elastic, Datadog, Splunk, etc.)
Monitoring (Prometheus, Grafana)
Tracing (Hubble, Kiali)
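As a complement to the tools above: if the immediate need is just to know whether node A can reach node B, independent of the CNI, a synthetic active probe may be enough. The sketch below is illustrative only and assumes a plain TCP connect is a good-enough proxy for “a process on node A can talk to a process on node B”. The idea is that it would run on every node (for example as a DaemonSet with hostNetwork), connect to a port on each peer, and emit the results in the Prometheus text exposition format so your existing Prometheus can scrape and alert on them. The function names, metric names (`node_peer_reachable`, `node_peer_probe_seconds`), and the peer IPs/ports are all hypothetical. If you’d rather not maintain your own code, the Prometheus blackbox_exporter’s TCP probe covers similar ground.

```python
import socket
import time
from dataclasses import dataclass


@dataclass
class ProbeResult:
    target: str        # "host:port" of the peer being probed
    reachable: bool    # True if the TCP handshake completed
    latency_s: float   # time spent on the connect attempt (success or failure)


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> ProbeResult:
    """Attempt a plain TCP connect to host:port.

    Only the kernel socket API is used, so the check does not depend on
    which CNI plugin the cluster runs.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:  # covers refused connections, timeouts and DNS errors
        ok = False
    return ProbeResult(f"{host}:{port}", ok, time.monotonic() - start)


def to_prometheus(results: list[ProbeResult], source_node: str) -> str:
    """Render results in the Prometheus text exposition format.

    Hypothetical metric names; label values are assumed not to contain
    quotes or backslashes (no escaping is done here).
    """
    lines = [
        "# HELP node_peer_reachable 1 if a TCP connect to the peer succeeded.",
        "# TYPE node_peer_reachable gauge",
    ]
    for r in results:
        lines.append(
            f'node_peer_reachable{{source="{source_node}",target="{r.target}"}} '
            f"{1 if r.reachable else 0}"
        )
    lines += [
        "# HELP node_peer_probe_seconds Duration of the TCP connect attempt.",
        "# TYPE node_peer_probe_seconds gauge",
    ]
    for r in results:
        lines.append(
            f'node_peer_probe_seconds{{source="{source_node}",target="{r.target}"}} '
            f"{r.latency_s:.6f}"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical peer list; in practice it might be built from the node list
    # returned by the Kubernetes API.
    peers = [("10.0.0.11", 9100), ("10.0.0.12", 9100)]
    results = [probe_tcp(h, p, timeout=1.0) for h, p in peers]
    print(to_prometheus(results, source_node="worker-1"), end="")
```

An alert on `node_peer_reachable == 0` would then flag a broken node pair. Note that this only tests reachability at the TCP level; it won’t show per-flow detail or policy verdicts the way Hubble or Kiali can.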