Multi-cluster monitoring using Prometheus

Hello guys,
I have a task to configure a centralized monitoring solution for multiple clusters using Prometheus, and I’m looking for best practices and solutions for it. If anyone could help, I would be thankful.

There are probably several ways to do this, but here is one way I have done it:

  • Deploy a Prometheus into each cluster. Use Prometheus Operator to do this via its Helm chart, and tune the chart to install only the Prometheus components. Configure these instances to collect all the metrics from within each cluster that you want to centralize. This should collect everything from metrics server and kube-state-metrics for its cluster, plus any metrics from specific applications. Put a fairly short retention policy on these to keep resource usage down. Be sure to drop useless high-cardinality labels like pod IDs unless you specifically need them for something.
  • In each cluster, also deploy node exporter as a DaemonSet so you can collect node stats.
  • Deploy a “central” Prometheus and set it up to federate from the Prometheuses in each cluster. Federated metrics carry the external labels of the Prometheus they came from, so give each cluster instance a distinguishing external label (for example, the cluster name) and you can tell the metrics apart in the central instance. This instance will need much more memory since it’s collecting the metrics for everything and likely retaining them for longer. If you need long-term storage of metrics, consider adding Thanos to this instance.
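As a rough sketch of the per-cluster side, the values below assume the kube-prometheus-stack chart (which bundles Prometheus Operator, kube-state-metrics and node exporter). The file name, the retention period and the cluster name are placeholders I made up, so adjust them to your setup.

```yaml
# values-cluster-a.yaml (hypothetical file name)

# Turn off the parts of the chart a per-cluster collector does not need.
grafana:
  enabled: false
alertmanager:
  enabled: false

# Keep the exporters mentioned in the list above.
kubeStateMetrics:
  enabled: true
nodeExporter:
  enabled: true

prometheus:
  prometheusSpec:
    # Short local retention (assumed value); the central instance keeps the history.
    retention: 6h
    # External labels are attached to series served via /federate,
    # which is how the central instance can tell clusters apart.
    externalLabels:
      cluster: cluster-a   # assumed cluster name
```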

Start with one cluster and the central instance. Once you have the correct metrics being federated from one cluster, you can copy the setup to the other clusters.
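For that first cluster, the federation job on the central instance might look roughly like this. It is a minimal sketch in plain Prometheus configuration; the target address and cluster name are hypothetical, and the catch-all matcher is only there to get started. With kube-prometheus-stack, a job like this would typically be supplied through `prometheus.prometheusSpec.additionalScrapeConfigs`.

```yaml
scrape_configs:
  - job_name: federate-cluster-a
    scrape_interval: 30s
    # Keep the job/instance labels exposed by the source Prometheus
    # instead of overwriting them with the federation target's labels.
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".+"}'   # selects every series; narrow this to what you need
    static_configs:
      - targets:
          - prometheus.cluster-a.example.com:9090   # hypothetical address
        labels:
          # Only takes effect if the source does not already expose a
          # "cluster" label (for example via external labels).
          cluster: cluster-a
```

Copying the setup to another cluster then means adding one more job (or target) with a different address and cluster label.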

Hello,
I really appreciate your reply, thanks a lot.

I have another question: what about security? Should I configure mTLS between all Prometheus instances?

It depends on the security posture of your workplace, how valuable you believe metric data to be, and whether your clusters communicate over private networks. There’s no text data in metrics other than metric names, label names and label values, so you’re not exposing passwords unless you have hard-coded passwords in pods (you shouldn’t) and kube-state-metrics returns environment variable info.

You can use unsecured HTTP, regular TLS (SSL) where you only have certificates at the server end (the cluster Prometheuses), or full mTLS. If any route between clusters crosses the public internet then you must have at least regular TLS.
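To make the difference concrete, here is a sketch of a federation job on the central Prometheus using TLS. The certificate paths and target address are assumptions for illustration. With only `ca_file` set you get regular server-side TLS; adding `cert_file` and `key_file` makes the central instance present a client certificate, which is the client half of mTLS (the cluster side still has to be configured to require it).

```yaml
scrape_configs:
  - job_name: federate-cluster-a
    scheme: https
    metrics_path: /federate
    honor_labels: true
    params:
      'match[]':
        - '{job=~".+"}'
    tls_config:
      # Verifies the certificate presented by the cluster Prometheus.
      ca_file: /etc/prometheus/certs/ca.crt          # hypothetical path
      # Client certificate and key, only needed for mTLS.
      cert_file: /etc/prometheus/certs/client.crt    # hypothetical path
      key_file: /etc/prometheus/certs/client.key     # hypothetical path
    static_configs:
      - targets:
          - prometheus.cluster-a.example.com:443     # hypothetical address
```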

Hello, it’s me again. I tried to do what you told me, but using the remote-write option instead of federation, and using Prometheus agent mode to send metrics to the centralized Prometheus.
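For anyone following along, the sending side of such a setup could look something like the sketch below, again assuming kube-prometheus-stack values. The URL and cluster name are made up, and how agent mode itself gets switched on is left out here because it depends on the chart and operator version.

```yaml
prometheus:
  prometheusSpec:
    # Added to every sample that is sent, so the central instance
    # can tell the clusters apart.
    externalLabels:
      cluster: cluster-a   # assumed cluster name
    remoteWrite:
      # /api/v1/write is the path Prometheus serves when its
      # remote write receiver is enabled.
      - url: https://prometheus-central.example.com/api/v1/write   # hypothetical URL
```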

I used kube-prometheus-stack chart version 55.5.1, but I ran into a problem when trying to enable the remote-write-receiver option in the centralized Prometheus instance.

I passed these values in values.yaml:

prometheus:
  prometheusSpec:
    enableRemoteWriteReceiver: true
    enableFeatures:
      - remote-write-receiver

But when I check the flags in the Prometheus UI, it shows as false. Also, when I try to POST to the endpoint with curl, I get this error: "remote write receiver needs to be enabled with --web.enable-remote-write-receiver"
Any suggestions?

I fixed the issue: I had duplicate prometheus sections in my values.yaml, so enableRemoteWriteReceiver: true was overridden.
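In case someone else hits the same thing, this is roughly the shape of the problem and the fix. It is a reconstruction rather than the actual file, and the `retention` setting is just a stand-in for whatever else lived in the second section.

```yaml
# Pitfall (assumed shape): a second top-level "prometheus:" key further down
# the file replaces the first one, so the receiver flag is silently lost.
#
# prometheus:
#   prometheusSpec:
#     enableRemoteWriteReceiver: true
# ...
# prometheus:
#   prometheusSpec:
#     retention: 15d
#
# Merged form: a single "prometheus" section holding every setting.
prometheus:
  prometheusSpec:
    enableRemoteWriteReceiver: true
    retention: 15d   # assumed setting, for illustration only
```

After redeploying, the flags page in the Prometheus UI should show whether the receiver flag actually took effect.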

Hey @Siradj-Eddine-Fisli,

Have you fully set up the multi-cluster Prometheus?
I have tried setting up Prometheus in two different clusters: one as the master Prometheus server and the other in agent mode.

I have a few questions:

  1. How does authorization between the agent and the master Prometheus server work?
  2. How do we specifically get metrics from an agent cluster?

TIA!