Monitor the infrastructure using the Grafana-Prometheus stack

Once applications have been deployed to production, they must be managed and maintained. Proper management ensures that they run as expected and helps prevent downtime. The hardware on which the applications run needs the same attention. A complete monitoring stack watches all of these components and helps with debugging when issues persist. It should monitor the clusters through real-time dashboards that visualize the most important KPIs, and it should alert stakeholders when an issue occurs or a threshold is breached.
Without a monitoring stack, serious production issues can go unnoticed. For example, application performance can degrade when the hardware is heavily utilized. The Grafana-Prometheus stack is one of the monitoring stacks that can be used for both monitoring and alerting. It can be deployed in the cloud or as an on-premises solution.
What is Prometheus? How does it work?
Prometheus is an open-source monitoring tool that stores data in time-series format. It pulls metrics from different exporters and stores them as time series. Each time series is uniquely identified by its metric name combined with a set of labels, which are key-value pairs. The labels differentiate time series that share the same metric name.
For example, suppose you have configured Prometheus to scrape the node exporter running on two different nodes. Both exporters send the same metric names, but the corresponding labels differ because each exporter runs on a different node. To collect these metrics, targets need to be defined in the Prometheus configuration file. Each target corresponds to an exporter that provides metrics. Many exporters are available for monitoring services and infrastructure. Prometheus also provides a query language called PromQL for querying the collected metrics. Its built-in functions allow aggregation over time-series data.
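The two-node example above could be expressed in the Prometheus configuration file roughly as follows. This is a minimal sketch rather than the configuration of any particular deployment: the host names `node-a` and `node-b`, the `env` label, and the 15-second interval are assumptions chosen for illustration, while port 9100 is the node exporter default.

```yaml
# prometheus.yml (sketch): scrape the node exporter on two hosts
global:
  scrape_interval: 15s        # assumed interval; tune for your environment

scrape_configs:
  - job_name: node            # becomes the label job="node" on every series
    static_configs:
      - targets:
          - node-a:9100       # hypothetical host; adds instance="node-a:9100"
          - node-b:9100       # hypothetical host; adds instance="node-b:9100"
        labels:
          env: production     # optional extra label (assumed) applied to both targets
```

Both exporters expose a metric such as `node_cpu_seconds_total`, and the `instance` label keeps the two time series apart. A PromQL expression like `avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))` would then aggregate the idle CPU rate per node.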
What is Grafana? How does it work?
Grafana is a data visualization tool that helps in monitoring the state of the infrastructure and applications. It can query data, create dashboards, and send alerts to different channels. The dashboards offer better visibility of the data and help in avoiding production issues by keeping the system under observation at all times. Custom plugins can be used to integrate Grafana with other data sources and visualizations. Grafana connects to different data sources such as Prometheus, Elasticsearch, InfluxDB, MySQL, PostgreSQL, and Graphite. After connecting to these data sources, dashboards can be created or imported from different sources. Access to these dashboards can also be restricted to users with certain rights through user access management. Apart from the open-source solution, Grafana provides Cloud and Enterprise solutions.
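Besides the UI, Grafana can load data sources from provisioning files at startup. A minimal sketch of such a file for a Prometheus data source follows. The address `http://prometheus:9090` is an assumption and should point at wherever Prometheus is reachable from Grafana.

```yaml
# Grafana data source provisioning file (sketch)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend proxies the queries
    url: http://prometheus:9090   # assumed address of the Prometheus server
    isDefault: true               # use this data source for new panels
```

Grafana reads files like this from its `provisioning/datasources` directory, which by default is `/etc/grafana/provisioning/datasources`.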
What is Alertmanager? How does it work?
Alertmanager is an alerting tool that receives alerts from Prometheus and sends them to different channels, such as PagerDuty, email, Telegram, and Slack. Alerts can also be sent to other platforms like Teams using webhooks: the webhook is created on the channel side, and its URL is provided in the Alertmanager configuration. The rules to be evaluated are defined in the Prometheus configuration. When one of these rules is triggered, Prometheus forwards the alert to Alertmanager, which then notifies the users on the configured channel. Alertmanager also allows alerts to be silenced, so that the same rule trigger does not notify users repeatedly.
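As a rough illustration of this flow, the first sketch below is a Prometheus rule file with a single alert, and the second is an Alertmanager configuration that routes every alert to a Slack channel. The alert name, the 80% threshold, the durations, the channel name, and the webhook URL are all placeholders rather than values from a real deployment.

```yaml
# Prometheus rule file (sketch): fire when a node's CPU stays busy
groups:
  - name: node-alerts
    rules:
      - alert: HighCpuUsage                  # hypothetical alert name
        # Percentage of non-idle CPU time per instance over the last 5 minutes
        expr: '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80'
        for: 5m                              # condition must hold this long before firing
        labels:
          severity: warning
        annotations:
          summary: 'High CPU usage on {{ $labels.instance }}'
```

```yaml
# Alertmanager configuration (sketch): send all alerts to one Slack channel
route:
  receiver: slack-notifications
  group_by: ['alertname', 'instance']
  repeat_interval: 4h                        # wait before re-sending a still-firing alert

receivers:
  - name: slack-notifications
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/WITH/YOUR-WEBHOOK   # placeholder
        channel: '#alerts'                   # hypothetical channel
        send_resolved: true                  # also notify when the alert clears
```

Prometheus loads the rule file through the `rule_files` setting and learns where Alertmanager runs from the `alerting` section of its own configuration.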
How to set up and use the monitoring stack
One of the easiest ways to set up the monitoring stack is through a Docker Compose file. In the compose file, we will set up the following four services:
- Prometheus - It will fetch the metrics from different targets mentioned in the configuration file. The configuration needs to be changed if more targets need to be monitored.
- Grafana - It visualizes the metrics in dashboards. It is pre-configured with Prometheus as a data source. To add more data sources, you can edit the datasources.yml file. After importing the node exporter dashboard, you should be able to see it in the Grafana UI.
- Alertmanager - It sends the alerts to the different channels. Channel-specific details such as user details and webhook URLs need to be entered in the configuration file.
- Node exporter - It is an exporter that provides the metrics of the host like CPU, Memory, Mountpoint storage, etc. It needs to be deployed on each of the systems that are to be monitored.
version: "3.7"

services:
  # Scrapes the configured targets and evaluates the alert rules
  prometheus:
    image: prom/prometheus:v2.36.0
    volumes:
      - ./prometheus/prometheus.yaml:/etc/prometheus/prometheus.yml
      - ./prometheus/rules.yml:/etc/prometheus/rules.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    ports:
      - 9090:9090
    networks:
      - monitoring-stack

  # Dashboards, with the data source provisioned from datasources.yml
  grafana:
    image: grafana/grafana:9.0.0-beta2
    volumes:
      - ./grafana/grafana.ini:/etc/grafana/grafana.ini
      - ./grafana/datasources.yml:/etc/grafana/provisioning/datasources/datasource.yaml
    ports:
      - 3000:3000
    networks:
      - monitoring-stack

  # Exposes host metrics; the host filesystems are mounted read-only
  node-exporter:
    image: prom/node-exporter:v1.3.1
    restart: unless-stopped
    ports:
      - 9100:9100
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - monitoring-stack

  # Routes alerts from Prometheus to the notification channels
  alertmanager:
    image: prom/alertmanager:v0.24.0
    ports:
      - 9093:9093
    volumes:
      - ./alertmanager/config.yml:/etc/alertmanager/config.yml
    networks:
      - monitoring-stack
    restart: always
    command:
      - '--config.file=/etc/alertmanager/config.yml'
      - '--storage.path=/alertmanager'

networks:
  monitoring-stack:
    driver: bridge

volumes:
  prometheus_data:
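The compose file mounts `./prometheus/prometheus.yaml` and `./prometheus/rules.yml` into the Prometheus container. The repository's actual files are not reproduced here. The following is only a sketch of what a matching Prometheus configuration might look like, assuming the containers reach each other by their compose service names on the `monitoring-stack` network. The intervals are assumed values.

```yaml
# ./prometheus/prometheus.yaml (sketch, not the repository's actual file)
global:
  scrape_interval: 15s          # assumed
  evaluation_interval: 15s      # assumed; how often alert rules are evaluated

rule_files:
  - /etc/prometheus/rules.yml   # path mounted by the compose file

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093 # compose service name and port

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090      # Prometheus scraping its own metrics
  - job_name: node
    static_configs:
      - targets:
          - node-exporter:9100  # compose service name and port
```

To monitor additional hosts, more `host:9100` entries would be added to the `node` job once the node exporter is running on those systems.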
Steps to deploy the monitoring stack:
As a prerequisite, you will need to install Docker, Docker Compose, and Git on the system. These are required to deploy the services.
- Clone the Git repository using the command below. This repository contains all the required artifacts for deploying the monitoring stack.
git clone https://github.com/thakarprathamesh/monitoring-stack.git
- Start the services by using the compose file. You can edit the compose file as required. Use the command below to start the services.
docker-compose -f monitoring-docker-compose.yaml up -d
- You can access the services at the following URLs:
  - Grafana UI at http://localhost:3000 (log in with the default Grafana credentials, i.e., username admin and password admin)
  - Prometheus UI at http://localhost:9090
  - Alertmanager UI at http://localhost:9093
Try Grafana-Prometheus using KodeKloud
You can also access Grafana and Prometheus instantly via KodeKloud. It provides a playground for users to get hands-on with the Grafana-Prometheus stack. The full list of playgrounds is available on the KodeKloud website. KodeKloud also provides learning paths and courses related to the tools covered in this blog, like Docker, Git, and many more. Users can access 30+ courses and 35+ playgrounds with a subscription to the KodeKloud platform.