Hi,
I'm a bit confused about the concepts of group_by and matchers inside the receiver configuration of Alertmanager. Can I use the labels I configured in the rule as matchers, or should I configure labels inside the routes? Could anyone clarify this?
My rule and Alertmanager configuration are as follows. The rule is working fine and the alert is getting fired, too.
My rule
- alert: low_disk_space
  expr: 100 * (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) > 90
  for: 1m
  labels:
    severity: critical
    env: eks
  annotations:
    message: "Less disk space on instance {{ $labels.instance }}"
routes:
- match:
    env: eks
    severity: warning|critical
  receiver: 'test-sumith'
- receiver: test-sumith
  group_by: ["job", "instance", "alertgroup"]
  continue: true
  matchers:
What matchers do I have to give here, and how does this work?
- name: test-sumith
  email_configs:
  - to: '[email protected]'
    send_resolved: true
Matchers match labels in the source metrics:

matchers:
  - label1 = value1
  - label2 = value2

etc.

Using this, you can choose different receivers for an alert based on the values of the labels on the alert that is firing. For example, if some label identifies the source as "dev" or "production", the alert can be routed to different teams.
There is a sample configuration here: https://github.com/prometheus/alertmanager/blob/main/doc/examples/simple.yml
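For instance, a route tree along these lines sends alerts to different teams depending on an env label. This is just a minimal sketch; the receiver names, addresses, and SMTP settings below are placeholders, not taken from your setup:

```yaml
global:
  smtp_smarthost: 'smtp.example.org:587'   # placeholder SMTP settings
  smtp_from: '[email protected]'

route:
  receiver: default-team        # fallback when no child route matches
  routes:
  - matchers:
    - env = "dev"
    receiver: dev-team
  - matchers:
    - env = "production"
    receiver: prod-team

receivers:
- name: default-team
  email_configs:
  - to: '[email protected]'
- name: dev-team
  email_configs:
  - to: '[email protected]'
- name: prod-team
  email_configs:
  - to: '[email protected]'
```

Every receiver referenced in the route tree has to be defined under receivers:, and alerts matching no child route fall back to the top-level receiver.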
Suppose I am using a disk alert expression such as

100 * (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}))

So the source metrics are what I should get from the output of the above query, is that right?
An alert expression is nothing more than a PromQL query.
You should put it into the Prometheus UI to check that it returns the metrics you expect. This will also show you the labels and values you can use in matcher expressions.
Yes, I have used the same labels, but the alerts are routing to the parent receiver instead of me. All of the alerts are going to him, not to me, and that's why I'm a bit confused. Anyway, I will give you the Alertmanager configuration and the alert rule I have used.
groups:
- name: automation-eks-node
  rules:
  - alert: low_disk_space
    expr: floor(100 * (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}))) > 40
    for: 1m
    labels:
      severity: critical
      env: eks
    annotations:
      summary: "Instance {{ $labels.instance }} is low on disk space"
      description: "Disk space on {{ $labels.instance }} is used over {{ $value }}%."
Output of the Prometheus query:

{container="node-exporter", device="/dev/xvda1", endpoint="http-metrics", env="eks", fstype="xfs", hostname="prometheus.org.cloud.sps", instance="48.15.192.10:9100", job="node-exporter", mountpoint="/", namespace="default", owner="automation_team", pod="kube-prometheus-prometheus-node-exporter-l9j64", prometheus="default/kube-prometheus-kube-prome-prometheus", prometheus_replica="prometheus-kube-prometheus-kube-prome-prometheus-0", service="kube-prometheus-prometheus-node-exporter"} 41

Here 41 is the disk usage value.
Alertmanager configuration:

alertmanager_route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: "james-paul-direct"
  routes:
  # Define routes here
  - receiver: test-sumith
    group_by: ['alertname', 'cluster', 'service']
    continue: true
    matchers:
    - job = node-exporter
    - alertname = low_disk_space
    - hostname = prometheus.org.cloud.sps
This is my complete configuration. Could you check it and let me know what corrections I have to make? Most of the alerts I configured are routing to james-paul-direct.
You have no cluster label in the given metric output.
Also, please paste YAML in code blocks, or the formatting is lost and it does not make sense.
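For reference, here is a minimal sketch of how the route could look for this alert, using only labels that are actually present in your rule and query output (env, severity, alertname, job, instance). The top-level key is shown as a plain route: block, so adapt it to whatever your values file or automation expects; treat it as a starting point, not a verified fix:

```yaml
route:
  receiver: james-paul-direct                 # parent/fallback receiver
  group_by: ['alertname', 'job', 'instance']  # labels that actually exist on this alert
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  routes:
  - receiver: test-sumith
    matchers:
    - env = "eks"              # from the rule's labels
    - severity = "critical"    # from the rule's labels
    continue: true             # let later sibling routes also see the alert
```

If an alert's labels match a child route, it goes to that route's receiver; only alerts that match no child route fall back to the parent receiver (james-paul-direct here).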