Hi,
I'm a bit confused about the concepts of group_by and matchers inside the receiver configuration of Alertmanager. Can I use the labels I configured in the rule as matchers, or should I configure labels inside the routes? Could anyone clarify this?
My rule and Alertmanager configuration are as follows. The rule is working fine and the alert is getting fired, too.
My rule
- alert: low_disk_space
  expr: 100 * (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) > 90
  for: 1m
  labels:
    severity: critical
    env: eks
  annotations:
    message: "Less disk space on instance {{ $labels.instance }}"
routes:
- match:
    env: eks
    severity: warning|critical
  receiver: 'test-sumith'
- receiver: test-sumith
  group_by: ["job", "instance", "alertgroup"]
  continue: true
  matchers:
What matchers do I have to give here, and how does this work?
- name: test-sumith
  email_configs:
  - to: '[email protected]'
    send_resolved: true
Matchers match labels in the source metrics:

matchers:
  - label1 = value1
  - label2 = value2

etc.

Using this, you can choose different receivers for an alert based on the values of the labels on the alert that is firing. For example, if some label identifies the source as "dev" or "production", the alert can be routed to different teams.
There is a sample configuration here: https://github.com/prometheus/alertmanager/blob/main/doc/examples/simple.yml
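For instance, a route tree along these lines sends alerts to different teams depending on an env label. This is just a minimal sketch; the receiver names, addresses, and SMTP settings below are placeholders, not taken from your setup:

```yaml
global:
  smtp_smarthost: 'smtp.example.org:587'   # placeholder SMTP settings
  smtp_from: '[email protected]'

route:
  receiver: default-team        # fallback when no child route matches
  routes:
  - matchers:
    - env = "dev"
    receiver: dev-team
  - matchers:
    - env = "production"
    receiver: prod-team

receivers:
- name: default-team
  email_configs:
  - to: '[email protected]'
- name: dev-team
  email_configs:
  - to: '[email protected]'
- name: prod-team
  email_configs:
  - to: '[email protected]'
```

Every receiver referenced in the route tree has to be defined under receivers:, and alerts matching no child route fall back to the top-level receiver.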
Suppose I am using a disk alert expression such as

100 * (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}))

So the source metrics are what I should get from the output of the above query, is that right?
An alert expression is nothing more than a PromQL query.
You should put it into the Prometheus UI to check that it returns the metrics you expect. This will also show you the labels and values you can use in matcher expressions.
Yes, I have used the same labels, but the alerts are routing to the parent receiver instead of me. All of the alerts are going to him, not to me, and that's why I'm a bit confused. Anyway, I will give you the Alertmanager configuration and the alert rule I have used.
groups:
- name: automation-eks-node
  rules:
  - alert: low_disk_space
    expr: floor(100 * (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}))) > 40
    for: 1m
    labels:
      severity: critical
      env: eks
    annotations:
      summary: "Instance {{ $labels.instance }} is low on disk space"
      description: "Disk space on {{ $labels.instance }} is used over {{ $value }}%."
Output of the Prometheus query:

{container="node-exporter", device="/dev/xvda1", endpoint="http-metrics", env="eks", fstype="xfs", hostname="prometheus.org.cloud.sps", instance="48.15.192.10:9100", job="node-exporter", mountpoint="/", namespace="default", owner="automation_team", pod="kube-prometheus-prometheus-node-exporter-l9j64", prometheus="default/kube-prometheus-kube-prome-prometheus", prometheus_replica="prometheus-kube-prometheus-kube-prome-prometheus-0", service="kube-prometheus-prometheus-node-exporter"} 41

Here 41 is the disk usage value.
Alertmanager configuration:

alertmanager_route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: "james-paul-direct"
  routes:
  # Define routes here
  - receiver: test-sumith
    group_by: ['alertname', 'cluster', 'service']
    continue: true
    matchers:
    - job = node-exporter
    - alertname = low_disk_space
    - hostname = prometheus.org.cloud.sps
This is my complete configuration. Could you check it and let me know what corrections I have to make? Most of the alerts I configured are routing to james-paul-direct.
You have no cluster label in the given metric output.
Also, please paste YAML in code blocks, or the formatting is lost and it does not make sense.
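For reference, here is a minimal sketch of how the route could look for this alert, using only labels that are actually present in your rule and query output (env, severity, alertname, job, instance). The top-level key is shown as a plain route: block, so adapt it to whatever your values file or automation expects; treat it as a starting point, not a verified fix:

```yaml
route:
  receiver: james-paul-direct                 # parent/fallback receiver
  group_by: ['alertname', 'job', 'instance']  # labels that actually exist on this alert
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  routes:
  - receiver: test-sumith
    matchers:
    - env = "eks"              # from the rule's labels
    - severity = "critical"    # from the rule's labels
    continue: true             # let later sibling routes also see the alert
```

If an alert's labels match a child route, it goes to that route's receiver; only alerts that match no child route fall back to the parent receiver (james-paul-direct here).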