Prometheus Alertmanager | Use Cases and Tutorial

Prometheus Alertmanager makes your monitoring more useful by keeping you informed of what’s happening and by providing actionable insights. This article will show you how to use it more effectively with best practices and examples.

March 13, 2023
Daniel Olaogun
Software Engineer

While there are many tools you can use for monitoring and alerting, Prometheus has become a default monitoring tool for many companies. Prometheus is a free, open source tool that is commonly used for monitoring containerized applications and Kubernetes clusters. It provides a detailed analysis of your cluster and performs well even under heavy workloads. By integrating it with Grafana, you can visualize your data metrics and gain deep insight into your cluster.

Prometheus provides insight into your Kubernetes cluster and containerized application, but you also want to be notified when any problem occurs. This is where Prometheus Alertmanager comes in. In this article, you’re going to learn about Prometheus Alertmanager, its benefits, how to implement it, and some tips and tricks on how to use it to extract the maximum benefit from your monitoring.

What is Prometheus Alertmanager?

Prometheus alerts allow you to receive automated notifications when certain events occur, triggered by your metric data.

Alertmanager sends notifications to your configured notification channels when events and metrics in your Kubernetes cluster match the rules pre-configured in the Prometheus server. These rules can include sending a critical message when an unhealthy node is detected, or sending a warning message when a node's resource consumption is reaching its limit. You can configure Alertmanager to send event notifications to a number of channels, including email, Slack, webhooks, and other common platforms. Furthermore, Alertmanager can group related alerts into a single notification and suppress alerts triggered by problems for which notifications have already been sent.
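
As a minimal sketch of what grouping and suppression look like in practice, the fragment below bundles alerts that share an alertname and namespace into one notification and adds an inhibit rule that silences warning-level alerts while a related critical alert is firing. The receiver name 'default-receiver' and the label values are illustrative placeholders, and the matcher syntax assumes a reasonably recent Alertmanager release; in the Helm chart used later in this tutorial, this configuration would sit under alertmanager.config.


route:
  group_by: ['alertname', 'namespace']  # bundle alerts that share these labels into one notification
  group_wait: 30s                       # wait before sending the first notification for a new group
  group_interval: 5m                    # wait before sending an updated notification for the group
  repeat_interval: 4h                   # re-send a still-firing alert at most this often
  receiver: 'default-receiver'
receivers:
- name: 'default-receiver'              # placeholder; point this at a real email/Slack/webhook receiver
inhibit_rules:
- source_matchers:
  - severity="critical"                 # while a critical alert is firing...
  target_matchers:
  - severity="warning"                  # ...suppress warning-level alerts...
  equal: ['alertname', 'namespace']     # ...that refer to the same alert and namespace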

It is easy to install and configure Prometheus Alertmanager in your cluster using the kube-prometheus-stack Helm chart. Once installed, Alertmanager can be configured from the command line, in conjunction with a configuration file that contains the notification configuration rules.

Use Cases for Prometheus Alertmanager

Prometheus Alertmanager has many use cases. Let's take a closer look at a few that are considered best practices.

Resource Approaching Capacity Limit

It’s important to know when the resources in your Kubernetes cluster are reaching their limits, since failure to know this can affect the performance of your application on the cluster. For example, if your cluster nodes are reaching their memory limit, the nodes will start evicting pods from their environment until they’ve reduced the number of pods to something their memory can handle. The eviction has a snowball effect on your application because you no longer have sufficient pods in your cluster to handle the traffic being generated by your users, thereby affecting the performance of your application.

You can configure Alertmanager to send notifications if your Kubernetes resources are approaching a limit, allowing you to take quick action to increase your resources or kill unnecessary jobs so as to free up resources.
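
As an illustration, a rule along the following lines could back such a notification. This is only a sketch: it assumes the node exporter metrics that the kube-prometheus-stack scrapes by default, and the 10% threshold and 5-minute window are arbitrary examples.


groups:
- name: node-capacity
  rules:
  - alert: NodeMemoryRunningLow
    # fires when less than 10% of a node's memory has been available for 5 minutes
    expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Node {{ $labels.instance }} is running low on memory"
      description: "Less than 10% of memory is available on {{ $labels.instance }}."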

Processing Jobs Not Running

Jobs in Kubernetes are designed to run to completion, and if a failure occurs during the job processing, Kubernetes re-runs the job until it is successful. However, there are scenarios that cause a job to continually fail—for example, incorrect Docker configurations, or insufficient resources to complete the job. Alertmanager is able to notify you if a job fails repeatedly, indicating a larger problem.
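
A hedged sketch of such a rule, assuming the kube_job_status_failed metric exposed by kube-state-metrics (which the kube-prometheus-stack installs), might look like this; the 5-minute window is an arbitrary example.


groups:
- name: job-failures
  rules:
  - alert: KubernetesJobFailed
    # kube_job_status_failed comes from kube-state-metrics
    expr: kube_job_status_failed > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Job {{ $labels.job_name }} has failed"
      description: "Job {{ $labels.job_name }} in namespace {{ $labels.namespace }} has failed pods."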

Nodes in a NotReady State

The status of your worker nodes is very important to the performance and overall cost of your cluster. If one or more of your nodes are in a NotReady state, it has a negative effect on your cluster performance. For instance, when a node is in NotReady, the kube-scheduler won’t be able to schedule pods on it, but the node is still adding to the overall cost of your cluster and consuming resources. Receiving alerts when nodes are in the NotReady state will enable you to take steps to fix the issue.
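
One way to express this kind of alert, again assuming kube-state-metrics is available, is sketched below; the severity label and the 5-minute window are illustrative choices.


groups:
- name: node-health
  rules:
  - alert: KubernetesNodeNotReady
    # kube_node_status_condition is exposed by kube-state-metrics
    expr: kube_node_status_condition{condition="Ready",status="true"} == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Node {{ $labels.node }} is not ready"
      description: "Node {{ $labels.node }} has been in a NotReady state for more than 5 minutes."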

Implementing Prometheus Alertmanager From Scratch

Now that you know a little about how Prometheus Alertmanager can help you, let’s walk through its implementation.

Create a Kubernetes Cluster

Before you can use Prometheus Alertmanager, you need to have a Kubernetes cluster up and running. You can create a local development cluster using Minikube by following this guide. You can also set up a self-managed cluster, or use a managed Kubernetes cluster provided by cloud providers such as Amazon Web Services’ EKS, Azure’s AKS, or Google Cloud Platform’s GKE. Each cloud provider explains how to set up a Kubernetes cluster in its documentation.

Install Helm

Helm is a package manager for Kubernetes, and is an important prerequisite for installing Prometheus in your cluster. Select and run the command for the operating system of the machine you use to manage your cluster.

Windows


choco install kubernetes-helm

Linux


curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3

chmod 700 get_helm.sh

./get_helm.sh

macOS


brew install helm

To confirm that you have successfully installed Helm, run the following command in your terminal to check the Helm version:


helm version 

Below is the Helm version as of the time of writing; you should see a similar response.


version.BuildInfo{Version:"v3.7.2", GitCommit:"663a896f4a815053445eec4153677ddc24a0a361", GitTreeState:"clean", GoVersion:"go1.17.3"}

Install Prometheus and Alertmanager

Once you have completed the installation of the package manager, you can use it to install Prometheus. Run the following commands:


helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm repo add stable https://charts.helm.sh/stable

helm repo update

You will install the Prometheus stack in a new namespace so you don’t clutter your default namespace. Create a monitoring namespace by running the command below:


kubectl create ns monitoring

Then use Helm to install the kube-prometheus-stack chart:


helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring

The kube-prometheus-stack chart automatically creates deployments, pods, and service objects for Prometheus in your cluster. To view them, run kubectl get all -n monitoring.

For this tutorial, you’ll use port forwarding to access the Prometheus and Alertmanager web UIs.

The Prometheus server has a web UI that shows simple graphs, current configuration rules, and the state of the monitoring endpoints, but you can’t configure alerting rules. To view the Prometheus web interface, run the following command:


kubectl port-forward prometheus-prometheus-kube-prometheus-prometheus-0 9090 -n monitoring

Visit http://localhost:9090 to view Prometheus’s web UI.

Prometheus Web UI

Prometheus also has an Alertmanager web UI, which is primarily used for viewing alerts and managing silences; note that you can’t configure your alert rules there. Run the following command to access it:


kubectl port-forward alertmanager-prometheus-kube-prometheus-alertmanager-0 9093 -n monitoring

Visit http://localhost:9093/#/alerts to view it in your browser.

Prometheus Alertmanager UI

As you can see, Alertmanager comes with numerous alert rules provided out of the box. Prometheus triggers these alerts automatically when the expression rules match the events occurring in the Prometheus server.

To configure your Alertmanager, you have to create a configuration YAML file, then use Helm to apply the configurations.

Configuring Alertmanager to Send Emails

Create a file called alertmanager-config.yaml with the following contents.


alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_wait: 20s
      group_interval: 4m
      repeat_interval: 4h
      receiver: 'email-k8s-admin'
      routes: []
    receivers:
    - name: 'email-k8s-admin'
      email_configs:
      - to: k8s-admin@example.com
        from: email-k8s-admin@alertmanager.com
        smarthost: mail.example.com:587
        auth_username: email-k8s-admin
        auth_password: xxxxxxxxx

For email testing, you can create a test inbox on Mailtrap to avoid cluttering your regular mailbox with Alertmanager test notifications.

Once you’ve updated the above code with valid SMTP configurations, use helm upgrade to deploy the new configuration:


helm upgrade --reuse-values -f alertmanager-config.yaml prometheus prometheus-community/kube-prometheus-stack -n monitoring

Helm will deploy the new configurations for the Alertmanager.

Once Alertmanager has the updated email configuration, you will receive all alerts that have been pending in Alertmanager, such as the Watchdog alert seen below.

Email Alert

Alert Options

In addition to the email option, there are several other options for receiving alerts. You can configure Alertmanager to send notifications via Slack, WeChat, webhooks, and other services. Here’s an example of a Slack configuration for receiving notifications directly in your Slack workspace.


alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_wait: 20s
      group_interval: 4m
      repeat_interval: 4h
      receiver: 'null'
      routes:
      - receiver: 'slack-k8s-admin'
        matchers:
        - severity="critical" 
    receivers:
      - name: 'null'
      - name: 'slack-k8s-admin'
        slack_configs:
        - api_url: 'https://hooks.slack.com/services/000000000/00000000/00000000000000'
          channel: '#my-kubernetes-cluster'

You can also configure Alertmanager to send notifications to multiple channels based on alert properties, such as severity, alertname, or job. In the example below, Alertmanager sends all alerts with critical severity to the Slack channel, while alerts with warning severity are sent to the email channel.


alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_wait: 20s
      group_interval: 4m
      repeat_interval: 4h
      receiver: 'null'
      routes:
      - receiver: 'slack-k8s-admin'
        matchers:
        - severity="critical"    
      - receiver: 'email-k8s-admin'
        matchers:
        - severity="warning" 
    receivers:
      - name: 'null'
      - name: 'email-k8s-admin'
        email_configs:
        - to: k8s-admin@example.com
          from: email-k8s-admin@alertmanager.com
          smarthost: mail.example.com:587
          auth_username: email-k8s-admin
          auth_password: xxxxxxxxx
      - name: 'slack-k8s-admin'
        slack_configs:
        - api_url: 'https://hooks.slack.com/services/000000000/00000000/00000000000000'
          channel: '#my-kubernetes-cluster'

For more configuration information, visit the Prometheus Alert Configuration Page.

Triggering Alerts

Beyond the default rules provided by Prometheus, Alertmanager will also send notifications based on the alert rules you configure yourself. For example, the Watchdog alert we saw earlier is a default rule, and Alertmanager automatically fired an alert because the expression matched the event occurring in the Prometheus server.

To test your Alertmanager configuration, you’ll create some custom rules.

Configuring Alert Rules

Create a YAML file named alert-rules.yaml and add the following custom rules:


additionalPrometheusRulesMap:
 custom-rules:
  groups:
  - name: GroupA
    rules:
    - alert: InstanceLowMemory
      expr: :node_memory_MemAvailable_bytes:sum < 50668858390
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Instance {{ $labels.host }}: memory low"
        description: "{{ $labels.host }} has less than 50G memory available"
    - alert: InstanceDown
      expr: up == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Instance [{{ $labels.instance }}] down"
        description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 1 minute."

Apply your alert rules:


helm upgrade --reuse-values -f alert-rules.yaml prometheus prometheus-community/kube-prometheus-stack -n monitoring

Once you apply your alert rules, Alertmanager will automatically send a notification for the InstanceLowMemory alert to your notification channels, as seen below.

InstanceLowMemory

You can also view the rules in your Prometheus web UI and Alertmanager web UI.

Prometheus web UI
Alertmanager web UI

You can create other custom rules as well, such as alerts for a node running out of memory or storage running out of space.
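
For example, a sketch of a disk-space rule in the same format used above, assuming the node exporter's filesystem metrics and an illustrative 10% threshold on the root filesystem, could look like this:


additionalPrometheusRulesMap:
  custom-storage-rules:
    groups:
    - name: GroupB
      rules:
      - alert: NodeDiskSpaceLow
        # node_filesystem_* metrics come from the node exporter
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }}: disk space low"
          description: "{{ $labels.instance }} has less than 10% disk space available on the root filesystem."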

Final Thoughts

Prometheus Alertmanager is an important tool in your Kubernetes arsenal, and in this article, you’ve learned about some of its benefits. It notifies you of specific events occurring in your Kubernetes cluster so that you can take action as needed to preserve the integrity of your cluster. You also learned how to install and configure Prometheus and Alertmanager in your Kubernetes cluster. Finally, you learned how to configure rules so that Alertmanager notifies you of specific events happening in your cluster.

Daniel Olaogun
Software Engineer

Daniel is a Software Engineer with 5 years of experience building software solutions in a variety of fields. Today, he works as the Lead Software Engineer at Whitesmith. Daniel has a Bachelor’s Degree in Computer Science from the University of Lagos.
