Resources

Prometheus Queries: 11 PromQL Examples and Tutorial

November 30, 2021

With any monitoring system, it’s important that you’re able to pull out the right data. In Prometheus, pulling data is done via PromQL queries, and in this article we guide the reader through 11 examples that can be used for Kubernetes specifically.

Vinayak Pandey
Senior Systems Engineer

Prometheus is an open-source monitoring and alerting software that can collect metrics from different infrastructure and applications. It saves these metrics as time-series data, which is used to create visualizations and alerts for IT teams. Prometheus has gained a lot of market traction over the years, and when combined with other open-source tools like Grafana, it provides a robust monitoring solution.

You can query Prometheus metrics directly with its own query language: PromQL. In this article, you will learn some useful PromQL queries to monitor the performance of Kubernetes-based systems.

But before that, let’s talk about the main components of Prometheus.

Prometheus Components and PromQL

The main components of Prometheus are:

  • The Prometheus server: The main application server that collects and stores metrics and sends alerts.
  • Push gateway: This is used to send metrics from short-lived jobs to Prometheus.
  • Exporters: These are like “agents” that expose metrics from a wide variety of applications, infrastructure, APIs, databases, and other sources. The Prometheus server scrapes their endpoints and saves the data as metrics.
  • Alert manager: The alert manager interfaces between the Prometheus server and receiving endpoints like email and PagerDuty. The alert manager also takes care of deduplicating, grouping, routing, and silencing alerts.

Other Prometheus components include a data model that stores the metrics, client libraries for instrumenting code, and PromQL for querying the metrics. PromQL allows you to write queries and fetch information from the metric data collected by Prometheus. You can use these queries in the expression browser, Prometheus HTTP API, or visualization tools like Grafana.
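To make the HTTP API concrete, here is a minimal Python sketch that builds an instant-query URL for the `/api/v1/query` endpoint and parses the documented JSON response shape. The base URL and the sample response are illustrative assumptions, not output from a real server:

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL -- point this at wherever your Prometheus server listens.
PROM_URL = "http://localhost:9090"

def build_query_url(promql, base=PROM_URL):
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"

def extract_values(api_response):
    """Pull (labels, value) pairs out of an instant-query JSON response."""
    if api_response.get("status") != "success":
        raise RuntimeError("query failed")
    return [(r["metric"], float(r["value"][1]))
            for r in api_response["data"]["result"]]

# A canned response in the documented shape, so the parsing can be tried offline.
sample = json.loads("""
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {"namespace": "monitoring"},
                      "value": [1638230400, "8"]}]}}
""")
print(build_query_url("sum by (namespace) (kube_pod_info)"))
print(extract_values(sample))
```

The same URL shape works for any of the queries shown later in this article; visualization tools like Grafana talk to Prometheus through this API as well.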

Structure of a PromQL Query

The simplest construct of a PromQL query is an instant vector selector. This selector is just a metric name. PromQL queries the time series data and returns all elements that match the metric name, along with their values for a particular point in time (when the query runs). Here are two examples of instant vectors:

  • node_cpu_seconds_total: This counter returns the total CPU time, in seconds, that each CPU has spent in each mode.
  • instance_memory_usage_bytes: This shows the memory currently in use, in bytes.

You can also use range vectors to select a particular time range. For example, the following query will show the total amount of CPU time spent over the last two minutes:


node_cpu_seconds_total{cpu="0"}[2m]

And the query below will show the total number of HTTP requests received in the last five minutes:


http_requests_total[5m]

There are different ways to filter, combine, and manipulate Prometheus data using operators, and to process it further using built-in functions. Of course, this article is not a primer on PromQL; you can browse through the PromQL documentation for more in-depth knowledge.
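To make the range-vector idea concrete, here is a rough Python sketch of what the built-in rate() function computes from a counter’s raw samples. The sample values are made up for illustration, and PromQL’s real implementation additionally handles counter resets and extrapolates to the window boundaries:

```python
def per_second_rate(samples):
    """Approximate PromQL's rate(): the per-second increase of a counter
    across the samples in the window (ignoring counter resets)."""
    (t0, v0), (tn, vn) = samples[0], samples[-1]
    return (vn - v0) / (tn - t0)

# Counter samples scraped every 30s over a 2-minute window,
# like node_cpu_seconds_total{cpu="0"}[2m] would select.
window = [(0, 100.0), (30, 112.0), (60, 124.0), (90, 136.0), (120, 148.0)]
print(per_second_rate(window))  # 0.4 CPU-seconds of busy time per second
```

The same idea underlies rate(http_requests_total[5m]): the range vector selects the raw samples, and the function collapses them into one per-second figure.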

Setting Up a Kubernetes Cluster

Let’s create a demo Kubernetes cluster and set up Prometheus to monitor it. In the following steps, you will create a two-node Kubernetes cluster (one master and one worker) in AWS. I’ve deliberately kept the setup simple and accessible from any address for demonstration.

Step 1: Launch EC2 Instances

In AWS, create two t2.medium instances running CentOS. Name them Kubernetes Master and Kubernetes Worker.

Specify CentOS AMI

Specify instance type

Specify EC2 instance details

Step 2: Create a Security Group

Next, create a Security Group to allow access to the instances.

Specify Security Group settings

Once configured, your instances should be ready for access.

Kubernetes nodes

Step 3: Install Docker

SSH into both servers and run the following commands to install Docker:


sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker
sudo systemctl start docker
sudo systemctl enable docker
sudo systemctl status docker

Step 4: Configure Kubernetes Repository

On both nodes, create the /etc/yum.repos.d/kubernetes.repo file (with sudo vi /etc/yum.repos.d/kubernetes.repo, for example) and add the following content:


[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg

Step 5: Install kubelet, kubeadm, and kubectl

Run the following commands on both nodes to install kubelet, kubeadm, and kubectl.


sudo yum install -y kubelet kubeadm kubectl
sudo systemctl enable kubelet
sudo systemctl start kubelet

Step 6: Set Hostnames

Set the hostnames: run the first command on the master node and the second on the worker node.


sudo hostnamectl set-hostname master
sudo hostnamectl set-hostname worker

Step 7: Configure the hosts Files

On both nodes, edit the /etc/hosts file to map each hostname to that node’s private IP:


<master_node_private_ip> master
<worker_node_private_ip> worker

Step 8: Update IPTables

On both nodes, edit the /etc/sysctl.d/k8s.conf file to add the following two lines:


net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1

Then apply the settings using the sudo sysctl --system command.

Step 9: Disable SELinux and Swapping

Run the following commands in both nodes to disable SELinux and swapping:


sudo setenforce 0
sudo sed -i '/swap/d' /etc/fstab
sudo swapoff -a

Also, change SELINUX=enforcing to SELINUX=permissive in the /etc/selinux/config file.

Step 10: Install Kubernetes

Now, let’s install Kubernetes on the master node using kubeadm. Run the following command on the master node:


sudo kubeadm init --pod-network-cidr=<CIDR_RANGE>

Once the command runs successfully, you’ll see joining instructions to add the worker node to the cluster.

kubeadm output showing node joining instructions

Step 11: Copy kubeconfig

Next, copy the kubeconfig and set up the Flannel CNI by running the following commands on the master node. (We’ll be executing all kubectl commands from the master node.)


mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Step 12: Configure Worker Node

On the worker node, run the kubeadm joining command shown in the last step.


sudo kubeadm join 172.31.20.200:6443 --token 9bsbf8.igewpclzaf9sgpcj \
    --discovery-token-ca-cert-hash sha256:ff6c79678ed8542c4c1faa26697e1f5d948c74bdc471198d7ec30237f77289c1 

Note that the token and hash values will be different for your cluster.

Step 13: Check Cluster Status

At this point, both nodes should be ready. You can verify this by running the kubectl get nodes command on the master node.

Kubernetes cluster node status

Setting Up Prometheus

Run the following commands on the master node to set up Prometheus on the Kubernetes cluster:


sudo yum install -y git
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
kubectl create -f manifests/setup
kubectl create -f manifests/

Next, run this command on the master node to check the Pods’ status:


kubectl get pods -n monitoring

Prometheus Pods’ status

Once all the Pods are up and running, you can access the Prometheus console using Kubernetes port forwarding. To do that, run the following command on the master node:


kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090

Next, create an SSH tunnel between your local workstation and the master node by running the following command on your local machine:


ssh -i <path_to_pem_file> -L 9090:127.0.0.1:9090 centos@<master_nodes_public_ip>

If everything is okay at this point, you can access the Prometheus console at http://localhost:9090.

Prometheus console

11 Queries | Kubernetes Metric Data with PromQL

Now comes the fun stuff. You can run a variety of PromQL queries to pull interesting and actionable metrics from your Kubernetes cluster. These queries will give you insights into node health, Pod health, cluster resource utilization, etc.

Of course, there are many types of queries you can write, and other useful queries are freely available. However, the queries you will see here are a “baseline” audit: they will give you an overall idea of a cluster’s health.

You’ll be executing all these queries in the Prometheus expression browser, so let’s get started.

Query 1: Find Number of Pods per Namespace


sum by (namespace) (kube_pod_info)

Number of pods per namespace

Query 2: Find CPU Overcommit

Before running this query, create a Pod with the following specification:


apiVersion: v1
kind: Pod
metadata:
  name: dummy-pod
spec:
  containers:
  - name: dummy-pod
    image: ubuntu
    resources:
      limits:
        cpu: "4"
        memory: "8000Mi"
      requests:
        cpu: "4"
        memory: "8000Mi"
  restartPolicy: Always

Now run the following query:


sum(kube_pod_container_resource_limits{resource="cpu"}) - sum(kube_node_status_capacity{resource="cpu"})

If this query returns a positive value, then the cluster has overcommitted the CPU.

CPU overcommit

Query 3: Memory Overcommit


sum(kube_pod_container_resource_limits{resource="memory"}) - sum(kube_node_status_capacity{resource="memory"})

If this query also returns a positive value, then our cluster has overcommitted the memory.

Memory overcommit
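The arithmetic behind Queries 2 and 3 can be sketched in a few lines of Python. The numbers here are hypothetical; the real queries sum the cluster’s actual container limits and node capacities:

```python
def overcommit(limit_sums, capacity_sums):
    """Mirror of the overcommit queries: the sum of container resource limits
    minus the sum of node capacity. A positive result means the cluster has
    promised more of the resource than its nodes can actually provide."""
    return sum(limit_sums) - sum(capacity_sums)

# Hypothetical cluster: two nodes with 2 CPUs each,
# and container CPU limits totalling 6 CPUs.
cpu_diff = overcommit(limit_sums=[4, 2], capacity_sums=[2, 2])
print(cpu_diff)      # 2
print(cpu_diff > 0)  # True -> CPU is overcommitted
```

Overcommit is not automatically a problem (limits can exceed capacity as long as actual usage doesn’t), but it tells you the cluster cannot honor all limits at once.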

Query 4: Find Unhealthy Kubernetes Pods

Before running this query, create a Pod with the following specification:


apiVersion: v1
kind: Pod
metadata:
  name: dummy-pod2
spec:
  containers:
    - name: dummy-pod
      image: ubuntu
  restartPolicy: Always
  nodeSelector:
    disktype: ssd

This Pod won’t be able to run because no node in the cluster has the label disktype: ssd.

Now run the following query:


min_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0

Unhealthy Kubernetes Pod
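The [15m:1m] part of this query is a subquery: the inner sum is evaluated at one-minute steps over 15 minutes, and min_over_time takes the minimum of those evaluations, so a result above zero means the pod was in a bad phase the entire time. A simplified Python sketch with hypothetical samples:

```python
def min_over_time(samples):
    """Approximate PromQL's min_over_time(): the minimum value a series
    took across the samples in the window."""
    return min(value for _, value in samples)

# Hypothetical per-minute counts of dummy-pod2 in Pending/Unknown/Failed
# over 15 minutes. A minimum above zero means it never became healthy.
window = [(minute, 1) for minute in range(15)]
print(min_over_time(window) > 0)  # True: unhealthy for the whole window
```

Using the minimum (rather than the latest value) filters out pods that were only briefly Pending, such as those still being scheduled.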

Query 5: Find Kubernetes Pods CrashLooping

Before running the query, create a Pod with the following specification:


apiVersion: v1
kind: Pod
metadata:
  name: dummy-pod3
spec:
  containers:
    - name: dummy-pod3
      image: ubuntu
  restartPolicy: Always

Now run the following query:


increase(kube_pod_container_status_restarts_total[15m]) > 3

Crashlooping Kubernetes Pod
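Here is a rough Python sketch of what increase() does with the restart counter in this query. The samples are illustrative, and PromQL’s real increase() also extrapolates to the window boundaries and handles counter resets:

```python
def restart_increase(samples):
    """Approximate PromQL's increase(): how much a counter grew
    across the samples in the window (ignoring counter resets)."""
    return samples[-1][1] - samples[0][1]

def crashlooping(samples, threshold=3):
    """Flag a container whose restart counter grew by more than
    `threshold` restarts in the window, mirroring the query above."""
    return restart_increase(samples) > threshold

# Hypothetical kube_pod_container_status_restarts_total samples
# over a 15-minute window (timestamps in seconds).
window = [(0, 2), (300, 4), (600, 6), (900, 7)]
print(crashlooping(window))  # True: 5 restarts in 15 minutes
```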

Query 6: Find Number of Containers Without CPU Limits In Each Namespace


count by (namespace)(sum by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))

Number of containers without CPU limits in each namespace
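This query relies on the unless operator, which keeps series from its left-hand side that have no matching label set on its right-hand side. A simplified Python sketch with hypothetical label sets:

```python
def unless(left, right):
    """Approximate PromQL's `unless`: keep entries from the left-hand side
    whose (namespace, pod, container) labels have no match on the right."""
    return {labels: v for labels, v in left.items() if labels not in right}

# Hypothetical cluster: two containers exist, but only web/app has a CPU limit.
container_info = {
    ("default", "web", "app"): 1,
    ("default", "db", "mysql"): 1,
}
cpu_limits = {("default", "web", "app"): 0.5}

without_limits = unless(container_info, cpu_limits)
print(len(without_limits))  # 1 container is missing a CPU limit
```

The outer count by (namespace) in the PromQL query then tallies these leftover series per namespace.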

Query 7: Find PersistentVolumeClaim in Pending State

Before running the query, create a PersistentVolumeClaim with the following specification:


apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-claim-2
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

This will get stuck in the Pending state because we don’t have a StorageClass called “manual” in our cluster.

Now run the following query:


kube_persistentvolumeclaim_status_phase{phase="Pending"}

PersistentVolumeClaim in Pending state

Query 8: Find Unstable Nodes

This query finds nodes that are repeatedly flapping between “Ready” and “NotReady” status.


sum(changes(kube_node_status_condition{status="true",condition="Ready"}[15m])) by (node) > 2

If both nodes are running fine, this query shouldn’t return any results.

Query 9: Find Idle CPU Cores


sum((rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m]) - on (namespace,pod,container) group_left avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="cpu"})) * -1 >0)

Idle CPU cores

Query 10: Find Idle Memory


sum((container_memory_usage_bytes{container!="POD",container!=""} - on (namespace,pod,container) avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="memory"})) * -1 >0 ) / (1024*1024*1024)

Idle memory

Query 11: Find Node Status

The following queries return, respectively, the number of nodes in Ready state, the number in NotReady state, and whether each node is marked unschedulable (cordoned):

sum(kube_node_status_condition{condition="Ready",status="true"})
sum(kube_node_status_condition{condition="NotReady",status="true"})
sum(kube_node_spec_unschedulable) by (node)

Conclusion

This article covered a lot of ground. You’ve learned about the main components of Prometheus and its query language, PromQL. You saw how basic PromQL expressions can return important metrics, which can be further processed with operators and functions. You set up a Kubernetes cluster, installed Prometheus on it, and ran some queries to check the cluster’s health.

These queries are a good starting point. The real power of Prometheus comes into the picture when you utilize the alert manager to send notifications when a certain metric breaches a threshold. And then there is Grafana, which comes with a lot of built-in dashboards for Kubernetes’ monitoring. I suggest you experiment more with the queries as you learn, and build a library of queries you can use for future projects.

Looking for an out-of-the-box monitoring solution?

With a simple one-line install, ContainIQ allows you to monitor the health of your cluster with pre-built dashboards and easy-to-set alerts.

Article by

Vinayak Pandey

Senior Systems Engineer

Vinayak is an experienced cloud consultant with a knack of automation, currently working with Cognizant Singapore. Before that, Vinayak worked as a Senior Systems Engineer at Singapore Airlines. He has a Bachelor of Technology in Computer Science & Engineering from SRMS.
