How to Export, Monitor, and Alert on Kubernetes Events

October 11, 2021

Kubernetes events provide insight into the performance of your clusters and are particularly useful for debugging. We highlight the Kubernetes events you should be alerting on and show you how to do it.

Nate Matherson
Co-founder & CEO

Many engineering teams find it useful to create alerts on Kubernetes events like CrashLoopBackOffs, pod evictions, and Kubernetes jobs either succeeding or failing. These events and others can provide a deep level of insight into the performance of the cluster and are particularly useful for debugging.

However, by default, managed Kubernetes services such as GKE, EKS, and AKS do not retain Kubernetes events for long. Engineering teams will likely find it useful to store this information for longer periods and, in the process, create a historical record of the changes that have happened inside the cluster.

At ContainIQ, we collect and store historical Kubernetes events for you. Users can create alerts on specific events and then feed those notifications to selected Slack channels.

Logging and Storing Kubernetes Events

As we mentioned above, Kubernetes events are typically stored in the cluster for only about an hour by default. To create a historical record of events, you must capture them from the Kubernetes API and then export them into a database. There are a number of open-source exporters available, including one from OpsGenie, or you could use ContainIQ. At ContainIQ, we collect and store Kubernetes events for you and offer visualization and debugging tools on top.
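For teams that take the do-it-yourself route, the capture-and-store step can be sketched in a few lines. This is a minimal illustration, not how ContainIQ or any particular exporter works: it assumes the events arrive as the JSON list printed by `kubectl get events --all-namespaces -o json`, and the table layout and the helper names `open_store` and `save_events` are hypothetical.

```python
import json
import sqlite3

# Hypothetical table layout; the columns mirror fields of a Kubernetes Event object.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    uid TEXT PRIMARY KEY,
    namespace TEXT,
    type TEXT,
    reason TEXT,
    object_kind TEXT,
    object_name TEXT,
    message TEXT,
    count INTEGER,
    last_seen TEXT
)
"""


def open_store(path=":memory:"):
    """Open (or create) the SQLite database that holds the event history."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_events(conn, event_list_json):
    """Upsert every event in a JSON event list and return how many rows were written."""
    items = json.loads(event_list_json).get("items", [])
    rows = []
    for ev in items:
        meta = ev.get("metadata", {})
        obj = ev.get("involvedObject", {})
        if not meta.get("uid"):
            continue  # skip anything that cannot be de-duplicated
        rows.append((
            meta["uid"],
            meta.get("namespace"),
            ev.get("type"),
            ev.get("reason"),
            obj.get("kind"),
            obj.get("name"),
            ev.get("message"),
            ev.get("count") or 1,
            ev.get("lastTimestamp") or meta.get("creationTimestamp"),
        ))
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
        )
    return len(rows)
```

Keying on the event UID means a repeated event updates its existing row instead of creating a duplicate, so the function can be run on a schedule that is shorter than the cluster's event retention window.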

Kubernetes events dashboard

Above is an image of our Kubernetes events dashboard where users can search for specific events, view all events over time, filter based on a date range or frequency, and toggle between normal or warning events.

A user, for example, can search for only warning events from the prior week and view that data on an hourly basis. As another example, a user could drill down on a specific node over the last month and see all of that data on a daily basis.
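The same kind of view can be approximated over a self-managed event store. The sketch below is not the ContainIQ dashboard's implementation; it assumes each event is a dictionary with the `type` and `lastTimestamp` fields of a Kubernetes Event, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    """Parse a UTC timestamp in the form 2021-10-11T14:05:09Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def bucket_counts(events, now, event_type="Warning",
                  window=timedelta(days=7), bucket="hour"):
    """Count events of one type inside the time window, grouped by hour or by day."""
    fmt = "%Y-%m-%d %H:00" if bucket == "hour" else "%Y-%m-%d"
    cutoff = now - window
    counts = Counter()
    for ev in events:
        seen = parse_ts(ev["lastTimestamp"])
        if ev["type"] == event_type and cutoff <= seen <= now:
            counts[seen.strftime(fmt)] += 1
    # Sort by bucket label so the result reads chronologically.
    return dict(sorted(counts.items()))
```

Calling `bucket_counts(events, now)` gives the hourly warning view for the prior week, while `bucket_counts(events, now, window=timedelta(days=30), bucket="day")` gives a daily view over the last month.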

Creating Alerts on Kubernetes Events

After you’ve collected and stored the Kubernetes events from the API, the next step is to set alerts on these events. There are certain events that nearly every engineering team should be alerting on, such as CrashLoopBackOffs and pod evictions, which are leading indicators of trouble. But it is also common to alert on things like Kubernetes jobs failing, pods restarting, ImagePullBackOffs, FailedAttachVolume, FailedMount, FailedScheduling, NodeNotReady, HostPortConflict, and more.
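As a rough illustration of that filtering step, the sketch below keeps only events whose reason is on an alert list. The list uses the reason names from the paragraph above; `BackOff` and `Evicted` are assumptions about the reason strings that crash loops and pod evictions show up under, so check them against the events in your own cluster. The function names are hypothetical.

```python
from collections import Counter

# Reason names taken from the article. "BackOff" and "Evicted" are assumed to be
# the reason strings behind CrashLoopBackOff and pod evictions; verify locally.
ALERT_REASONS = frozenset({
    "BackOff",
    "Evicted",
    "FailedAttachVolume",
    "FailedMount",
    "FailedScheduling",
    "NodeNotReady",
    "HostPortConflict",
})


def events_to_alert_on(events, reasons=ALERT_REASONS):
    """Yield only the events whose reason is in the alert set."""
    for ev in events:
        if ev.get("reason") in reasons:
            yield ev


def count_by_reason(events, reasons=ALERT_REASONS):
    """Summarize alert-worthy events as a {reason: occurrences} mapping."""
    return dict(Counter(ev["reason"] for ev in events_to_alert_on(events, reasons)))
```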

If you are using ContainIQ, it is very easy to create alerts on specific Kubernetes events. By clicking on New Monitor, users will be prompted to create an alert on an event. The first step is to set the name for your monitor, or alert:

Creating a Kubernetes event

From there, users can choose the Event Reason and the Event Object Name:

Inputting an event reason

By clicking the Create Event Monitor button, the event alert will be created and notifications will start feeding to the connected Slack channel (instructions below). Once created, alerts on events can be toggled on or off or can be deleted from the Monitors tab on the ContainIQ dashboard.
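Outside of a managed tool, a monitor with the same three inputs (a name, an event reason, and an object name) plus an on/off toggle can be modeled as a small data class. This is a sketch under those assumptions, not ContainIQ's implementation; `EventMonitor` and `triggered` are hypothetical names, and events are assumed to be dictionaries shaped like Kubernetes Event objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventMonitor:
    name: str
    reason: str
    object_name: Optional[str] = None  # None matches any object
    enabled: bool = True               # toggled off monitors never fire

    def matches(self, event: dict) -> bool:
        """Return True if this monitor is enabled and the event fits its rule."""
        if not self.enabled:
            return False
        if event.get("reason") != self.reason:
            return False
        if self.object_name is None:
            return True
        return event.get("involvedObject", {}).get("name") == self.object_name


def triggered(monitors, event):
    """Return the names of all enabled monitors that match the event."""
    return [m.name for m in monitors if m.matches(event)]
```

Keeping `enabled` as a field, rather than removing the monitor from the list, lets a monitor be switched back on later without recreating it.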

Feeding Kubernetes Events to Slack

There are a handful of open-source tools you can use to feed Kubernetes event alerts to a Slack channel. Alternatively, ContainIQ is a managed tool that can be used to feed Kubernetes events to a desired Slack channel.

Linking a Slack channel should only take a few minutes, and users can change or update their Slack settings under My Account, Integrations.

Slack integration with Kubernetes

By clicking Activate, users can choose the Slack channel where they want to receive the alerts.
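For teams wiring this up themselves, one common approach is a Slack incoming webhook, which accepts a JSON body with a `text` field. The sketch below assumes you have already created a webhook for the target channel; the message layout and the function names `slack_payload` and `post_to_slack` are hypothetical, and the event is assumed to be a dictionary shaped like a Kubernetes Event object.

```python
import json
import urllib.request


def slack_payload(event):
    """Build the JSON body for a Slack incoming webhook from one event."""
    obj = event.get("involvedObject", {})
    text = "[{type}] {reason}: {kind}/{name} in {ns}\n{msg}".format(
        type=event.get("type", "Unknown"),
        reason=event.get("reason", "Unknown"),
        kind=obj.get("kind", "Unknown"),
        name=obj.get("name", "unknown"),
        ns=event.get("metadata", {}).get("namespace", "unknown"),
        msg=event.get("message", ""),
    )
    return {"text": text}


def post_to_slack(webhook_url, event, timeout=10):
    """POST the event to the webhook URL and return the HTTP status code."""
    data = json.dumps(slack_payload(event)).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=data,  # supplying a body makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Separating the payload builder from the network call keeps the message format easy to test without sending anything to Slack.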

Conclusion

Alerting on Kubernetes events can be helpful for most engineering teams. Engineers can save time today and over the longer term by creating a historical log of Kubernetes events alongside smart alerting triggers. There are a number of open-source tools that you can patch together to do both. But you could also consider using ContainIQ to collect and store your events and to create alerts on these events for you.

ContainIQ also provides monitoring for popular metrics such as CPU and memory usage for pods and nodes, latency, and other helpful tools for rightsizing your Kubernetes cluster.

Nate Matherson is the Co-founder & CEO of ContainIQ.