
Kubernetes Monitoring: Enhancing Cluster Efficiency

August 2, 2021

In this post, we explore Kubernetes monitoring, including its best practices, metrics, and tools that allow administrators to monitor component health within a cluster.

Sudip Sengupta
Solutions Architect

The Importance of Monitoring Kubernetes Clusters

Kubernetes offers a seamless, distributed environment of services to manage application workloads hosted across cluster nodes. While doing so, Kubernetes orchestrates the encapsulation of these applications and services into deployable Pods (a set of one or more containers). These Pods are controlled and managed by the controllers running in the kube-controller-manager, which help keep cluster components in their desired state.

Since a Kubernetes ecosystem comprises multiple machines (nodes) working together within the same subsystem (cluster), keeping them observable remains a key aspect of maintaining operational efficiency. Through this post, we delve into Kubernetes monitoring, including its best practices, metrics, and tools that allow administrators to monitor component health within a cluster.

The Challenge of Monitoring in Distributed, Containerized Systems 

Containers package responsive, highly available applications as executable units with OS-level abstraction. By design, this produces a distributed framework of containerized applications and services that complicates monitoring. Kubernetes clusters also rely on many components and processes, which makes it equally challenging to keep track of application events. As a result, traditional machine-monitoring mechanisms are ineffective at checking the performance of a dynamic, service-oriented architecture whose workloads are transient.

The Kubernetes dashboard offers a platform where users can access information on resource utilization. This dashboard, however, does not include some of the refined mechanisms needed to monitor ephemeral application workloads. This is why various mechanisms have been developed to improve monitoring and event logging in Kubernetes clusters.

What Metrics Are Monitored in Kubernetes

Monitoring strategies closely track several metric types based on the components being observed. These are:

  • Pod/Container Metrics: These metrics help administrators check the resources allocated to every Pod in the cluster and see whether containers are over- or under-provisioned. Common container metrics include CPU usage, memory usage, disk usage, and node-network traffic bandwidth, while common Pod metrics include deployment progress, Pod health, network data, and the number of running instances.
  • Node Metrics: Every node running in a Kubernetes cluster has a definite storage, memory, and CPU capacity available to Pods and cluster resources. Node metrics capture node-level consumption of this capacity, along with storage and network bandwidth usage.
  • Cluster Metrics: These encompass data on the performance of a cluster’s control plane components. Performance data from components such as the API server, scheduler, controllers, and etcd server helps ensure cluster components run in the desired state.
  • Application Metrics: These are developed within an application’s code and written to match the requirements of the business case. A database application, for instance, can provide performance information on the attributes and tables it stores. Such metrics are usually exposed to Kubernetes monitoring tools through an Application Programming Interface (API); a brief example follows this list.
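
To make the last point concrete, here is a minimal sketch of application-level instrumentation using the Python prometheus_client library. The metric names, label, and port are illustrative assumptions, not part of any particular application.

```python
# A minimal sketch of exposing custom application metrics for scraping.
# Assumes the prometheus_client package is installed (pip install prometheus-client);
# the metric names, label, and port below are illustrative choices.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled", ["endpoint"])
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

def handle_request(endpoint: str) -> None:
    """Simulate serving a request while recording metrics."""
    with LATENCY.time():
        time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
    REQUESTS.labels(endpoint=endpoint).inc()

if __name__ == "__main__":
    start_http_server(8000)  # metrics become available at http://localhost:8000/metrics
    while True:
        handle_request("/orders")
```

A Prometheus scrape configuration (or a ServiceMonitor, if the Prometheus Operator is used) pointed at that port would then collect these series alongside the built-in cluster metrics.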

Kubernetes Monitoring Solutions  

Kubernetes offers two main approaches for collecting data on running clusters: DaemonSets and the Metrics Server.

  • A DaemonSet is a Kubernetes object that ensures all (or a selected set of) nodes in the cluster run a copy of a Pod. These Pods are used to monitor nodes and collect logs on Pod events, and the data they gather can then be passed to advanced monitoring solutions for further analysis.
  • The Metrics Server is a cluster add-on (running as a Pod) that collects resource consumption data from the kubelet on each node and exposes it to the API server through the Metrics API. This data can then be used to autoscale applications and adjust the resources allocated to containers, as illustrated in the sketch below.
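
As a rough illustration of the Metrics API, the sketch below reads the node and Pod usage figures published by the Metrics Server using the official Kubernetes Python client. It assumes the Metrics Server is installed and that a kubeconfig with read access is available.

```python
# A minimal sketch of reading Metrics Server data through the Metrics API.
# Assumes the official `kubernetes` Python client is installed, a kubeconfig
# with cluster access is available, and the Metrics Server is running.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a Pod
metrics_api = client.CustomObjectsApi()

# Node-level CPU and memory usage reported by the Metrics Server.
node_metrics = metrics_api.list_cluster_custom_object(
    group="metrics.k8s.io", version="v1beta1", plural="nodes"
)
for item in node_metrics["items"]:
    usage = item["usage"]
    print(f'{item["metadata"]["name"]}: cpu={usage["cpu"]} memory={usage["memory"]}')

# Pod-level usage across all namespaces.
pod_metrics = metrics_api.list_cluster_custom_object(
    group="metrics.k8s.io", version="v1beta1", plural="pods"
)
print(f'Pods reporting metrics: {len(pod_metrics["items"])}')
```

The same figures back kubectl top and the Horizontal Pod Autoscaler, so a script like this is mainly useful for quick sanity checks.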

Popular Tools for Monitoring Kubernetes Clusters

Prometheus and Grafana

Originally developed at SoundCloud, Prometheus is an open-source toolkit that supports the instrumentation, collection, and storage of Kubernetes metrics. Prometheus collects and stores cluster performance data, then exposes it through the Prometheus web user interface so administrators can access, present, and chart the information collected. As an extended monitoring layer, Grafana offers a visualization tool that queries the Prometheus server using the Prometheus Query Language (PromQL) for a more comprehensive analysis of performance data. For teams looking for more comprehensive, commercially supported solutions, Datadog and Sysdig are popular alternatives to Prometheus.
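
To give a sense of the queries Grafana sends to Prometheus, the following sketch issues a PromQL expression against the Prometheus HTTP API using the requests library. The server address is a placeholder, and the metric name assumes the standard cAdvisor/kubelet metrics are being scraped.

```python
# A minimal sketch of querying Prometheus with PromQL over its HTTP API.
# Assumes Prometheus is reachable at PROM_URL (e.g. via `kubectl port-forward`)
# and that cAdvisor metrics such as container_cpu_usage_seconds_total exist.
import requests

PROM_URL = "http://localhost:9090"  # illustrative address

# Per-Pod CPU usage rate over the last five minutes.
query = "sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)"
resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query}, timeout=10)
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    pod = result["metric"].get("pod", "<unknown>")
    cpu_cores = float(result["value"][1])
    print(f"{pod}: {cpu_cores:.3f} CPU cores")
```

Grafana runs the same kind of query on a schedule and renders the resulting time series as dashboard panels.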

cAdvisor

The Container Advisor (cAdvisor) is an open-source metrics collection agent built specifically for containers. It runs at the node level because it ships integrated into the kubelet binary. cAdvisor gathers data on CPU usage, memory usage, network status, and storage for every live container, giving administrators insight into machine-level performance metrics.
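
Because cAdvisor is embedded in the kubelet, its raw metrics can be retrieved per node through the API server's node proxy. The sketch below attempts this with the Kubernetes Python client; the node name is a placeholder, and the exact metrics returned vary with the kubelet version.

```python
# A minimal sketch of fetching the kubelet's embedded cAdvisor metrics for one node.
# Assumes the `kubernetes` Python client, a kubeconfig with permission to reach
# the node proxy subresource, and a placeholder node name.
from kubernetes import client, config

config.load_kube_config()
core_v1 = client.CoreV1Api()

NODE_NAME = "worker-node-1"  # placeholder; list real names with core_v1.list_node()

# The kubelet serves cAdvisor data in Prometheus text format at /metrics/cadvisor.
raw_metrics = core_v1.connect_get_node_proxy_with_path(
    name=NODE_NAME, path="metrics/cadvisor"
)

# Print a few container CPU samples as a sanity check.
for line in raw_metrics.splitlines():
    if line.startswith("container_cpu_usage_seconds_total"):
        print(line)
```

In most clusters these same series are scraped by Prometheus, so pulling them directly is mostly useful for debugging a single node.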

Kubernetes Dashboard

The Kubernetes Dashboard is a web-based user interface that helps administrators manage cluster resources and troubleshoot containerized applications. It displays an overview of the applications running in the cluster, and administrators can also access information on the state of cluster resources, including errors and other events that may affect application performance.
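
The errors and events the dashboard surfaces can also be retrieved programmatically. The following sketch lists recent warning events with the Kubernetes Python client, assuming a kubeconfig with read access to events.

```python
# A minimal sketch of listing cluster events similar to what the dashboard shows.
# Assumes the `kubernetes` Python client and a kubeconfig with read access to events.
from kubernetes import client, config

config.load_kube_config()
core_v1 = client.CoreV1Api()

# Warning events often point at scheduling failures, image pull errors, and crashes.
events = core_v1.list_event_for_all_namespaces(field_selector="type=Warning")
for event in events.items:
    obj = event.involved_object
    print(f"{event.last_timestamp} {obj.kind}/{obj.name}: {event.reason} - {event.message}")
```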

MetricFire

MetricFire offers a comprehensive monitoring platform built on open-source tooling to help administrators observe infrastructure, application, and system performance in real time. It currently provides managed Prometheus and Graphite services for metrics collection, with Grafana as the monitoring dashboard.

Best Practices for Monitoring and Observability in Kubernetes

To make the most of a Kubernetes monitoring solution, it is important to follow practices that provide valuable insights. Some of these include:

  1. Enforce deep visibility to make sure every cluster component is observable.
  2. Pick an appropriate instrumentation strategy using libraries and sidecar agents so that all critical metrics are collected without fail.
  3. Include event logs and historical performance data so that team members can easily trace the root causes of performance problems.
  4. Constantly monitor Kubernetes control plane components to validate the performance of cluster services.
  5. Choose a SaaS monitoring solution that can be deployed as code for easier maintenance and management. 

While Kubernetes makes it easier to run distributed, containerized workloads, it also introduces the complexity of managing distributed, interconnected compute elements. In such complex setups, monitoring all critical components ensures there are no single points of failure within a cluster.

Moreover, administering Kubernetes requires a focused approach, dedicated tools, and the right methodologies for proactive monitoring of cluster components. By enabling an efficient monitoring mechanism, organizations can increase cluster efficiency and reduce operational costs while keeping resources running at their optimum.

Article by

Sudip Sengupta

Solutions Architect

Sudip Sengupta is a TOGAF Certified Solutions Architect with more than 15 years of experience working for global majors such as CSC, Hewlett Packard Enterprise, and DXC Technology. Sudip now works as a full-time tech writer, focusing on Cloud, DevOps, SaaS, and Cybersecurity. When not writing or reading, he’s likely on the squash court or playing Chess.
