Latency is a critical metric to monitor for many reasons. Importantly, a spike in latency for a given microservice or URL path means that your end-user is experiencing a disruption or slowdown in service. Depending on the application or industry you are in, even a brief spike in latency may impact client expectations and revenues for your business.
Tracking changes or spikes in latency across many microservices can be relatively difficult. Installing application packages or middleware, or editing Dockerfiles directly, can take quite some time. And as new microservices are added or removed, engineering teams are required to spend time re-instrumenting.
If you are using Kubernetes and ContainIQ, however, it is quite a bit easier. In this article, we will explore how to measure and alert on HTTP latency for all of your microservices and URL paths.
Using a Daemonset and eBPF to Instrument Across Every Node
At ContainIQ, we provide our users latency across all of their microservices and URL paths, as long as the node's kernel version supports it. We are able to deliver this experience immediately by instrumenting from the kernel directly using eBPF.
ContainIQ runs as a DaemonSet and automatically installs the necessary Linux headers onto all nodes in the Kubernetes cluster, including any newly created nodes going forward. From there, the DaemonSet builds and installs the packages needed to run the eBPF-based program. As new services are added or removed, ContainIQ automatically instruments them.
By using eBPF, we are able to time the requests from the socket directly and thus provide average, p95, and p99 latencies, as well as requests per second.
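To make those statistics concrete, here is a minimal sketch of how average, p95, and p99 latency and requests per second might be derived from raw per-request timings. This is illustrative only, not ContainIQ's actual implementation; the sample data, window length, and nearest-rank percentile method are all assumptions.

```python
# Illustrative only: summary statistics over one observation window of
# per-request latency samples (hypothetical data, not ContainIQ's code).
def latency_stats(latencies_ms, window_seconds):
    """Compute average, p95, p99, and requests per second for a window."""
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank percentile: pick the sample at the p-th rank.
        index = max(0, int(round(p / 100 * len(ordered))) - 1)
        return ordered[index]

    return {
        "avg_ms": sum(ordered) / len(ordered),
        "p95_ms": percentile(95),
        "p99_ms": percentile(99),
        "rps": len(ordered) / window_seconds,
    }

# Hypothetical samples: 100 requests observed over a 10-second window.
samples = [10.0] * 90 + [100.0] * 9 + [500.0]
stats = latency_stats(samples, window_seconds=10)
```

In practice the timings would come from the eBPF program's socket-level measurements rather than an in-memory list, but the aggregation step looks much the same.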
Latency By Microservices
After installing ContainIQ with ‘kubectl apply’ or Helm, the latency dashboard will begin updating automatically. Users are presented with a dashboard that shows a visualization of latency across all microservices. Users are given the option to filter by date range, frequency, and the particular service.
The table below the graph will populate with all microservices automatically. Users can use the search bar to filter the table or search for a given microservice. In the table, users can view requests per second, average, p95, and p99 latency for each microservice.
Users also have the ability to set alerts on spikes or meaningful changes to latency for a particular microservice. We will explore this below.
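One common way to define a "spike" is to compare the current window's latency against a recent baseline. The sketch below shows that idea in its simplest form; the 1.5x multiplier and the sample values are arbitrary assumptions for illustration, not ContainIQ's alerting logic.

```python
# Illustrative spike check: flag an alert when the current p95 latency
# exceeds a recent baseline p95 by more than a chosen multiplier.
# The 1.5x default is an arbitrary assumption, not ContainIQ's logic.
def is_latency_spike(baseline_p95_ms, current_p95_ms, multiplier=1.5):
    return current_p95_ms > baseline_p95_ms * multiplier

# Hypothetical readings: one clear spike, one within normal range.
alert = is_latency_spike(baseline_p95_ms=120.0, current_p95_ms=300.0)
calm = is_latency_spike(baseline_p95_ms=120.0, current_p95_ms=130.0)
```

A real monitor would evaluate this continuously against rolling windows, but the comparison at its core is this simple.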
Latency By URL Paths or API Endpoint
Measuring and tracking latency by individual URL path or API endpoint can be particularly important. A slowdown on an especially valuable path, the checkout flow of an e-commerce site, for example, may directly impact both revenue and end-user experience.
By clicking on a given microservice in the table, users will be shown latency for each individual URL path or API endpoint.
In the table above, users can see a list of each endpoint with the calculated requests per second, average latency, p95 latency, and p99 latency. Users can also use the toggle in the top right-hand corner to switch data between the past hour, past day, and the past week.
Users can paginate through the endpoints manually, or quickly find a particular endpoint using the search bar.
Alerting on Changes in HTTP Latency
Creating alerts on spikes or changes in latency takes only a few minutes. By clicking “New Monitor” at the top of ContainIQ, users can create latency alerts for each individual microservice.
After an alert has been created, notifications will feed into the Slack channel linked with ContainIQ, or to another destination using a webhook. Linking a Slack channel should only take a few minutes, and users can change or update their Slack settings from My Account, Integrations.
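For a webhook destination, notifications are typically delivered as an HTTP POST with a JSON body. The sketch below builds a Slack-style incoming-webhook payload; the service name, latency values, and webhook URL are hypothetical, and the exact payload ContainIQ sends may differ.

```python
import json
from urllib import request

# Build a Slack-style incoming-webhook payload for a latency alert.
# The service name, values, and URL below are hypothetical examples.
payload = {
    "text": "Latency alert: p95 for service `checkout` rose to 450 ms "
            "(threshold: 200 ms) over the last 5 minutes."
}
body = json.dumps(payload).encode("utf-8")

# Delivery is shown but not executed here; substitute a real webhook URL.
# req = request.Request(
#     "https://hooks.slack.com/services/T000/B000/XXXX",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# request.urlopen(req)
```

The `{"text": ...}` shape matches Slack's standard incoming-webhook format, so the same payload works for any endpoint that accepts it.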
Users can toggle alerts on and off, or delete them altogether from the Monitors tab.
Latency is a mission-critical metric for many organizations. Luckily, if you are using Kubernetes alongside ContainIQ, measuring and alerting on latencies is a relatively painless process.
ContainIQ also provides Kubernetes native monitoring for popular metrics such as CPU and memory usage for pods and nodes, Kubernetes event tracking, and other helpful tools for rightsizing your Kubernetes cluster.