
Measuring and Alerting on Latency by Microservice and URL Path

September 27, 2021

Latency is a mission-critical metric for most organizations. Tracking latencies by microservice and endpoint can improve end-user experience and business performance.

Nate Matherson
Co-founder & CEO

Latency is a critical metric to monitor for many reasons. Most importantly, a spike in latency for a given microservice or URL path means your end users are experiencing a disruption or slowdown in service. Depending on your application or industry, even a brief spike in latency may impact client expectations and revenue for your business.

Tracking changes or spikes in latency across many microservices can be relatively difficult. Installing application packages or middleware, or even editing Dockerfiles directly, can take quite some time. And as microservices are added or removed, engineering teams have to spend time re-instrumenting.

If you are using Kubernetes and ContainIQ, however, it is quite a bit easier. In this article, we will explore how to measure and alert on HTTP latencies for all of your microservices and URL paths.

Using a DaemonSet and eBPF to Instrument Across Every Node

At ContainIQ, we provide our users with latency metrics across all of their microservices and URL paths, as long as the node's kernel version is supported. We are able to deliver this experience out of the box by instrumenting directly from the kernel using eBPF.

ContainIQ runs as a DaemonSet and automatically installs the necessary Linux headers on every node in the Kubernetes cluster, including any nodes created later. From there, the DaemonSet builds and installs the packages needed to run the eBPF-based program. As services are added or removed, ContainIQ instruments them automatically.

By using eBPF, we are able to time requests directly at the socket and thus provide average, p95, and p99 latencies, as well as requests per second.
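ContainIQ's eBPF program itself isn't shown here, but the general kernel-side timing technique can be illustrated with the BCC Python bindings. The sketch below is not ContainIQ's actual code; it assumes bcc and matching kernel headers are installed, picks the kernel's tcp_sendmsg function purely as an example, and prints a latency histogram for calls to it:

```python
#!/usr/bin/env python3
# Minimal illustration (not ContainIQ's program) of timing a kernel socket call
# with eBPF via the BCC Python bindings. Requires root, bcc, and kernel headers.
from time import sleep
from bcc import BPF

bpf_text = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u64, u64);      // pid_tgid -> entry timestamp (ns)
BPF_HISTOGRAM(latency_us);      // log2 histogram of call latency (us)

int trace_entry(struct pt_regs *ctx) {
    u64 id = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&id, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u64 id = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&id);
    if (tsp == 0)
        return 0;               // missed the entry probe
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    latency_us.increment(bpf_log2l(delta_us));
    start.delete(&id);
    return 0;
}
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="tcp_sendmsg", fn_name="trace_entry")
b.attach_kretprobe(event="tcp_sendmsg", fn_name="trace_return")

print("Timing tcp_sendmsg calls for 10 seconds...")
sleep(10)
b["latency_us"].print_log2_hist("usecs")
```

Because the probes run in the kernel, this style of measurement requires no changes to application code or container images, which is what makes the DaemonSet approach practical across a whole cluster.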

Latency By Microservices

After installing ContainIQ with ‘kubectl apply’ or Helm, the latency dashboard begins updating automatically. Users are presented with a dashboard that visualizes latency across all microservices and can filter by date range, frequency, and a particular service.

Latency by microservice, shown over time and in a table

The table below the graph populates with all microservices automatically, and users can use the search bar to filter it or find a given microservice. In the table, users can view requests per second and average, p95, and p99 latency for each microservice.
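To make these numbers concrete, here is a small sketch of how requests per second and average, p95, and p99 latency can be derived from raw per-request timings collected over a time window. It is illustrative only, with made-up sample data, and is not ContainIQ's internal code:

```python
from math import ceil
from statistics import mean

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]

def summarize(latencies_ms, window_seconds):
    """Summary statistics for one microservice over an observation window."""
    return {
        "requests_per_second": len(latencies_ms) / window_seconds,
        "avg_ms": round(mean(latencies_ms), 1),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
    }

# Hypothetical latencies (ms) observed for one microservice over 60 seconds.
latencies = [12, 15, 11, 14, 220, 13, 16, 12, 18, 480]
print(summarize(latencies, window_seconds=60))
# -> roughly 0.17 requests per second, avg 81.1 ms, p95 480 ms, p99 480 ms
```

Note how the average hides the two slow requests while p95 and p99 surface them, which is why the dashboard reports all three.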

Users can also set alerts on spikes or meaningful changes in latency for a particular microservice. We will explore this below.

Latency By URL Path or API Endpoint

Measuring and tracking latency by individual URL path or API endpoint can be particularly important. A slowdown on a valuable path, the checkout experience of an e-commerce site for example, may directly impact both revenue and end-user experience.

By clicking on a given microservice in the table, users will be shown latency for each individual URL path or API endpoint.

Measuring and tracking latency by URL path

In the table above, users can see each endpoint with its calculated requests per second, average latency, p95 latency, and p99 latency. Users can also use the toggle in the top right-hand corner to switch the data between the past hour, past day, and past week.

Users can paginate through the endpoints manually, but it is also possible to find a particular endpoint quickly using the search bar.
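As a rough illustration of this per-path breakdown (again with hypothetical paths and data, not ContainIQ's implementation), grouping raw request records by URL path and summarizing each group looks something like this:

```python
from collections import defaultdict
from statistics import mean, quantiles

# Hypothetical raw request records for one microservice: (URL path, latency in ms).
requests = [
    ("/api/checkout", 220), ("/api/checkout", 180), ("/api/checkout", 650),
    ("/api/products", 35), ("/api/products", 42), ("/api/products", 38),
    ("/api/cart", 60), ("/api/cart", 75), ("/api/cart", 58),
]

WINDOW_SECONDS = 60  # length of the observation window

by_path = defaultdict(list)
for path, latency_ms in requests:
    by_path[path].append(latency_ms)

for path, latencies in sorted(by_path.items()):
    p95 = quantiles(latencies, n=100, method="inclusive")[94]  # 95th percentile
    print(
        f"{path:15s} rps={len(latencies) / WINDOW_SECONDS:.2f} "
        f"avg={mean(latencies):.0f}ms p95={p95:.0f}ms"
    )
```

The same idea, applied continuously to the kernel-level timings, is what produces the per-endpoint rows in the dashboard.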

Latency by URL path is supported on a limited set of kernel versions, including those available on GKE. Depending on the kernel version, this feature may not be available on every cloud provider.

Alerting on Changes in HTTP Latency

Creating alerts on spikes or changes in latency takes only a few minutes. By clicking “New Monitor” at the top of ContainIQ, users can create latency alerts for each individual microservice.

Creating alerts on latency changes or spikes

After an alert has been created, notifications are sent to the Slack channel linked with ContainIQ. Linking a Slack channel takes only a few minutes, and users can change or update their Slack settings from My Account, Integrations.
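The alert logic itself lives in ContainIQ, but conceptually a latency monitor boils down to a check like the following sketch, where the webhook URL, threshold, service name, and p95 reading are all placeholders:

```python
import json
import urllib.request

# Placeholder values for illustration only.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"
P95_THRESHOLD_MS = 500

def check_latency_and_alert(service, current_p95_ms):
    """Post a Slack notification if a service's p95 latency crosses the threshold."""
    if current_p95_ms <= P95_THRESHOLD_MS:
        return False  # latency is within bounds; nothing to do

    message = {
        "text": (
            f":warning: p95 latency for `{service}` is {current_p95_ms} ms "
            f"(threshold: {P95_THRESHOLD_MS} ms)"
        )
    }
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status == 200

if __name__ == "__main__":
    # Hypothetical p95 reading for a checkout service; replace the webhook URL
    # above with a real Slack incoming webhook before running.
    check_latency_and_alert("checkout-service", current_p95_ms=742)
```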

Users can toggle alerts on and off, or delete them altogether from the Monitors tab.

Conclusion

Latency is a mission-critical metric for many organizations. Luckily, if you are using Kubernetes alongside ContainIQ, measuring and alerting on latencies is a relatively painless process.

ContainIQ also provides monitoring for popular metrics such as CPU and memory usage for pods and nodes, Kubernetes event tracking, and other helpful tools for rightsizing your Kubernetes cluster.
