Prometheus, an open-source solution for metrics and alerting, was originally developed by SoundCloud in 2012. Today, the number of companies using Prometheus has grown dramatically. According to GitHub, Prometheus is used by more than 1,700 companies, and more than 700 individuals have contributed to the project.
Prometheus is a popular first step for companies looking to collect metrics and configure alerts, and it is commonly used by companies who are running workloads on Kubernetes. Prometheus is often self-hosted but can also be used as a managed offering through most of the major cloud providers, including Amazon Web Services (AWS) and Google Cloud Platform (GCP).
Prometheus has many advantages, such as its multidimensional data model, its powerful querying capabilities with PromQL, and the highly customizable alerts that teams can configure with Alertmanager. Prometheus also has some visualization capabilities, and it’s often paired with Grafana for extended visualization.
However, Prometheus also has its limitations, and companies often look for alternatives as they start to scale or as they look to reduce the engineering time required to maintain their monitoring toolset.
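To make the dimensional model more concrete, here is a minimal Python sketch of label-based selection. In Prometheus, a time series is identified by a metric name plus a set of key-value labels, and a PromQL selector such as `http_requests_total{job="api", status="500"}` picks out the series whose labels match. The metric name, label values, and the `select` helper below are illustrative assumptions and are not part of any Prometheus client library.

```python
from typing import Dict, List, Tuple

# Each series is identified by a metric name plus a set of labels.
# All names and values here are made up for illustration.
Series = Tuple[str, Dict[str, str], float]

SERIES: List[Series] = [
    ("http_requests_total", {"job": "api", "status": "200"}, 1027.0),
    ("http_requests_total", {"job": "api", "status": "500"}, 3.0),
    ("http_requests_total", {"job": "web", "status": "200"}, 412.0),
]


def select(series: List[Series], name: str, **matchers: str) -> List[Series]:
    """Return the series whose name and labels equal every given matcher.

    This mimics only the equality matcher of a PromQL selector.
    """
    return [
        s for s in series
        if s[0] == name and all(s[1].get(k) == v for k, v in matchers.items())
    ]


# Narrow by two dimensions, then aggregate across one of them.
api_errors = select(SERIES, "http_requests_total", job="api", status="500")
total_api = sum(value for _, _, value in select(SERIES, "http_requests_total", job="api"))
```

Adding a label matcher narrows the selection, and dropping one widens it, which is what lets the same metric be sliced by job, status, or any other dimension.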
In this article, we’re exploring six popular alternatives to Prometheus: three open-source options and three commercial solutions.
- InfluxDB with Kapacitor
- Nagios Core
- Sensu
- ContainIQ
- Datadog
- Levitate by Last9
3 Open-Source Alternatives
Open-source tools have many advantages. And for engineering teams looking to stick with an open-source solution, these three open-source projects are worth considering.
InfluxDB with Kapacitor
Launched just a year after Prometheus, InfluxDB is a popular open-source time-series database. InfluxDB and Prometheus are often compared to each other due to a shared set of goals, but the two technologies differ in a number of ways and serve different use cases.
On GitHub, InfluxDB has garnered an incredible amount of support. As of this writing, there have been more than 400 contributors to the project and more than 35,000 commits.
InfluxDB was launched to be a high-performance time-series database. And to most closely compare InfluxDB to Prometheus, you should consider using Kapacitor with it. Kapacitor, another open-source project, enables teams to monitor and alert on the data collected in a time-series database. Using Kapacitor provides functionality similar to that of Prometheus’s Alertmanager.
Like Prometheus, InfluxDB has a large number of integrations available. You can also integrate Prometheus and InfluxDB!
For engineering teams who need event logging, InfluxDB is likely a better solution. InfluxDB uses a variant of a log-structured merge tree for storage with a write-ahead log, sharded by time, which makes it a better fit for event logging than Prometheus, with its append-only file-per-time-series approach.
Commercial offerings are available for both InfluxDB and Kapacitor. They might make sense for teams that need redundancy and long-term storage, and for those who need to scale horizontally.
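The storage contrast above can be illustrated with a toy Python sketch. It only models where a write would land under each layout: one bucket per time window (time sharding) versus one bucket per series. The one-hour shard width, the event names, and both helper functions are assumptions for illustration; neither database's real storage engine is this simple.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SHARD_SECONDS = 3600  # assumed shard width, for illustration only

Point = Tuple[str, int, float]  # (series id, unix timestamp, value)


def place_by_time(points: List[Point]) -> Dict[int, List[Point]]:
    """Group writes into shards keyed by time window."""
    shards: Dict[int, List[Point]] = defaultdict(list)
    for series_id, ts, value in points:
        shards[ts // SHARD_SECONDS].append((series_id, ts, value))
    return dict(shards)


def place_by_series(points: List[Point]) -> Dict[str, List[Point]]:
    """Group writes into one append-only bucket per series."""
    files: Dict[str, List[Point]] = defaultdict(list)
    for series_id, ts, value in points:
        files[series_id].append((series_id, ts, value))
    return dict(files)


# Event-style data: many distinct series, each with a single point.
events = [(f"event_{i}", 1_700_000_000 + i, 1.0) for i in range(1000)]

time_shards = place_by_time(events)
series_files = place_by_series(events)
```

With these event-like writes, the time-sharded layout touches a single bucket while the per-series layout creates a thousand, which is the intuition behind the comparison.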
Nagios Core
Nagios, originally launched in the 1990s, is a well-known platform and provider of tools for monitoring. While there is some overlap between Prometheus and Nagios, the two technologies are quite different in their approach and capabilities.
Nagios Core has a relatively old community, and contributions to the open-source project have slowed — whereas Prometheus has a very active and growing community behind it. Prometheus is largely a more robust and current solution for this generation of technology companies. Prometheus is clearly the superior solution in that it offers more integrations and better alerting capabilities, and it is easier to use.
However, there are some areas where Nagios might be a good alternative, especially for companies struggling to scale Prometheus to large systems.
Nagios is more focused on application network traffic and security, whereas Prometheus is more focused on the applications and the infrastructure itself. Perhaps the biggest benefit that Nagios offers is its ability to scale out of the box. However, if you’re using Prometheus alongside a long-term data store, like Thanos, the benefits will seem relatively small.
Nagios offers some basic visualizations out of the box. But the offering is lackluster when compared with using Prometheus alongside a dedicated visualization tool like Grafana, or with ContainIQ’s custom metrics offering.
Sensu
Sensu was originally launched as an open-source set of monitoring tools in 2011, and the company behind it was founded in 2017. The company went on to launch a commercial offering, too, and was acquired by Sumo Logic in 2021. Today, Sensu offers both open-source and commercial tools for monitoring. Sensu’s open-source offering is often compared to Prometheus, and it has a number of advantages, as well as a few important differences, including:
- Sensu, like Prometheus, allows for time-series metrics; however, Sensu allows for status, tracing, and any other JSON-encapsulated data, too.
- Sensu uses an extensible data model which allows for more than just metrics-oriented monitoring.
- Sensu’s structured data approach supports high-cardinality observations, including key-value metadata and rich service health status information, as well as raw metrics.
- Sensu is flexible, in that it supports many sources of data including Prometheus, StatsD, and Nagios.
- For larger companies with complex runbooks and remediation processes, Sensu offers a number of important integrations like PagerDuty, ServiceNow, and Jira.
- Sensu offers tools for automated remediation and allows engineering teams to get more context when troubleshooting, by enriching Prometheus endpoint data alongside other metadata.
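As a rough illustration of what "JSON-encapsulated data" that mixes status and metrics can look like, here is a small Python sketch. The payload shape (an entity, a check with a status code, and a list of metric points) and every field name and value are assumptions made for this example; consult Sensu's documentation for the actual event format.

```python
import json

# Hypothetical event payload: field names and values are illustrative assumptions.
event = {
    "entity": {"name": "web-01", "labels": {"region": "us-east-1"}},
    "check": {"name": "http-health", "status": 2, "output": "HTTP 503 from /healthz"},
    "metrics": {
        "points": [
            {
                "name": "http_response_seconds",
                "value": 1.42,
                "timestamp": 1700000000,
                "tags": [{"name": "path", "value": "/healthz"}],
            }
        ]
    },
}


def summarize(evt: dict) -> str:
    """Combine status and metric data from one event into a single line."""
    check = evt["check"]
    state = "OK" if check["status"] == 0 else "NOT OK"
    points = evt.get("metrics", {}).get("points", [])
    return f'{evt["entity"]["name"]} {check["name"]}: {state}, {len(points)} metric point(s)'


payload = json.dumps(event)  # serialized form, as it might travel between components
summary = summarize(json.loads(payload))
```

The point of the sketch is that one structured record can carry service health, free-form metadata, and raw metric points together, rather than metrics alone.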
The Sensu agent can be deployed as a sidecar or as a DaemonSet. And users are able to use a number of different datastores including InfluxDB, Elastic, and more.
To get started with Sensu, check out the GitHub repository.
3 Commercial Alternatives
While using an open-source toolset has many benefits, there are often efficiencies for teams using a paid or commercial offering. Here are three paid solutions that might be good alternatives for metrics and more.
ContainIQ
Launched in 2021, ContainIQ is often compared to Prometheus as an out-of-the-box solution for Kubernetes metrics and more. ContainIQ provides a number of features including metrics, events, logs, and traces. And because ContainIQ is a Kubernetes-native solution, users are able to get started very quickly with little additional configuration required.
Like Prometheus, ContainIQ captures and stores infrastructure metrics for the pods and nodes running inside the cluster. Users are able to quickly view metrics like CPU and memory usage for given pods or nodes and are able to filter by cluster, namespace, and more. Users are able to see average usage and usage over time, and can toggle on limits to see how close historical usage comes to the limit thresholds.
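The comparison of historical usage against a limit boils down to simple arithmetic, sketched below in Python. The sample values, the millicore unit, and the `headroom_report` helper are assumptions for illustration and do not reflect how ContainIQ computes or displays these figures.

```python
from statistics import mean
from typing import List


def headroom_report(usage_millicores: List[float], limit_millicores: float) -> dict:
    """Summarize how close historical usage came to a configured limit."""
    avg = mean(usage_millicores)
    peak = max(usage_millicores)
    return {
        "average": avg,
        "peak": peak,
        "peak_pct_of_limit": round(100 * peak / limit_millicores, 1),
        "headroom": limit_millicores - peak,
    }


# Hypothetical CPU samples for one pod, with a 500m CPU limit.
samples = [120.0, 180.0, 450.0, 210.0]
report = headroom_report(samples, limit_millicores=500.0)
```

Looking at the peak alongside the average matters because a pod with a modest average can still come close to its limit during short bursts.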
ContainIQ also makes it easy to view Kubernetes events for given pods and nodes. And using the Events dashboard, users can correlate events to logs at given points in time. With the Health dashboard, users can view all of their deployments, StatefulSets, and DaemonSets inside a cluster. And they are able to quickly see the health of each, the Git repo and commit currently deployed, and the number of replicas available vs. requested. Users can also see the CPU and memory for each service over time.
ContainIQ also offers custom metrics, and users are able to integrate Prometheus with ContainIQ to create custom dashboards. Custom metrics are currently in alpha and will be released to the general public in the summer of 2022.
Outside of metrics, ContainIQ offers a number of other helpful tools, including logging, tracing, and a dashboard for viewing latency over time.
Like Prometheus, ContainIQ has a built-in engine for alerting. Users are able to set alerts on metrics, events, log messages, and more. Users are able to send the alerts to Slack or to the destination of their choosing using a webhook.
And because ContainIQ is a commercial offering, users do not need to worry about monitoring their toolset. ContainIQ offers a self-service SaaS solution, and users can sign up directly on the website. ContainIQ offers a Power plan billed at $20 per node/host per month and $0.50 per GB of log ingest.
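For budgeting purposes, the two rates quoted above can be combined into a quick estimate. The Python sketch below uses only those rates; the cluster size and log volume are made-up inputs, and real invoices may include terms not described in this article.

```python
NODE_RATE_USD = 20.0     # per node/host per month, as quoted above
LOG_RATE_USD_PER_GB = 0.50  # per GB of log ingest, as quoted above


def monthly_cost(nodes: int, log_gb: float) -> float:
    """Estimate a monthly bill from node count and log ingest volume."""
    return nodes * NODE_RATE_USD + log_gb * LOG_RATE_USD_PER_GB


# Hypothetical cluster: 10 nodes ingesting 200 GB of logs per month.
estimate = monthly_cost(nodes=10, log_gb=200)
```

In this hypothetical case the node charge is 200 dollars and the log charge is 100 dollars, so log volume is worth watching as closely as node count.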
Datadog
Introduced in 2010, Datadog is a popular and widely used tool. Datadog, like Prometheus, allows users to capture critical information including metrics. Datadog is a managed SaaS offering that collects core metrics for virtually any system or service.
With Datadog, users are able to monitor and visualize usage metrics across their entire environment. Datadog allows companies using Kubernetes to visualize their Kubernetes clusters with pod and node-level views. You can quickly toggle between pods and nodes to see usage at points in time, and view averages over longer periods of time.
Custom metrics are available, and 100 are included per host per month.
And like Prometheus Alertmanager, Datadog allows users to create alerts based on changes or disruptions to particular services. Datadog also has a number of intelligent alerting capabilities. Using machine learning, Datadog is able to automatically create alerts for customers using the Enterprise plan.
Datadog also gives users the ability to detect anomalies automatically and forecast metrics based on past performance. Datadog’s outlier detection is helpful when troubleshooting issues on the fly. And when using the correlations feature, users are able to correlate metrics to other helpful pieces of information.
In addition to metrics, Datadog offers other helpful tools like logging, security, and APM.
Datadog has a free tier that allows users to collect metrics for as many as five hosts, and to view the data for up to 24 hours. Datadog also offers a number of paid plans and features. For metrics, Datadog offers Pro and Enterprise plans:
- Pro: $15 per host per month when billed annually, or $18 on-demand, 15-month metric retention
- Enterprise: $23 per host per month when billed annually, or $27 on-demand, 15-month metric retention as a base, with customizable retention offered
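To compare the plans for a given fleet size, the per-host prices listed above can be put into a small Python sketch. The 20-host fleet is a made-up input, and the calculation covers only the base per-host price, not custom metrics or other add-ons.

```python
# Per-host monthly prices in USD, taken from the list above.
PRICES = {
    "pro": {"annual": 15, "on_demand": 18},
    "enterprise": {"annual": 23, "on_demand": 27},
}


def monthly_cost(plan: str, billing: str, hosts: int) -> int:
    """Return the base monthly cost for a plan, billing mode, and host count."""
    return PRICES[plan][billing] * hosts


# Hypothetical fleet of 20 hosts.
hosts = 20
pro_annual = monthly_cost("pro", "annual", hosts)
pro_on_demand = monthly_cost("pro", "on_demand", hosts)
enterprise_annual = monthly_cost("enterprise", "annual", hosts)
on_demand_premium = pro_on_demand - pro_annual
```

For this fleet, choosing on-demand billing over annual billing on the Pro plan adds 60 dollars per month.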
To get started with Datadog, users are able to create an account on their website. You can also contact Datadog if you would like to see a product demo.
Levitate by Last9
Last9’s Levitate platform is a commercial time-series database that was built to offer performance at scale. The company, which was founded in 2020, offers a unique feature set for teams dealing with the challenge of using Prometheus with very large systems.
As you may know, scaling Prometheus with large workloads introduces a number of challenges including slow queries, high costs, and an increasingly large amount of time that needs to be dedicated to monitoring Prometheus itself.
With Last9, users are able to tackle high cardinality, speed up query times, and reduce the costs of storing large volumes of metrics. Last9 allows users to easily create retention policies that help to de-prioritize unhelpful and outdated metrics that would otherwise lead to slowdowns and ever-increasing storage costs.
The platform is PromQL-compatible, which is good for teams who have experience with the popular query language. Users are able to configure alerts across the platform and take advantage of a number of unique alerting capabilities. For example, with Last9, users are able to see the changes that happened within a system leading up to a particular issue or alert, which makes debugging more efficient. The anomaly detection and forecasting tools can help engineering teams get out in front of potential issues before they cascade, and can help to reduce alert fatigue.
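To see why high cardinality becomes a problem, it helps to estimate how many series a single metric can produce. In a label-based model, the upper bound is the product of the number of distinct values of each label. The Python sketch below shows that arithmetic; the label names and counts are made up, and this is a general property of labeled metrics rather than a description of how Last9 manages cardinality.

```python
from math import prod
from typing import Dict


def max_series(label_value_counts: Dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_value_counts.values())


# Hypothetical label set for a single request-latency metric.
labels = {"service": 50, "endpoint": 40, "status": 5, "pod": 200}

with_pod = max_series(labels)
without_pod = max_series({k: v for k, v in labels.items() if k != "pod"})
```

Here the bound is 2,000,000 series with the `pod` label and 10,000 without it, which shows how a single high-churn label can dominate storage and query cost.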
To get started with Last9, users must first book a demo via the company’s website. The company offers a free trial, but how long the trial will last is not specified. The company does not provide pricing on its website.
Prometheus is a widely adopted and mature solution for metrics. However, in some cases, using Prometheus might not make sense for your company.
There are a number of open-source alternatives like InfluxDB with Kapacitor, Nagios, and Sensu. These open-source alternatives are similar in terms of functionality but may offer additional benefits for companies with large-scale systems, or for those who need additional capabilities.
There are also a number of commercial offerings available. In this article, we highlighted ContainIQ, a Kubernetes-native solution for metrics, as well as logging and tracing. We also highlighted Datadog, a popular choice for larger companies, and Last9, a newer offering for the largest companies that are having trouble scaling Prometheus.
Finally, it’s important to remember that Prometheus can be used alongside and integrated with many of these open-source and commercial offerings.