Many modern distributed applications are built on loosely coupled, microservice-based architectures, with components spread across multinode clusters. As a result, the failure of a single critical service can slow down or crash the entire system. Operations teams need monitoring solutions that watch over applications and infrastructure and alert users when something goes wrong.
Prometheus is a widely popular, open-source monitoring tool that does just that. Thanks to the large community helping to maintain it, this tool has matured over time and can capture real-time metrics from multiple source systems. Prometheus saves these metrics in a time series database and alerts users when metric values cross preconfigured thresholds. Even dynamic environments, like Kubernetes, can benefit from such features.
This article will go through the pros and cons of using a managed Prometheus service so you can decide whether one would be appropriate for your business.
Benefits of a Managed Prometheus Service
As organizations move toward rapid digital transformation, more vendors offer managed services to make that journey easier. In the managed services model, software as a service (SaaS) vendors provide an application that’s hosted on the vendor’s cloud platform along with the client data. Organizations pay the vendor a fixed or variable price each month; in return, the vendor ensures the application is always available, and the client doesn’t have to worry about purchasing hardware, maintaining infrastructure, or installing and patching software. Many applications are now offered as managed services, and most public cloud vendors provide them.
It takes a certain amount of investment, time, and effort to implement and maintain Prometheus for large IT footprints. There are the costs of hosting the infrastructure, maintaining each integration point, troubleshooting issues, and paying for engineering skills and time. Just the prospect of off-loading these responsibilities can make a managed Prometheus service seem worthwhile.
But there are other benefits as well. First, a managed service saves you the effort of scaling open-source Prometheus to monitor hundreds or even thousands of endpoints. Depending on the size of your IT landscape, running the Prometheus clusters needed to scrape all those metrics can be a complex task. With a managed service, configuration and scaling become the vendor’s responsibility.
Second, due to the elastic nature of the cloud, particularly for container-based applications, the volume of metrics data can increase significantly under extra load and shrink again as the load drops. The Prometheus backend must be resilient enough to handle such fluctuations and, ideally, scale automatically, perhaps based on past behavior. A managed service provider can handle this as well.
Third, it can be extra work to gather insights from the metrics. You can run PromQL queries to do this, perhaps using Grafana with Prometheus to create customized charts and dashboards. But often, engineers and operations teams don’t have a consistent set of visualizations to depend on. With a managed service, you can still run those queries and build your dashboards, but you’ll get common chart types with your package. This can save time for your teams.
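To make the PromQL step concrete, here is a minimal sketch of running an instant query against Prometheus's HTTP API (`/api/v1/query`) using only the Python standard library. The base URL and the helper names `build_query_url` and `run_query` are assumptions for illustration. The sketch also assumes an unauthenticated endpoint, whereas a managed service will usually require vendor-specific authentication on top of this.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def run_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run the query and return the 'result' list from the JSON response."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    return body["data"]["result"]


if __name__ == "__main__":
    # Hypothetical local server; replace with your own endpoint.
    for series in run_query("http://localhost:9090", "up"):
        print(series["metric"], series["value"])
```

Grafana and similar dashboard tools issue queries of this kind on your behalf, which is why the same PromQL expressions tend to carry over between a self-hosted and a managed backend.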
Finally, anyone who has configured monitoring for Kubernetes clusters can confirm that it’s no easy task to integrate Prometheus with Nodes, Pods, and application containers. The ephemeral nature of Pods makes it harder still. With a managed service, the vendor offers APIs for ingesting metrics, which simplifies the whole process.
Disadvantages of a Managed Prometheus Service
There may be cases when a self-hosted Prometheus setup is easier and more cost-effective.
For example, managed Prometheus service providers will offer a standard set of features and service-level agreements (SLAs), which may not be suitable for your business needs.
Another example is porting your existing Prometheus setup to the vendor’s platform. You may have dozens or hundreds of integrations and standardized dashboards already working across the enterprise. Migrating such a large setup to a vendor platform could be too much in terms of time, effort, and cost unless there’s a good business case.
If your business has a small- to medium-sized IT infrastructure and application footprint, your operations team may be perfectly capable of maintaining a self-hosted cluster.
With a hosted solution, your organization’s metrics data is stored in a vendor’s IT infrastructure, which might be in a different geographical location. Depending on your business, there could be industry regulations that prohibit this practice.
Another potential issue is if you have a hybrid cloud setup and want a single pane of glass for all environments. With both private and perhaps multiple cloud tenancies, you might find it difficult to configure networking in each environment so every system can talk to the central Prometheus cluster.
As your organization grows, so will your IT systems. Monitoring everything in your IT fleet from a managed platform may come with a cost that outweighs the benefits.
What Are the Costs? AWS Example
To begin, we’ll examine how much you might spend with a managed Prometheus solution. There are different hosting solutions with different pricing structures; as an example, we’ll use Amazon Managed Service for Prometheus from Amazon Web Services (AWS).
The AWS documentation offers a pricing example to give you some idea. In that example, you’re monitoring a 10-node EC2 cluster running Amazon Elastic Kubernetes Service (EKS), and you want Prometheus to collect around 1,000 metrics per node every 30 seconds for a 31-day month.
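Since managed services typically bill on ingested samples, it helps to see how large this workload is. The sketch below only turns the numbers in the example (10 nodes, 1,000 metrics per node, a 30-second interval, 31 days) into a sample count. The function name `samples_ingested` is an illustrative assumption, and no vendor rates are applied.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_ingested(nodes: int, metrics_per_node: int,
                     scrape_interval_s: int, days: int) -> int:
    """Estimate how many samples a workload produces over a billing period."""
    # Each series yields one sample per scrape interval.
    scrapes_per_series = days * SECONDS_PER_DAY // scrape_interval_s
    active_series = nodes * metrics_per_node
    return active_series * scrapes_per_series


total = samples_ingested(nodes=10, metrics_per_node=1_000,
                         scrape_interval_s=30, days=31)
print(total)  # 892800000
```

So the example workload produces roughly 893 million samples a month. Doubling the node count or halving the scrape interval doubles that figure, which is worth keeping in mind when you tune scrape settings.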
With Managed Service for Prometheus, you won’t pay any upfront costs for a cluster setup, but you will pay for the capacity and queries you use. At the end of each month, your bill will include the charges for the metrics ingested, stored, and queried. AWS currently charges the following for the US-East (Ohio) region:
Here’s a cost breakdown for Amazon’s use case:
In this case, the total monthly cost for monitoring your Kubernetes cluster is $84.10.
You can use the AWS pricing calculator to check the total cost for your specific workload in any AWS region.
For a self-managed Prometheus setup, you have two options:
- Running it in the cloud
- Running it on-premises
For cloud deployments, you are looking at costs for the VM nodes, data transfer out (if your target systems are in different regions), storage, load balancers, and other related components. Once again, you can use the cloud service provider’s calculator to get an estimate.
The on-premises deployment will involve capital expenditures on components, like server hardware, storage, networking, rack space, power, and backup. Even if you use existing hardware and storage in your data center, you have to apportion their value to your cluster.
For both cloud and on-premises setups, you will also need to factor in the hourly rates of the infrastructure and networking professionals, as well as those of the engineers who will install, configure, and manage the Prometheus monitoring system.
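One way to put these components side by side with a managed service bill is to reduce them to a single monthly figure. The following is a rough sketch, not a costing method from any vendor: the class name, the field names, and the straight-line apportionment of hardware are assumptions, and the numbers in the usage example are made-up placeholders rather than real prices.

```python
from dataclasses import dataclass


@dataclass
class SelfManagedCosts:
    """Monthly cost inputs for a self-managed Prometheus setup (hypothetical model)."""
    infrastructure_monthly: float   # VMs, storage, load balancers, data transfer
    hardware_capex: float           # purchased or apportioned on-premises hardware
    amortization_months: int        # period over which hardware value is spread
    engineer_hours_monthly: float   # install, configure, and manage effort
    engineer_hourly_rate: float

    def __post_init__(self) -> None:
        if self.amortization_months <= 0:
            raise ValueError("amortization_months must be positive")

    def monthly_total(self) -> float:
        """Sum recurring costs, apportioned hardware, and staff time."""
        hardware = self.hardware_capex / self.amortization_months
        staff = self.engineer_hours_monthly * self.engineer_hourly_rate
        return self.infrastructure_monthly + hardware + staff


# Placeholder inputs for illustration only.
estimate = SelfManagedCosts(
    infrastructure_monthly=400.0,
    hardware_capex=3600.0,
    amortization_months=36,
    engineer_hours_monthly=10.0,
    engineer_hourly_rate=80.0,
)
print(estimate.monthly_total())  # 1300.0
```

With your own inputs, the result can be compared directly against the monthly estimate from a managed service's pricing calculator. In this placeholder case, staff time is by far the largest component.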
Compare Managed Prometheus Services
The managed Prometheus service market is still in its early stages, though there are established names in the game. AWS offers Amazon Managed Service for Prometheus. Google offers Google Cloud Managed Service for Prometheus, which was in preview at the time of writing. Other vendors include New Relic, Sysdig, and MetricFire.
Deciding If It Is Worth It
Considering the potentially large upfront and operational cost of self-managed Prometheus setups, the managed offering looks promising by comparison. A managed Prometheus service handles day-to-day operational aspects for you—which is definitely a plus.
On the other hand, if you already have a robust on-premises Prometheus system and expert resources to manage it, you can keep using your on-premises setup. In addition, you might consider augmenting it with new Prometheus systems in the cloud if you have a cloud tenancy.
A managed service for Prometheus offers many benefits if you’re looking to reduce cost and operational burdens. But you should take into account the current size of your infrastructure and compare the managed service with the cost of a self-managed option. You should also consider the different features offered by each vendor.
If you’re looking for a way to monitor your Kubernetes clusters, you can also consider ContainIQ. The platform captures metrics and helps you monitor your clusters’ health from a set of out-of-the-box dashboards. ContainIQ also integrates with popular communication and alerting systems like Slack, which helps your ops team act more proactively.