Usage and cost optimization for Managed Service for Prometheus


This guide explains how to analyze the metric write volume of Managed Service for Prometheus and identify high-cost sources, such as instances, jobs, and metrics. It also provides optimization and governance strategies to control your costs.

Billing for Managed Service for Prometheus is primarily based on the metric ingestion fee and the storage fee. For more information, see Billing overview. To optimize usage, you must first analyze the instances, jobs, and metrics with high write volumes. Managed Service for Prometheus provides features such as usage analysis, metric statistics, and metric governance to support this process.

Usage analysis

1. Analyze by instance

First, review all your Prometheus instances to identify those with high usage.

  1. Log in to the Cloud Monitor console.

  2. In the left-side navigation pane, click Prometheus Monitoring > Usage Statistics.

  3. In the Instance Usage Overview area, you can view billing-related statistics for each instance, such as reported data volume, written data volume, and archived storage volume.

    • Recommendation: Sort by Custom Metric Reported Volume (Millions) or Custom Metric Written Volume (GB) in descending order to identify the top N instances with the highest metric ingestion.

      The Instance Usage Overview panel displays usage data for each instance in a table. In addition to columns related to custom metrics, it also includes columns such as Instance ID, Instance Name, Region, Basic Metric Reported Volume (Millions), and Basic Metric Written Volume (GB). You can use this information to compare basic and custom metric usage across instances.

Note

Reported data volume and written data volume are two different billing methods for data ingestion into a Prometheus instance. You can choose one of these methods. Typically, a high reported data volume corresponds to a high written data volume. For more information, see Billing overview.

2. Analyze by job

After analyzing usage by instance, drill down into high-usage instances to identify which jobs generate the most data.

  1. On the Usage Analysis dashboard, filter for the instance you want to analyze.

  2. View the job rankings:

    • Go to Metric Reported Volume Statistics > Top 10 Custom Jobs by Reported Volume to view the top 10 custom jobs by reported data volume.

    • Go to Metric Written Volume Statistics > Top 10 Custom Jobs by Written Volume to view the top 10 custom jobs by written data volume.

    The Top 10 Custom Jobs by Written Volume panel displays information for the top 10 jobs by write volume in a table. The table includes the Job Name, Written Volume (Bytes), and Percentage columns.
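If you export per-job written volume and want to reproduce a ranking like the one in this panel offline, a minimal Python sketch could look like the following. The job names and byte counts are hypothetical sample data, and the percentage is assumed to be each job's share of the total across all jobs, which may differ from how the console computes it.

```python
def top_jobs_by_written_volume(bytes_by_job, n=10):
    """Return (job, written_bytes, percentage) rows for the n largest jobs."""
    total = sum(bytes_by_job.values())
    # Sort jobs by written volume, largest first.
    ranked = sorted(bytes_by_job.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (job, written, round(100 * written / total, 1) if total else 0.0)
        for job, written in ranked[:n]
    ]


# Hypothetical sample data: job name -> written volume in bytes.
sample = {
    "kubernetes-pods": 600_000_000,
    "node-exporter": 300_000_000,
    "custom-app": 100_000_000,
}

rows = top_jobs_by_written_volume(sample, n=2)
for job, written, pct in rows:
    print(f"{job}\t{written}\t{pct}%")
```

Jobs that account for a large share of the total are the first candidates for the optimization strategies described later in this guide.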

3. Analyze by metric

To identify the specific metrics that cause high usage, analyze them within the relevant Prometheus instance.

  1. In the left-side navigation pane, select Instance List, and then click the name of the target instance to open its details page.

  2. In the left-side navigation pane, select Metric Management and then click the Metric Statistics tab:

    • Top 10 Metrics by Data Point Count: View the metrics with the highest reported data volume.

    • Top 10 Metrics by Data Volume: View the metrics with the highest written data volume.

  3. You can also filter the statistics by job to focus on the high-volume jobs identified in the previous step.


Important

The data point count and data volume on the Metric Statistics tab are calculated in real time. There may be a discrepancy between these statistics and the actual reported or written volume on your bill. Your final bill takes precedence.

4. Analyze high-cardinality metrics

In Prometheus, high cardinality refers to an unusually large number of time series for a metric. Each unique combination of label values creates a separate time series, so high cardinality is typically caused by a label that stores divergent (unbounded or highly variable) values.

The Metric Management > Metric Governance page provides statistics for time series and labels:

  • Metric Quick Analysis: View the top N metrics by the number of time series in the current instance.

  • Label Quick Analysis: View an analysis of the top N labels by the number of values. You can also filter by a specific metric.

The Metric Quick Analysis (15 min) panel lists the metric name, number of time series, total count, and a time series sample. In the example, the top N metrics include:

  • apiserver_request_duration_seconds_bucket (26,153/1,637,280)

  • etcd_request_duration_seconds_bucket (19,435/1,139,040)

  • apiserver_request_total (1,345/79,560)

The Label Quick Analysis (15 min) panel lists the label key, number of unique values, total count, and a label value sample. In the example, high-cardinality labels include:

  • name (402/188,202)

  • resource (228/3,062,640)

  • le (200/3,066,720)

  • instance (26/3,790,422)
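To illustrate what these two panels count, the following Python sketch computes the number of time series per metric and the number of unique values per label from a list of samples. It is only an illustration of the concept, not how the service computes its statistics. The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def series_count_by_metric(samples):
    """Count distinct label sets (time series) per metric name."""
    series = defaultdict(set)
    for metric, labels in samples:
        # A time series is identified by the metric name plus its full label set.
        series[metric].add(frozenset(labels.items()))
    return {metric: len(label_sets) for metric, label_sets in series.items()}


def label_value_counts(samples):
    """Count distinct values per label key across all samples."""
    values = defaultdict(set)
    for _, labels in samples:
        for key, value in labels.items():
            values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}


# Hypothetical samples: (metric name, labels).
samples = [
    ("http_requests_total", {"path": "/api/user/1/profile", "code": "200"}),
    ("http_requests_total", {"path": "/api/user/2/profile", "code": "200"}),
    ("http_requests_total", {"path": "/api/user/2/profile", "code": "500"}),
    # Same label set as the first sample, so it does not add a new series.
    ("http_requests_total", {"path": "/api/user/1/profile", "code": "200"}),
    ("up", {"instance": "10.0.0.1:9100"}),
]

print(series_count_by_metric(samples))
print(label_value_counts(samples))
```

In this sample, the path label is what multiplies the series count: every new user ID in the path creates additional time series for the same metric.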

Cost optimization strategies

Based on the insights from your usage analysis, use the following strategies to optimize costs.

Strategy 1: Choose a billing method

Managed Service for Prometheus offers two billing methods at the instance level: by written data volume and by reported data volume. Use the Usage Statistics page to determine which method is more cost-effective.

  • Procedure: In the Settings of your Prometheus instance, switch the billing method.

  • Limitation: You can switch the billing method for an instance only once.

In the Basic Information panel of the instance, click the edit icon next to Billing Method to switch.
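Because the switch can be made only once, it may help to estimate both costs before changing the setting. The sketch below compares the two methods using the volumes shown on the Usage Statistics page. The unit prices are placeholders, not actual prices, and the flat per-unit model is an assumption; take real rates and any tiering from Billing overview.

```python
def cheaper_billing_method(reported_millions, written_gb,
                           price_per_million, price_per_gb):
    """Return (method, estimated_cost) for the cheaper of the two methods.

    Assumes a flat price per unit, which may not match the real price model.
    """
    cost_reported = reported_millions * price_per_million
    cost_written = written_gb * price_per_gb
    if cost_written < cost_reported:
        return "written", cost_written
    return "reported", cost_reported


# Hypothetical monthly usage and placeholder unit prices.
method, cost = cheaper_billing_method(
    reported_millions=1000,   # Custom Metric Reported Volume (Millions)
    written_gb=40,            # Custom Metric Written Volume (GB)
    price_per_million=0.5,    # placeholder price, not a real rate
    price_per_gb=10.0,        # placeholder price, not a real rate
)
print(method, cost)
```

Run the comparison with usage from several representative periods, because the cheaper method can change if your label sizes or sample counts shift over time.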

Strategy 2: Increase the scrape interval

  1. On the Integration Management page, click the name of the environment that you want to configure.

  2. In Component Management, find the component to configure, and then click Settings in the Actions column.

  3. On the configuration page, set the Scraping Interval (seconds).

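For example, set it to 15. A longer interval produces proportionally fewer samples per time series, so doubling the interval roughly halves the reported data volume for that component.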
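To estimate the effect of a scrape interval before changing it, you can compute the number of samples a component produces per day. The following sketch assumes every time series is scraped once per interval and that the series count stays constant. The series count of 10,000 is a hypothetical value.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(series_count, scrape_interval_seconds):
    """Estimate the samples produced per day for a fixed number of series."""
    # Integer division is exact for intervals that divide a day evenly.
    scrapes_per_day = SECONDS_PER_DAY // scrape_interval_seconds
    return series_count * scrapes_per_day


# Hypothetical component exposing 10,000 time series.
for interval in (15, 30, 60):
    print(interval, samples_per_day(10_000, interval))
```

The estimate covers sample count only. Written data volume also depends on the size of each sample's labels, so treat the result as a rough guide.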

Strategy 3: Drop unused metrics

  1. In the left-side navigation pane, select Integration Management. In the Actions column for the target environment, click Metric Scraping.

  2. On the metric scraping page, click Discard Metrics and enter the metrics to drop.

Strategy 4: Optimize high-cardinality metrics

  • Avoid divergent labels: At the metric source, do not use labels with high cardinality, such as user IDs, order IDs, or trace IDs.

  • Normalize labels: If a label stores variable values, such as a URL path, normalize the label. For example, change /api/user/123/profile to /api/user/:id/profile.
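As an illustration of label normalization, the following Python sketch replaces purely numeric path segments with a :id placeholder before the path is used as a label value. The function name and the rule that only numeric segments are IDs are assumptions for this example; adapt the pattern to the ID formats your application uses.

```python
import re

# Matches a path segment made up only of digits, for example "/123".
_NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")


def normalize_path(path):
    """Replace numeric path segments with ":id" to bound label cardinality."""
    return _NUMERIC_SEGMENT.sub("/:id", path)


print(normalize_path("/api/user/123/profile"))
print(normalize_path("/api/order/42"))
print(normalize_path("/api/v1/health"))
```

Apply the normalization in the application code that sets the label, so that the high-cardinality values are never exposed to the scraper.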

Summary

The following table summarizes and compares these optimization strategies.

| Optimization strategy | Scope | Limitations | Implementation difficulty |
| --- | --- | --- | --- |
| Choose the right billing method for your instance | Instance level | You can switch the billing method for an instance only once. | Low |
| Increase the scrape interval | Job level | Not supported for cloud service metrics, which typically have a 1-minute granularity. | Medium |
| Drop unused metrics | Metric level | Dropping advanced metrics for cloud services is not supported. | Medium |
| Optimize high-cardinality metrics | Metric level | Requires adjusting the scrape logic. | High |