What Is High Cardinality in Observability?


High cardinality in observability refers to telemetry data with a very large number of unique label values or label combinations. It usually happens when metrics include dynamic attributes such as user IDs, request IDs, container names, or temporary IP addresses. As those unique values multiply, observability systems become more expensive to run, slower to query, and harder to manage.


Key Points

  • High cardinality means too many unique metric combinations: It happens when metrics use labels with many changing or unbounded values.
  • Cloud native systems make it worse: Kubernetes, containers, microservices, and autoscaling rapidly increase the number of time series.
  • Granularity comes with a cost: More detailed telemetry can improve visibility, but it also increases ingestion, query, and storage overhead.
  • Governance matters as much as tooling: Teams need standards for labels, telemetry routing, and cardinality control.
Figure: How dynamic labels such as user ID, pod name, and IP address create millions of unique time series, leading to high costs, slow queries, and complex management.


High Cardinality Explained

Observability systems depend on labels and tags to organize telemetry data and make it searchable. Those labels help teams filter metrics by dimensions such as service, host, endpoint, or region. But when engineers add dynamic or effectively unlimited values to those labels, the number of unique time series grows rapidly. That growth is what defines high cardinality.

This becomes especially problematic in modern distributed systems. Security teams, SREs, and platform engineers all need granular telemetry to troubleshoot issues, trace requests, and investigate suspicious behavior. But when that granularity relies on labels such as transaction IDs, ephemeral pod names, or temporary IP addresses, the observability backend can become overloaded with millions of unique series.

In other words, high cardinality is not just a data problem. It is a performance, cost, and operational resilience problem. In a cloud native environment, poor label design can quietly degrade the very platform teams depend on for visibility.


Why High Cardinality Matters in Observability

High cardinality directly affects how well an observability platform performs under pressure. When a system must index and query massive numbers of unique time series, dashboards slow down, alerting becomes less reliable, and incident response suffers. Observability only works when teams can trust the data to be fast, complete, and available when something breaks.

This is particularly relevant in environments built around observability, AIOps, Kubernetes, and distributed infrastructure, where telemetry volumes already grow quickly. Modern platforms need enough detail to support troubleshooting and automation, but they also need enough discipline to avoid drowning in their own data.


Cardinality vs. Dimensionality

Figure: Dimensionality versus cardinality. Dimensionality: a metric with three labels. Cardinality: many unique values such as user ID, session ID, and request ID creating thousands of unique combinations.

Cardinality and dimensionality are closely related, but they are not the same thing. Dimensionality refers to the number of labels attached to a metric. Cardinality refers to the number of unique value combinations those labels produce.

A metric might include labels such as status_code, host, and endpoint. That is dimensionality. If a developer adds a label such as user_id, the dimensionality increases by only one, but the cardinality can explode because millions of unique users may now be represented in the dataset.

| Concept | Definition | Example | Operational Impact |
| --- | --- | --- | --- |
| Dimensionality | The number of labels attached to a metric | service, endpoint, region | More ways to segment and analyze data |
| Cardinality | The number of unique value combinations generated by those labels | Adding request_id or user_id | More time series, more cost, more strain on the platform |


How High Cardinality Happens

High cardinality usually starts with good intentions. Teams want more context, more precision, and faster troubleshooting. So they add more labels to their metrics. The trouble begins when those labels contain values that change constantly or have no meaningful limit.

Dynamic Identifiers in Metrics

The most common cause of high cardinality is the use of dynamic identifiers as metric labels. Examples include session IDs, request IDs, transaction hashes, unique user tokens, and one-time resource names. Each new value creates a new series, even if the underlying metric is otherwise identical.

These values may be useful for investigation, but they are usually better suited to logs or traces than metrics. Metrics are designed for aggregated numerical trends. They are not a great home for infinite variation. That is where things go sideways.
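
To make the anti-pattern concrete, here is a minimal sketch using Python's prometheus_client library (the metric and label names are illustrative, not from any particular codebase):

```python
from prometheus_client import Counter

# Anti-pattern: request_id is unbounded, so every request
# creates a brand-new time series.
bad_requests = Counter(
    "http_requests_total_bad", "HTTP requests (unbounded labels)",
    ["endpoint", "status", "request_id"],
)

# Better: bounded labels only; request_id belongs in logs or traces.
good_requests = Counter(
    "http_requests_total", "HTTP requests (bounded labels)",
    ["endpoint", "status"],
)

def handle_request(endpoint: str, status: int, request_id: str) -> None:
    # One new series per request -- cardinality grows without limit.
    bad_requests.labels(endpoint=endpoint, status=str(status),
                        request_id=request_id).inc()
    # One series per endpoint/status pair -- cardinality stays bounded.
    good_requests.labels(endpoint=endpoint, status=str(status)).inc()
```

The first counter mints a fresh series for every request; the second stays at one series per endpoint/status pair no matter how much traffic arrives.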

Cloud Native and Kubernetes Environments

Cloud native systems make high cardinality much more likely. Kubernetes frequently creates and destroys pods, assigns new names and IP addresses, and scales services dynamically. Microservice architectures also multiply the number of source-destination relationships teams may want to track. Every bit of that dynamism can create more unique telemetry combinations.

That is one reason cardinality problems so often surface for teams running Kubernetes and other modern cloud native environments. These architectures are flexible by design, but they also generate a lot of short-lived identifiers that observability systems must handle carefully.

Misconfigured Integrations and Over-Collection

Third-party integrations can also introduce cardinality problems. Cloud services, security tools, and open-source collectors may emit highly granular metrics by default. If teams forward all of that raw telemetry into a central platform without filtering or normalization, cardinality rises fast. Misconfigured scrapers and aggressive defaults are common culprits.


The Impact of High Cardinality on Observability Systems

Unmanaged cardinality affects observability platforms in several ways, and none of them are fun.


Figure: The impact of high cardinality on observability platforms, including rising storage costs, slower queries, ingestion bottlenecks, and monitoring blind spots.

Higher Storage Costs

Every unique time series takes up storage space and index capacity. As cardinality grows, platforms need more RAM, more disk, and more compute to keep up. In cloud environments, that translates directly into higher bills for telemetry ingestion, storage, and processing. A sudden spike can turn a monitoring budget into a small bonfire.

Slower Query Performance

High cardinality forces query engines to search through far more indexes and series to return results. Dashboards become slower, filters become heavier, and alert queries may time out. When security and operations teams depend on real-time visibility, delayed dashboards are not just annoying. They are operationally dangerous.

Ingestion Bottlenecks

When backends cannot keep up with the volume of new series they must index, backpressure develops in the ingestion pipeline. Agents queue data, buffers fill up, and eventually telemetry gets dropped. That means teams lose visibility right when the system is under stress and visibility matters most.

Monitoring Blind Spots

Dropped metrics, delayed queries, and incomplete aggregations create blind spots during investigations. Teams may miss performance regressions, fail to detect anomalies, or struggle to reconstruct incident timelines. 

In security use cases, incomplete telemetry can slow response and weaken confidence in the data. This is especially important for use cases tied to endpoint security, where telemetry quality affects detection and investigation depth.


Example: How Cardinality Multiplies

High cardinality often grows through multiplication rather than through any one obviously bad decision. A metric in a Kubernetes environment might already include labels for node, service, and endpoint. That may be manageable. But once a dynamic identifier such as request_id is added, the number of unique combinations can increase by orders of magnitude.

That is why cardinality issues often seem to appear suddenly. One extra label does not look dangerous in code review, but it can create millions of new series in production.
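
A rough back-of-the-envelope sketch makes the multiplication concrete (the numbers here are illustrative, not measurements from a real cluster):

```python
# Bounded labels: cardinality is capped by the product of value counts.
nodes, services, endpoints = 20, 15, 50
bounded_series = nodes * services * endpoints    # 15,000 possible series

# Add request_id and every request can mint a fresh series, so the
# realized series count scales with traffic instead of topology.
requests_per_day = 1_000_000
print(f"bounded: {bounded_series:,} series")
print(f"with request_id: roughly {requests_per_day:,} new series per day")
```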


How to Reduce High Cardinality

The goal is not to eliminate detail, but rather to store the right detail in the right place.

Drop Unnecessary Labels at the Edge

One of the fastest ways to reduce cardinality is to strip noisy or irrelevant labels before they reach the backend. Edge collectors and telemetry pipelines can remove forbidden dimensions such as user IDs, request hashes, or temporary infrastructure identifiers before they create new time series.
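
As a sketch of the idea, a hypothetical pipeline hook might maintain a denylist of known-dangerous label names and strip them from every sample before export (the function and label names here are assumptions for illustration; real collectors expose this as configuration):

```python
# Label names that should never become metric dimensions.
FORBIDDEN_LABELS = {"user_id", "request_id", "session_id", "pod_ip"}

def strip_forbidden_labels(labels: dict[str, str]) -> dict[str, str]:
    """Drop high-cardinality labels before a sample leaves the edge."""
    return {k: v for k, v in labels.items() if k not in FORBIDDEN_LABELS}

sample = {"service": "checkout", "endpoint": "/pay",
          "user_id": "u-829431", "pod_ip": "10.42.7.13"}
print(strip_forbidden_labels(sample))
# {'service': 'checkout', 'endpoint': '/pay'}
```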

Normalize Labels

Normalization reduces unnecessary variation. For example, a path such as /user/123 can be normalized to /user/{id} so that the metric reflects a stable route pattern instead of thousands of unique user-specific URLs. That simple change can dramatically reduce database load.
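
A minimal sketch of that normalization in Python, assuming a small set of route patterns (real services would typically match against their actual route table):

```python
import re

# Illustrative rules: collapse dynamic URL segments into stable patterns.
NORMALIZATION_RULES = [
    (re.compile(r"/user/\d+"), "/user/{id}"),
    (re.compile(r"/order/[0-9a-f-]{36}"), "/order/{uuid}"),
]

def normalize_path(path: str) -> str:
    """Rewrite a raw request path into a bounded route pattern."""
    for pattern, replacement in NORMALIZATION_RULES:
        path = pattern.sub(replacement, path)
    return path

print(normalize_path("/user/123"))    # /user/{id}
print(normalize_path("/user/98765"))  # /user/{id} -- same series as above
```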

Aggregate and Roll Up Metrics

Aggregation and roll-up rules help preserve useful trends without storing every raw variation. Instead of tracking telemetry at the most granular possible level forever, teams can summarize metrics by service, cluster, region, or endpoint and store the aggregate for long-term analysis.
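
A toy sketch of a roll-up, assuming per-pod samples that get summarized by service before long-term storage (the sample data is invented for illustration):

```python
from collections import defaultdict

# Raw per-pod samples: (service, pod, request_count). Pods churn
# constantly, so the pod label is what the roll-up discards.
raw_samples = [
    ("checkout", "checkout-7f9c4-abcde", 120),
    ("checkout", "checkout-7f9c4-fghij", 95),
    ("search",   "search-5d2b1-klmno",  310),
]

rollup: dict[str, int] = defaultdict(int)
for service, _pod, count in raw_samples:
    rollup[service] += count   # keep the trend, drop the churning label

print(dict(rollup))  # {'checkout': 215, 'search': 310}
```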

Use Logs and Traces for High-Variance Data

Metrics are best for trend analysis, alerting, and dashboards. Logs and traces are better for high-variance context such as request IDs, session values, and deep troubleshooting details. This is one of the most important design decisions in an observability strategy. Use metrics for the signal, and use logs or traces for the story behind it.
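
One way to apply that split in code is to record the bounded signal as a metric and push the high-variance context into a structured log entry keyed for correlation. This sketch uses Python's logging module and prometheus_client; the names are illustrative:

```python
import logging
from prometheus_client import Counter

log = logging.getLogger("checkout")

# Metric: bounded labels only -- this is what dashboards and alerts use.
checkout_errors = Counter(
    "checkout_errors_total", "Checkout failures", ["endpoint", "status"]
)

def record_failure(endpoint: str, status: int,
                   request_id: str, user_id: str) -> None:
    checkout_errors.labels(endpoint=endpoint, status=str(status)).inc()
    # Log: the high-variance story lives here, keyed for correlation.
    log.error("checkout failed", extra={
        "endpoint": endpoint, "status": status,
        "request_id": request_id, "user_id": user_id,
    })
```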

Choose Platforms Built for Telemetry Scale

Traditional relational systems are not designed for high-ingestion observability workloads. Purpose-built time-series and observability platforms are better equipped to handle dimensional data, compression, and query distribution. Even so, no platform magically fixes bad label hygiene. Tooling helps, but governance does the heavy lifting.


Metrics vs. Logs vs. Traces for High-Cardinality Data

Figure: When to use metrics, logs, and traces: metrics for dashboards and alerts, logs for event details, and traces for request flow. Logs and traces handle high-variance data better than metrics.

A successful observability strategy depends on choosing the right telemetry type for each job. The distinction matters because teams often overload metrics with detail that belongs elsewhere.

When that happens, dashboards and alerts become harder to manage. A better approach is to keep metrics lean, store dynamic context in logs or traces, and link between the signals during an investigation. This practice is also directly connected to AIOps, since effective automation requires rigorous telemetry discipline.

| Telemetry Type | Best For | Handles High Variance Well? | Example Use |
| --- | --- | --- | --- |
| Metrics | Dashboards, trends, alerting | No | Error rate by service or CPU by cluster |
| Logs | Detailed event records | Yes | User activity, request details, audit events |
| Traces | End-to-end request flow | Yes | Following a transaction across microservices |


Best Practices for Managing High Cardinality

Teams can reduce cardinality risk by following a few practical rules:

  • Avoid dynamic identifiers in metric labels.
  • Use bounded values wherever possible.
  • Normalize paths and resource names before ingestion.
  • Aggregate telemetry when raw granularity is not needed for alerting.
  • Route high-variance context into logs and traces instead of metrics.
  • Audit third-party integrations and collector defaults.
  • Set governance standards for instrumentation across engineering teams.


Why High Cardinality Is a Governance Problem

High cardinality is often treated as an observability platform issue, but the root cause usually starts much earlier. It starts in instrumentation choices, naming conventions, ownership boundaries, and whether teams agree on what belongs in a metric at all. That is why governance matters. 

Without shared standards, every team can add just one “helpful” label until the whole stack starts wheezing. Organizations that manage cardinality well do not simply buy bigger infrastructure; they create rules for telemetry design, review instrumentation changes, and build pipelines that enforce those standards before data reaches storage.


FAQs

What is an example of high-cardinality data?

A common example is a metric that includes a label such as user_id, session_id, or request_id. Because those values change constantly, they can create massive numbers of unique time series.

Is high-cardinality data always a bad thing?

No. High-cardinality data can be valuable for troubleshooting, security investigations, and deep visibility. The problem is not the existence of detailed data. The problem is storing that detail in metrics systems that are designed for aggregation, not endless variation.

Why do Kubernetes environments create so much cardinality?

Kubernetes environments are highly dynamic. Pods are created and destroyed frequently, IP addresses change, services scale automatically, and labels can multiply quickly across distributed systems. That constant churn creates more unique telemetry combinations.

How does high cardinality affect observability platforms?

It increases the number of unique time series a platform must ingest, store, and query. That can lead to slower dashboards, delayed alerts, ingestion bottlenecks, and higher infrastructure costs.

Can telemetry collectors help control cardinality?

Yes. Collectors can filter, mutate, aggregate, and drop high-variance labels before the data reaches backend storage. That makes OpenTelemetry-style collection pipelines useful for managing cardinality growth.