APM
Definition
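Application performance monitoring (APM) is the continuous collection and analysis of application telemetry, such as request latency, error rates, throughput, and traces, to identify bottlenecks, errors, and optimization opportunities.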
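As a rough illustration of the kind of metrics APM tracks, the sketch below summarizes one time window of request records into throughput, error rate, and latency percentiles. The `RequestRecord` shape, the choice of HTTP 5xx as "error", and the nearest-rank percentile method are assumptions made for the example; real APM agents typically aggregate with histograms or sketches rather than sorting raw samples.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One handled request (hypothetical shape, for illustration only)."""
    duration_ms: float
    status_code: int


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records, window_seconds):
    """Return throughput, error rate, and latency percentiles for one time window."""
    if not records:
        raise ValueError("no records in window")
    durations = [r.duration_ms for r in records]
    # Assumption: only server-side failures (HTTP 5xx) count as errors.
    errors = sum(1 for r in records if r.status_code >= 500)
    return {
        "throughput_rps": len(records) / window_seconds,
        "error_rate": errors / len(records),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
    }


# Example: 20 requests observed in a 10-second window, one slow failure.
records = [RequestRecord(duration_ms=10.0 * i, status_code=200) for i in range(1, 20)]
records.append(RequestRecord(duration_ms=900.0, status_code=503))
summary = summarize(records, window_seconds=10)
```

Percentiles are used instead of an average because a single 900 ms outlier would distort the mean while leaving the median untouched.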
Use Cases
- Netflix: Detecting and troubleshooting latency spikes and errors across microservices during high-traffic events. Approach: instrumented services with distributed tracing and metrics, correlating request traces with service-level dashboards to pinpoint slow downstream dependencies and regressions after deployments. Outcome: faster root-cause analysis and reduced time to restore service by quickly isolating which service or dependency caused elevated latency or error rates.
- Uber: Monitoring end-to-end request performance across a large microservices architecture to identify bottlenecks. Approach: used distributed tracing and service-level performance metrics to follow requests across services and highlight hotspots such as slow RPC calls or overloaded services. Outcome: improved reliability and performance by identifying high-latency paths and prioritizing fixes based on real production impact.
- Shopify: Keeping checkout and storefront performance stable during traffic surges and frequent releases. Approach: tracked key web transactions and backend dependencies with APM-style telemetry (latency, error rates, and traces) to detect regressions and slow database queries. Outcome: earlier detection of performance regressions and quicker remediation of slow queries and code paths that impact conversion-critical pages.
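The use cases above all rely on following a request across services and finding where the time actually goes. A minimal sketch of that idea, assuming a simplified `Span` record and made-up service names (neither comes from a real tracing SDK): compute each span's self time, meaning its duration minus the time spent in its direct children, and flag the span with the largest value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """A simplified trace span (hypothetical fields, not a real tracing SDK type)."""
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_times(spans):
    """Map span_id -> time spent in the span itself, excluding its direct children.

    Assumes child spans run sequentially and do not overlap; concurrent
    children would need interval merging instead of a plain sum.
    """
    child_total = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = child_total.get(span.parent_id, 0.0) + span.duration_ms
    return {s.span_id: s.duration_ms - child_total.get(s.span_id, 0.0) for s in spans}


def slowest_span(spans):
    """Return the span with the largest self time, a likely place to start looking."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s.span_id])


# One request: gateway -> checkout -> (inventory, payments-db). Names are illustrative.
trace = [
    Span("a", None, "api-gateway", 0, 480),
    Span("b", "a", "checkout", 10, 470),
    Span("c", "b", "inventory", 20, 60),
    Span("d", "b", "payments-db", 70, 450),
]
culprit = slowest_span(trace)
```

Here the gateway span is the longest overall, but almost all of its time is spent waiting on children; ranking by self time points at the `payments-db` call instead.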
Provider Equivalents
- AWS: Amazon CloudWatch Application Signals
- Azure: Azure Monitor Application Insights
- GCP: Cloud Monitoring (APM via Cloud Trace, Cloud Profiler, Error Reporting, and Cloud Logging)
- OCI: OCI Application Performance Monitoring
Frequently Asked Questions
- What's the difference between APM and observability?
- APM focuses specifically on application behavior: request latency, error rates, throughput, and dependency performance (such as databases and external APIs). Observability is broader: it combines APM with infrastructure monitoring, logging, and alerting, and emphasizes the ability to ask new questions about system behavior using telemetry (metrics, logs, and traces). In practice, APM is usually a key part of an observability platform.
- When should I use APM?
- Use APM when you need to understand why users experience slowness or errors, especially in production. It’s most valuable for APIs, microservices, and web apps where performance depends on multiple services and dependencies. Common triggers include: rising page/API latency, intermittent errors, frequent deployments, scaling to more users, or needing to meet SLOs/SLAs.
- How much does APM cost?
- APM pricing usually depends on how much telemetry you collect and retain. Common cost drivers are: number of hosts/containers, volume of traces and spans, metrics cardinality, log ingestion, sampling rate, and retention period. Managed cloud offerings may bundle APM into monitoring bills (for example, charges for ingested metrics/traces/logs and retention), while third-party tools often price by host, per GB ingested, or per million traces.
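To make the cost drivers in the pricing answer concrete, here is a back-of-the-envelope sketch of how sampling rate and span volume translate into monthly trace ingestion. Every number in it (traffic, spans per trace, bytes per span, and the price per GB) is a made-up assumption for illustration, not any vendor's actual pricing, and the model ignores metrics, logs, and retention charges.

```python
SECONDS_PER_DAY = 86_400


def monthly_trace_ingest_gb(requests_per_second, spans_per_trace, bytes_per_span,
                            sampling_rate, days=30):
    """Estimate decimal GB of span data ingested per month.

    Assumes one trace per request and uniform head-based sampling, so
    sampling_rate is the fraction of traces that are kept.
    """
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    sampled_traces = requests_per_second * SECONDS_PER_DAY * days * sampling_rate
    total_bytes = sampled_traces * spans_per_trace * bytes_per_span
    return total_bytes / 1e9


def monthly_cost(ingest_gb, price_per_gb):
    """Cost under a simple per-GB-ingested pricing model (hypothetical)."""
    return ingest_gb * price_per_gb


# Hypothetical workload: 100 req/s, 10 spans per trace, about 500 bytes per span.
full = monthly_trace_ingest_gb(100, 10, 500, sampling_rate=1.0)     # about 1296 GB
sampled = monthly_trace_ingest_gb(100, 10, 500, sampling_rate=0.1)  # about 129.6 GB
estimate = monthly_cost(sampled, price_per_gb=0.50)                 # hypothetical price
```

Under these assumptions, ingestion scales linearly with the sampling rate, which is why sampling is usually the first lever teams reach for when trace costs grow.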
Category: monitoring
Difficulty: intermediate