Monitoring & Observability Glossary

Application performance monitoring, logging, metrics, alerting, and observability tools for cloud applications.

20 terms across all skill levels, with real examples: 2 basic, 13 intermediate, 5 advanced.

CloudWatch

intermediate

AWS monitoring service that collects and tracks metrics, logs, and events from your applications and infrastructure. Like a security camera system that watches everything and alerts you when something unusual happens.

Example: CloudWatch monitors CPU usage on EC2 instances and automatically sends alerts when usage exceeds 80%...

Prometheus

advanced

Open-source monitoring and alerting toolkit for collecting and querying metrics. Like having a health monitoring system for your applications that tracks everything and alerts you to problems.

Example: Operations teams use Prometheus to monitor application performance and get alerts when response time...
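
As an illustration of what Prometheus collects, the sketch below renders metrics in the plain-text exposition format that a Prometheus server scrapes over HTTP. It is a simplified sketch rather than a client library: the function name `render_metric` and the sample metric are hypothetical, and escaping of special characters in label values is deliberately left out.

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family as Prometheus text exposition lines.

    `samples` is a list of (labels_dict, value) pairs. Label values are
    assumed to contain no quotes, backslashes, or newlines (no escaping here).
    """
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        # Sort labels so the output is stable between runs.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical counter with two label sets.
print(render_metric(
    "http_requests_total",
    "Total HTTP requests.",
    "counter",
    [
        ({"method": "GET", "code": "200"}, 1027),
        ({"method": "POST", "code": "500"}, 3),
    ],
), end="")
```

In practice an official client library would normally produce this output and serve it on a metrics endpoint; the hand-written version is only meant to show the shape of the data.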

Grafana

intermediate

Open-source platform for monitoring and observability with customizable dashboards. Like having a mission control center with screens showing the health of all your systems.

Example: Teams use Grafana dashboards to visualize server performance, application metrics, and business KPIs...

CloudWatch Logs

intermediate

AWS service for monitoring, storing, and accessing log files from EC2 instances and other AWS services. Like having a centralized filing system for all system messages and errors.

Example: DevOps teams use CloudWatch Logs to troubleshoot application issues by searching through millions of...

Azure Monitor

intermediate

Azure service for collecting and analyzing telemetry data from cloud and on-premises environments. Like having a comprehensive health monitoring system for all your technology infrastructure.

Example: IT teams use Azure Monitor to track application performance and get alerts when servers experience i...

Log Analytics

advanced

Azure Monitor tool for writing and running queries against log data collected from various sources, using the Kusto Query Language (KQL). Like having a detective that can analyze clues from different crime scenes to solve complex cases.

Example: Security teams use Log Analytics to investigate suspicious activities across their entire IT infrast...

Application Insights

intermediate

Azure Monitor feature that provides application performance monitoring (APM) for live applications. Like having a personal trainer for your applications that tracks their health and performance.

Example: Developers use Application Insights to identify slow database queries and optimize their web applica...

Cloud Monitoring

intermediate

Google Cloud service for monitoring the performance and health of cloud applications. Like having a dashboard that shows the vital signs of all your applications in real time.

Example: E-commerce sites use Cloud Monitoring to track website performance and get alerts when response time...

Cloud Logging

intermediate

Google Cloud's centralized logging service that collects, stores, and analyzes logs from all your applications and infrastructure. Like having a detailed diary that automatically records everything happening across your cloud environment.

Example: When a Cloud Function fails, Cloud Logging captures the error details, stack traces, and timing info...

Application Performance Monitoring

intermediate

Continuous tracking and analysis of application performance metrics to identify bottlenecks, errors, and optimization opportunities.

Example: Application Performance Monitoring shows that your checkout page takes 5 seconds to load and pinpoin...

Centralized Logging

intermediate

Collecting and storing log data from multiple applications and systems in one central location for easier searching, monitoring, and analysis.

Example: Centralized Logging gathers error messages from all your microservices into one dashboard, making it...
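
The idea can be sketched in a few lines: each service produces its own time-ordered log stream, and a central collector merges them into one searchable timeline. The record fields, the service names, and the helpers `merge_streams` and `errors_only` are assumptions made for this illustration, not part of any particular logging product.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class LogRecord(NamedTuple):
    timestamp: float  # seconds since some shared epoch
    service: str
    level: str
    message: str


def merge_streams(*streams: Iterable[LogRecord]) -> Iterator[LogRecord]:
    """Merge per-service streams (each already time-ordered) into one timeline."""
    return heapq.merge(*streams, key=lambda record: record.timestamp)


def errors_only(records: Iterable[LogRecord]) -> List[LogRecord]:
    """Keep only ERROR-level records, preserving their order."""
    return [record for record in records if record.level == "ERROR"]


# Hypothetical logs from two microservices.
cart_logs = [
    LogRecord(1.0, "cart", "INFO", "item added"),
    LogRecord(4.0, "cart", "ERROR", "inventory lookup failed"),
]
payment_logs = [
    LogRecord(2.0, "payment", "ERROR", "card declined"),
    LogRecord(3.0, "payment", "INFO", "retry scheduled"),
]

for record in errors_only(merge_streams(cart_logs, payment_logs)):
    print(f"{record.timestamp:.1f} [{record.service}] {record.message}")
```

Real systems ship records over the network and index them for search, but the core benefit is the same: one ordered view across services instead of one log file per machine.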

Observability

advanced

Ability to understand what's happening inside a system by examining its outputs like logs, metrics, and traces. Like having security cameras, temperature sensors, and activity logs to understand everything happening in a building.

Example: Good observability lets you see not just that your website is slow, but exactly which database query...

APM

intermediate

Application Performance Monitoring: tools that track how well your application performs and where bottlenecks occur. Like having a fitness tracker for your software that shows which parts are working hard and which are slowing down.

Example: APM tools like New Relic show you that your checkout page takes 3 seconds to load because of a slow ...
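
To make the bottleneck-finding idea concrete, here is a minimal sketch of what an APM agent does at its core: time named segments of a request and report the slowest one. The `SegmentTimer` class and the segment names are hypothetical; commercial APM tools instrument code automatically and collect far more detail.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SegmentTimer:
    """Accumulate wall-clock time per named code segment."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so tests can use a fake clock
        self.totals = defaultdict(float)

    @contextmanager
    def segment(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the time even if the wrapped code raises.
            self.totals[name] += self._clock() - start

    def slowest(self):
        """Return (name, seconds) for the slowest segment, or None if empty."""
        if not self.totals:
            return None
        return max(self.totals.items(), key=lambda item: item[1])


timer = SegmentTimer()
with timer.segment("render_template"):
    sum(range(1000))   # stand-in for fast work
with timer.segment("db_query"):
    time.sleep(0.02)   # stand-in for a slow database call

name, seconds = timer.slowest()
print(f"slowest segment: {name} ({seconds:.3f}s)")
```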

Logging

basic

Recording events and activities that happen in your application for debugging and monitoring. Like keeping a detailed diary of everything your application does so you can review it later.

Example: Application logs show every user login, error message, and database query, helping developers diagno...
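
A minimal sketch of application logging with Python's standard `logging` module. The logger name `shop.auth`, the `login` function, and the message fields are hypothetical and chosen only for illustration.

```python
import io
import logging


def build_logger(stream):
    """Create a logger that writes level, logger name, and message to `stream`."""
    logger = logging.getLogger("shop.auth")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate output if called more than once
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


def login(logger, user, password_ok):
    """Hypothetical login routine that records each outcome."""
    if password_ok:
        logger.info("login succeeded user=%s", user)
        return True
    logger.error("login failed user=%s", user)
    return False


buffer = io.StringIO()
log = build_logger(buffer)
login(log, "alice", True)
login(log, "bob", False)
print(buffer.getvalue(), end="")
```

Writing to an in-memory buffer keeps the example self-contained; a real application would more likely log to standard output or a file and let a collector ship the records elsewhere.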

Golden Signals

intermediate

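Four key metrics for monitoring distributed systems, popularized by Google's Site Reliability Engineering book: latency (how long requests take), traffic (how much demand the system receives), errors (the rate of failed requests), and saturation (how full the system's resources are). Like the vital signs doctors check to assess patient health.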

Example: SRE teams monitor the golden signals to quickly detect when a service is degraded - high latency or ...
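
The four signals can be computed from a window of request records, as in the sketch below. The `Request` fields and the choice to approximate saturation as in-flight requests divided by capacity are assumptions for illustration; real services often measure saturation through CPU, memory, or queue depth instead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    latency_ms: float
    ok: bool


def golden_signals(requests: List[Request], window_s: float,
                   in_flight: int, capacity: int) -> Dict[str, float]:
    """Summarize one time window as latency, traffic, errors, and saturation."""
    count = len(requests)
    latencies = sorted(r.latency_ms for r in requests)
    if count:
        # Nearest-rank 95th percentile, using integer math to avoid rounding surprises.
        rank = (95 * count + 99) // 100
        p95 = latencies[rank - 1]
        error_rate = sum(1 for r in requests if not r.ok) / count
    else:
        p95, error_rate = 0.0, 0.0
    return {
        "latency_p95_ms": p95,
        "traffic_rps": count / window_s,
        "error_rate": error_rate,
        "saturation": in_flight / capacity,  # assumed proxy for "how full"
    }


# Hypothetical 10-second window: 20 requests, one of them failed.
window = [Request(latency_ms=10.0 * i, ok=(i != 7)) for i in range(1, 21)]
print(golden_signals(window, window_s=10.0, in_flight=30, capacity=40))
```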

Distributed Tracing

advanced

A technique for tracking a single request as it travels through many different services in a cloud application. Like following a package through the postal system — you can see every stop it made, how long it spent at each location, and where any delays occurred.

Example: An online retailer uses AWS X-Ray to trace a customer's checkout request as it passes through 12 dif...
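
The data model behind a trace can be sketched as a set of spans that share a trace ID and point to their parent span. The code below rebuilds the call tree from those links and finds the slowest leaf span. The `Span` fields, service names, and timings are hypothetical, and this does not use or imitate the AWS X-Ray API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render_trace(spans: List[Span]) -> List[str]:
    """Return indented lines showing the request's path through services."""
    children: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)
    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.service} {span.duration_ms}ms")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


def slowest_leaf(spans: List[Span]) -> Span:
    """Return the longest span that has no children (where time is really spent)."""
    parent_ids = {span.parent_id for span in spans}
    leaves = [span for span in spans if span.span_id not in parent_ids]
    return max(leaves, key=lambda s: s.duration_ms)


# Hypothetical checkout request crossing four services.
trace = [
    Span("t1", "a", None, "gateway", 0, 500),
    Span("t1", "b", "a", "cart", 10, 60),
    Span("t1", "c", "a", "payment", 70, 480),
    Span("t1", "d", "c", "bank-api", 80, 470),
]
print("\n".join(render_trace(trace)))
print("slowest leaf:", slowest_leaf(trace).service)
```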

Managed Grafana

intermediate

A fully managed version of Grafana, the popular open-source visualization and dashboarding tool, that lets you create monitoring dashboards without managing the Grafana infrastructure yourself. Like subscribing to a professional monitoring display service instead of building your own monitoring wall.

Example: A DevOps team uses Amazon Managed Grafana to create real-time dashboards showing their application's...

Managed Prometheus

advanced

A fully managed version of Prometheus, the popular open-source metrics collection and alerting system, that stores and queries infrastructure metrics so you do not have to run the Prometheus servers yourself. Like outsourcing your building's entire alarm and sensor system to a professional monitoring company.

Example: A platform team uses Amazon Managed Service for Prometheus to collect metrics from their 500-node Ku...

Alerting

basic

Automatically notifying the right people when a system metric crosses a threshold or something unexpected happens. Like a smoke alarm for your cloud infrastructure.

Example: A CloudWatch alarm sends an email and triggers auto-scaling when CPU usage exceeds 80% for five cons...
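
Threshold alerting of this kind can be sketched as a small evaluation function: the alarm fires only when the most recent datapoints all breach the threshold, which avoids paging on a single spike. The 80% threshold comes from the example above; the window of five datapoints, the function names, and the `notify` callback are assumptions for illustration. The state names mirror those used by CloudWatch alarms, but this is not the CloudWatch API.

```python
from typing import Callable, Sequence


def alarm_state(datapoints: Sequence[float], threshold: float, periods: int) -> str:
    """Return "ALARM" if the last `periods` datapoints all exceed `threshold`."""
    if periods <= 0:
        raise ValueError("periods must be positive")
    recent = datapoints[-periods:]
    if len(recent) < periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(value > threshold for value in recent) else "OK"


def evaluate_and_notify(datapoints: Sequence[float], threshold: float,
                        periods: int, notify: Callable[[str], None]) -> str:
    """Evaluate the alarm and call `notify` (hypothetical hook) when it fires."""
    state = alarm_state(datapoints, threshold, periods)
    if state == "ALARM":
        notify(f"metric above {threshold} for {periods} consecutive datapoints")
    return state


# Hypothetical CPU readings in percent, one per evaluation period.
cpu = [70, 85, 90, 91, 88, 95]
print(evaluate_and_notify(cpu, threshold=80, periods=5, notify=print))
```

Requiring several consecutive breaches trades a little detection delay for fewer false alarms, which is usually the right balance for noisy metrics such as CPU usage.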

Synthetic Monitoring

intermediate

Proactively testing your application by simulating user interactions on a schedule — checking that pages load, APIs respond, and transactions complete — before real users notice a problem. Like a robot secret shopper that tests your website every few minutes.

Example: CloudWatch Synthetics runs a canary script every 5 minutes that logs in, searches for a product, and...
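
A canary of this kind boils down to running a scripted sequence of steps and recording which one fails. The sketch below shows that core loop with hypothetical step functions (`log_in`, `search_product`, `view_results`) standing in for real browser or HTTP actions. It does not use the CloudWatch Synthetics API, and scheduling (for example, every 5 minutes) is assumed to be handled by an external scheduler.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_canary(steps: List[Tuple[str, Callable[[], None]]],
               clock: Callable[[], float] = time.perf_counter) -> List[Dict]:
    """Run named steps in order, stopping at the first failure."""
    results = []
    for name, action in steps:
        start = clock()
        try:
            action()
            ok, error = True, None
        except Exception as exc:  # any exception counts as a failed check
            ok, error = False, str(exc)
        results.append({
            "step": name,
            "ok": ok,
            "error": error,
            "seconds": clock() - start,
        })
        if not ok:
            break  # later steps depend on earlier ones, so stop here
    return results


# Hypothetical steps; real ones would drive a browser or call an API.
def log_in() -> None:
    pass


def search_product() -> None:
    raise TimeoutError("search endpoint did not respond")


def view_results() -> None:
    pass


for result in run_canary([("log_in", log_in),
                          ("search_product", search_product),
                          ("view_results", view_results)]):
    status = "PASS" if result["ok"] else f"FAIL ({result['error']})"
    print(f"{result['step']}: {status}")
```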