Question 1

What's the difference between MTTR and MTTD?

Accepted Answer

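MTTD (Mean Time To Detect) measures how long it takes to notice an incident after it begins. MTTR (Mean Time To Recovery, also expanded as Repair, Restore, or Resolve) measures how long it takes to restore service once the incident has been detected. Some teams start the MTTR clock at the moment of failure instead, so state which definition you use when you report it. Lowering MTTD helps you start fixing sooner; lowering MTTR helps you finish fixing sooner.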
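
As a rough illustration, both metrics can be computed from incident timestamps. The sketch below assumes MTTR is measured from detection to restoration, as described above. The `Incident` record, its field names, and the sample timestamps are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record layout; field names are assumptions for this sketch.
    started: datetime    # when the failure began
    detected: datetime   # when an alert or a person noticed it
    resolved: datetime   # when service was restored


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time from failure start to detection."""
    return mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time from detection to restored service."""
    return mean_delta([i.resolved - i.detected for i in incidents])


# Made-up sample data: two incidents.
incidents = [
    Incident(datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 10), datetime(2024, 1, 5, 10, 40)),
    Incident(datetime(2024, 2, 9, 14, 0), datetime(2024, 2, 9, 14, 20), datetime(2024, 2, 9, 15, 20)),
]

print(mttd(incidents))  # 0:15:00 (detection took 10 and 20 minutes)
print(mttr(incidents))  # 0:45:00 (recovery took 30 and 60 minutes)
```

If your team measures MTTR from the start of the failure instead, replace `i.detected` with `i.started` in `mttr`.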

Question 2

When should I track MTTR?

Accepted Answer

Track MTTR when you run production systems where downtime matters (customer-facing apps, APIs, data pipelines, internal business systems). It’s especially useful if you have an on-call rotation or incident response process, SLOs or SLAs, or frequent deployments, and you want to quantify how quickly you recover from failures.

Question 3

How much does MTTR cost?

Accepted Answer

MTTR itself has no direct cost because it’s a metric. Costs come from the tools and practices used to measure and reduce it: monitoring and logging platforms, incident management/on-call tooling, additional redundancy (multi-AZ/multi-region), automated remediation (runbooks, functions, pipelines), and engineering time for reliability work.
