Disaster Recovery Planning: Choosing Realistic RTO and RPO Targets

Most organizations have a disaster recovery plan. Far fewer have tested it. Fewer still have set recovery targets that actually reflect the cost of downtime to their business. Here's how to close that gap.

What RTO and RPO actually mean

Recovery Time Objective (RTO) is how long your organization can tolerate being down after an incident. If a ransomware attack encrypts your file server at 9 AM, your RTO answers: by when must it be restored?

Recovery Point Objective (RPO) is how much data loss your organization can tolerate, measured in time. If your last backup was at midnight and you're hit at 9 AM, you stand to lose up to 9 hours of data. Your RPO answers: is that acceptable?

These aren't technical decisions — they're business decisions. The right RTO and RPO depend on how much an hour of downtime costs you, what your regulatory obligations are, and what you're willing to spend on prevention.
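To make the two definitions concrete, here is a minimal sketch of the midnight-backup, 9 AM-incident scenario. The function names, the calendar date, and the 4-hour RPO and 8-hour RTO are assumptions chosen for illustration, not recommended targets.

```python
from datetime import datetime, timedelta

def data_loss_window(last_backup: datetime, incident: datetime) -> timedelta:
    """Everything written after the last good backup is at risk."""
    return incident - last_backup

def restore_deadline(incident: datetime, rto: timedelta) -> datetime:
    """An RTO becomes a concrete deadline once an incident starts."""
    return incident + rto

# Scenario from the text: backup at midnight, ransomware at 9 AM.
last_backup = datetime(2024, 3, 4, 0, 0)  # hypothetical date
incident = datetime(2024, 3, 4, 9, 0)

# Assumed targets, for illustration only.
rpo = timedelta(hours=4)
rto = timedelta(hours=8)

loss = data_loss_window(last_backup, incident)
print(f"Data loss window: {loss}")
print(f"Within RPO: {loss <= rpo}")
print(f"Restore by: {restore_deadline(incident, rto)}")
```

With these assumed numbers, the 9-hour loss window exceeds the 4-hour RPO, which means a nightly backup schedule cannot meet that target no matter how quickly the restore runs.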

Why most DR plans fail

Targets are set aspirationally, not based on actual recovery testing
Backups exist but restoration has never been tested end-to-end
Critical dependencies (Active Directory, DNS, licences) are missing from the plan
The plan is documented but no one outside IT has read it
Cloud-hosted data is assumed to be backed up — it often isn't

A DR plan that has never been tested is not a DR plan. It's a document that will fail at the worst possible moment.

How to calculate your RTO and RPO

Start with your most critical systems — the ones that stop the business if they go down. For each one, answer these questions:

What does it cost per hour if this system is unavailable? (Revenue lost, staff idle, penalties)
What are the regulatory or contractual obligations? (PHIPA, SOC 2 commitments, contractual SLAs)
What is the maximum acceptable data loss? (1 hour? 4 hours? 24 hours?)
How long does a full restoration actually take in practice?

The last question is the one most organizations can't answer — because they've never measured it. That's the first thing to fix.
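The first question can be answered with simple arithmetic. The sketch below adds up the three cost components named above (revenue lost, staff idle, penalties). The field names and every figure are hypothetical; substitute your own numbers.

```python
from dataclasses import dataclass

@dataclass
class DowntimeInputs:
    revenue_per_hour: float        # revenue the system normally supports per hour
    idle_staff: int                # people who cannot work while it is down
    loaded_hourly_wage: float      # average cost of one idle person-hour
    penalty_per_hour: float = 0.0  # SLA or contractual penalties, if any

def hourly_downtime_cost(x: DowntimeInputs) -> float:
    """Revenue lost + idle staff cost + penalties, per hour of outage."""
    return (
        x.revenue_per_hour
        + x.idle_staff * x.loaded_hourly_wage
        + x.penalty_per_hour
    )

# Hypothetical ERP system: 6000 + (40 * 45) + 500 per hour.
erp = DowntimeInputs(
    revenue_per_hour=6000,
    idle_staff=40,
    loaded_hourly_wage=45,
    penalty_per_hour=500,
)
print(f"ERP downtime cost per hour: {hourly_downtime_cost(erp):,.0f}")
```

This is deliberately a rough model. It ignores costs that are harder to quantify, such as reputational damage, so treat the result as a floor rather than a precise figure.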

Tiering your systems

Not every system needs a 4-hour RTO. Tiering lets you allocate recovery resources where they matter most.

Tier 1 (mission-critical): RTO 4 hours or less — ERP, patient records, payment systems, production line controls
Tier 2 (business-important): RTO 24 hours — email, file shares, internal tools
Tier 3 (deferrable): RTO 72 hours or more — archives, reporting systems, dev/test environments
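One way to apply the tiers consistently is to derive each system's tier from the downtime the business says it can tolerate. The sketch below assumes a conservative rule: a system only drops to a lower tier if it can tolerate that tier's full RTO. The rule, the function name, and the sample inventory are illustrative assumptions.

```python
from datetime import timedelta

# RTO thresholds taken from the tiers above.
TIER_2_RTO = timedelta(hours=24)
TIER_3_RTO = timedelta(hours=72)

def assign_tier(max_tolerable_downtime: timedelta) -> int:
    """Pick the least demanding tier whose RTO the system can still tolerate."""
    if max_tolerable_downtime >= TIER_3_RTO:
        return 3  # deferrable
    if max_tolerable_downtime >= TIER_2_RTO:
        return 2  # business-important
    return 1      # mission-critical: RTO of 4 hours or less

# Hypothetical inventory: system name -> tolerable downtime.
inventory = {
    "payment gateway": timedelta(hours=2),
    "email": timedelta(hours=24),
    "file shares": timedelta(hours=48),
    "reporting": timedelta(hours=96),
}

for system, tolerance in inventory.items():
    print(f"{system}: Tier {assign_tier(tolerance)}")
```

Note that under this rule a system that tolerates 48 hours lands in Tier 2, because a 72-hour recovery would overshoot what the business accepts.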

Testing: the part most organizations skip

A DR test doesn't have to be a full failover exercise. Start with a tabletop test: walk your team through a ransomware scenario step by step. Who does what? Who has the authority to invoke DR? What's the communication plan? Where are the recovery keys and credentials?

Graduate to functional tests: restore a single system from backup in a sandboxed environment. Measure how long it takes. Compare to your stated RTO. You'll almost always find a gap.
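Measuring a restore can be as simple as wrapping it in a timer and comparing the result to the stated RTO. In the sketch below, `fake_restore` is a placeholder standing in for whatever your backup tooling actually does, and the 4-hour RTO is an assumed target.

```python
import time
from datetime import timedelta
from typing import Callable

def timed_restore(restore: Callable[[], None], rto: timedelta) -> dict:
    """Run a restore step, time it, and report the gap against the RTO."""
    start = time.monotonic()  # monotonic clock: unaffected by system clock changes
    restore()
    elapsed = timedelta(seconds=time.monotonic() - start)
    return {
        "elapsed": elapsed,
        "rto": rto,
        "met_rto": elapsed <= rto,
        "gap": max(elapsed - rto, timedelta(0)),  # zero when the RTO is met
    }

def fake_restore() -> None:
    # Placeholder: replace with a call into your real restore procedure.
    time.sleep(0.05)

result = timed_restore(fake_restore, rto=timedelta(hours=4))
print(f"Elapsed: {result['elapsed']}, met RTO: {result['met_rto']}")
```

For the number to be meaningful, the timed step should cover the whole path back to a usable system, not only the data copy.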

Tabletop test: at least annually, and again after any significant infrastructure change
Functional restore test: quarterly for Tier 1 systems
Full failover simulation: annually for organizations with hard regulatory requirements
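A schedule like this is easy to let slip, so it can help to track it mechanically. The sketch below flags tests that are overdue based on the cadence above. The dictionary keys, the day counts used to approximate "annual" and "quarterly", and the sample dates are all assumptions.

```python
from datetime import date, timedelta

# Cadence from the list above, approximated in days.
CADENCE = {
    "tabletop": timedelta(days=365),
    "functional_restore": timedelta(days=91),  # quarterly, Tier 1 systems
    "full_failover": timedelta(days=365),
}

def overdue_tests(last_run: dict[str, date], today: date) -> list[str]:
    """Return tests that were never run or whose interval has elapsed."""
    overdue = []
    for test, interval in CADENCE.items():
        last = last_run.get(test)
        if last is None or today - last > interval:
            overdue.append(test)
    return overdue

# Hypothetical history: no full failover has ever been run.
last_run = {
    "tabletop": date(2024, 1, 15),
    "functional_restore": date(2024, 2, 1),
}
print(overdue_tests(last_run, today=date(2024, 6, 1)))
```

This only covers the calendar-based triggers. A significant infrastructure change should prompt a fresh tabletop test regardless of when the last one ran.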

Getting started

If you don't have RTO/RPO targets documented, start there. Identify your top 5 most critical systems and calculate the business cost of each going down for 4, 24, and 72 hours. That conversation alone usually shifts how leadership thinks about DR investment.
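As a starting point for that conversation, the sketch below builds the 4, 24, and 72-hour cost table from an hourly downtime figure per system. The system names and hourly costs are hypothetical, and the model assumes cost grows linearly with outage length, which real outages often do not.

```python
HORIZONS_HOURS = (4, 24, 72)

# Hypothetical hourly downtime cost per system.
hourly_cost = {
    "ERP": 8000,
    "Payment processing": 12000,
    "Email": 1500,
}

def outage_costs(cost_per_hour: float) -> dict[int, float]:
    """Cost of an outage at each horizon, assuming linear growth."""
    return {hours: cost_per_hour * hours for hours in HORIZONS_HOURS}

# Print the most expensive systems first.
header = f"{'System':<20}" + "".join(f"{str(h) + 'h':>12}" for h in HORIZONS_HOURS)
print(header)
for name, cost in sorted(hourly_cost.items(), key=lambda kv: kv[1], reverse=True):
    row = outage_costs(cost)
    print(f"{name:<20}" + "".join(f"{row[h]:>12,.0f}" for h in HORIZONS_HOURS))
```

Sorting by hourly cost puts the systems with the strongest case for Tier 1 treatment at the top of the table.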

Aegisys offers structured DR planning and testing services for Canadian organizations — including right-sized recovery architecture, documented runbooks, and regular test cycles. If you're not sure where your gaps are, a risk assessment is a good place to start.

From the Aegisys team
