Most organizations have a disaster recovery plan. Far fewer have tested it. Fewer still have set recovery targets that actually reflect the cost of downtime to their business. Here's how to close that gap.
What RTO and RPO actually mean
Recovery Time Objective (RTO) is the maximum time a system can stay down after an incident before the impact on your organization becomes unacceptable. If a ransomware attack encrypts your file server at 9 AM, your RTO answers the question: by what time must it be running again?
Recovery Point Objective (RPO) is how much data loss your organization can tolerate, expressed as time: how old your most recent usable copy of the data is allowed to be. If your last backup ran at midnight and you're hit at 9 AM, you lose 9 hours of data. Your RPO answers: is that acceptable?
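The data-loss arithmetic above fits in a few lines. This is a minimal sketch. It assumes scheduled backups are the only recovery point, with no continuous replication. The helpers `data_loss` and `meets_rpo` and the example date are hypothetical.

```python
from datetime import datetime, timedelta

def data_loss(last_backup: datetime, incident: datetime) -> timedelta:
    """Everything written since the last good backup is lost."""
    return incident - last_backup

def meets_rpo(last_backup: datetime, incident: datetime, rpo: timedelta) -> bool:
    """True if the data-loss window is no longer than the stated RPO."""
    return data_loss(last_backup, incident) <= rpo

# Scenario from the text: backup at midnight, incident at 9 AM (hypothetical date).
last_backup = datetime(2024, 3, 4, 0, 0)
incident = datetime(2024, 3, 4, 9, 0)

print(data_loss(last_backup, incident))                        # 9:00:00
print(meets_rpo(last_backup, incident, timedelta(hours=24)))   # True
print(meets_rpo(last_backup, incident, timedelta(hours=4)))    # False
```

On a nightly schedule, the worst case is an incident just before the next backup runs. In that case the data-loss window approaches the full 24-hour interval. If that exceeds your RPO, the backup frequency is what has to change.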
These aren't technical decisions — they're business decisions. The right RTO and RPO depend on how much an hour of downtime costs you, what your regulatory obligations are, and what you're willing to spend on prevention.
Why most DR plans fail
- Targets are set aspirationally, not based on actual recovery testing
- Backups exist but restoration has never been tested end-to-end
- Critical dependencies (AD, DNS, licences) are missing from the plan
- The plan is documented but no one outside IT has read it
- Cloud-hosted data is assumed to be backed up — it often isn't
A DR plan that has never been tested is not a DR plan. It's a document that will fail at the worst possible moment.
How to calculate your RTO and RPO
Start with your most critical systems — the ones that stop the business if they go down. For each one, answer these questions:
- What does it cost per hour if this system is unavailable? (Revenue lost, staff idle, penalties)
- What are your regulatory and contractual obligations? (PHIPA, SOC 2 commitments, customer SLAs)
- What is the maximum acceptable data loss? (1 hour? 4 hours? 24 hours?)
- How long does a full restoration actually take in practice?
The last question is the one most organizations can't answer — because they've never measured it. That's the first thing to fix.
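The first question can be turned into a rough hourly figure. This is a minimal sketch. It assumes cost splits into the three components the list names (revenue lost, staff idle, penalties) and that each is roughly constant per hour of outage. The `OutageCost` class, its field names, and the sample figures are hypothetical placeholders for your own numbers.

```python
from dataclasses import dataclass

@dataclass
class OutageCost:
    revenue_lost_per_hour: float   # sales or billing that cannot happen while the system is down
    idle_staff: int                # people who cannot do their work without the system
    loaded_hourly_rate: float      # average fully loaded cost of one idle staff hour
    penalties_per_hour: float      # SLA credits or contractual penalties, averaged per hour

    def per_hour(self) -> float:
        """Total cost of one hour of unavailability."""
        return (self.revenue_lost_per_hour
                + self.idle_staff * self.loaded_hourly_rate
                + self.penalties_per_hour)

# Hypothetical figures for an order-processing ERP system.
erp = OutageCost(revenue_lost_per_hour=12_000, idle_staff=40,
                 loaded_hourly_rate=55, penalties_per_hour=500)
print(erp.per_hour())  # 14700
```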
Tiering your systems
Not every system needs a 4-hour RTO. Tiering lets you allocate recovery resources where they matter most.
- Tier 1 (mission-critical): RTO 4 hours or less — ERP, patient records, payment systems, production line controls
- Tier 2 (business-important): RTO 24 hours — email, file shares, internal tools
- Tier 3 (deferrable): RTO 72 hours or more — archives, reporting systems, dev/test environments
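Encoding the tier boundaries keeps every system in an inventory classified the same way. This is a sketch that assumes each system's RTO has already been agreed, in hours. The `tier_for_rto` function and the sample inventory are hypothetical. The list above leaves RTOs between 24 and 72 hours unassigned. This sketch puts anything over 24 hours into Tier 3, which is one reading, not the only one.

```python
def tier_for_rto(rto_hours: float) -> int:
    """Map an agreed RTO (in hours) to a recovery tier.

    Assumed boundaries: up to 4 h is Tier 1, up to 24 h is Tier 2,
    and anything longer falls into Tier 3.
    """
    if rto_hours <= 4:
        return 1
    if rto_hours <= 24:
        return 2
    return 3

# Hypothetical inventory: system name -> agreed RTO in hours.
inventory = {
    "ERP": 4,
    "Patient records": 2,
    "Email": 24,
    "File shares": 24,
    "Reporting": 72,
}

# Print systems from tightest to loosest RTO.
for name, rto in sorted(inventory.items(), key=lambda item: item[1]):
    print(f"Tier {tier_for_rto(rto)}  RTO {rto:>3} h  {name}")
```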
Testing: the part most organizations skip
A DR test doesn't have to be a full failover exercise. Start with a tabletop test: walk your team through a ransomware scenario step by step. Who does what? Who has the authority to invoke DR? What's the communication plan? Where are the recovery keys and credentials?
Graduate to functional tests: restore a single system from backup in a sandboxed environment. Measure how long it takes. Compare to your stated RTO. You'll almost always find a gap.
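A functional test only produces a number if the restore is actually timed. The sketch below wraps a restore procedure in a timer and compares the result with the system's RTO. The names `timed_restore` and `fake_restore` are hypothetical. In practice, the callable would run your backup tool's restore into the sandbox. The timer should keep running until the system is verified usable, not just until the data copy finishes.

```python
import time
from typing import Callable

def timed_restore(restore: Callable[[], None], rto_seconds: float) -> dict:
    """Run a restore procedure, measure its wall-clock duration, and compare it with the RTO."""
    start = time.monotonic()  # monotonic clock is unaffected by system clock adjustments
    restore()
    elapsed = time.monotonic() - start
    return {
        "elapsed_seconds": elapsed,
        "rto_seconds": rto_seconds,
        "within_rto": elapsed <= rto_seconds,
        "gap_seconds": max(0.0, elapsed - rto_seconds),  # how far the restore overran
    }

def fake_restore() -> None:
    """Stand-in for a real sandboxed restore; sleeps briefly to simulate work."""
    time.sleep(0.2)

result = timed_restore(fake_restore, rto_seconds=0.1)
print(result["within_rto"])  # False: the simulated restore overran its RTO
```

Recording `gap_seconds` for each test gives you a trend to report, rather than a one-off pass or fail.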
A reasonable testing cadence:
- Tabletop test: at least annually, and after any significant infrastructure change
- Functional restore test: quarterly for Tier 1 systems
- Full failover simulation: annually for organizations with strict regulatory requirements
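A cadence like this is easy to let slip, so it helps to flag overdue tests automatically. This sketch assumes "quarterly" means every 91 days and "annually" every 365 days. The `CADENCE_DAYS` table, the test-type keys, and the dates are hypothetical.

```python
from datetime import date, timedelta

# Assumed day counts for the cadence list above.
CADENCE_DAYS = {
    "tabletop": 365,
    "functional_restore_tier1": 91,
    "full_failover": 365,
}

def next_due(test_type: str, last_run: date) -> date:
    """Date by which the next test of this type should have run."""
    return last_run + timedelta(days=CADENCE_DAYS[test_type])

def overdue(test_type: str, last_run: date, today: date) -> bool:
    """True if the scheduled interval has passed without a new test."""
    return today > next_due(test_type, last_run)

today = date(2024, 6, 1)
print(next_due("functional_restore_tier1", date(2024, 1, 15)))         # 2024-04-15
print(overdue("functional_restore_tier1", date(2024, 1, 15), today))   # True
print(overdue("tabletop", date(2023, 11, 1), today))                   # False
```

A date-based check cannot see the event-driven trigger: a tabletop test after a significant infrastructure change still has to be scheduled by hand.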
Getting started
If you don't have RTO/RPO targets documented, start there. Identify your five most critical systems and estimate the business cost of each one going down for 4, 24, and 72 hours. That conversation alone usually shifts how leadership thinks about DR investment.
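The cost exercise reduces to a small table. This is a minimal sketch. It assumes cost scales linearly with outage length, which usually understates longer outages, since customers leave and penalties can escalate. The system names and hourly figures are hypothetical placeholders.

```python
# Hypothetical hourly outage costs (CAD) for the five most critical systems.
hourly_cost = {
    "ERP": 14_700,
    "Payment processing": 9_000,
    "Patient records": 6_500,
    "Email": 1_200,
    "File shares": 800,
}

DURATIONS_HOURS = (4, 24, 72)

def outage_cost(per_hour: float, hours: int) -> float:
    """Linear estimate: hourly cost multiplied by outage length."""
    return per_hour * hours

# Print one row per system with the estimated cost at each outage length.
header = f"{'System':<20}" + "".join(f"{h:>12} h" for h in DURATIONS_HOURS)
print(header)
for system, per_hour in hourly_cost.items():
    row = "".join(f"{outage_cost(per_hour, h):>14,.0f}" for h in DURATIONS_HOURS)
    print(f"{system:<20}{row}")
```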
Aegisys offers structured DR planning and testing services for Canadian organizations — including right-sized recovery architecture, documented runbooks, and regular test cycles. If you're not sure where your gaps are, a risk assessment is a good place to start.
From the Aegisys team
Questions about this topic? We're happy to talk through your specific situation.
No pitch, no pressure. A straightforward conversation about your environment and what matters most.