Disaster Recovery
This disaster recovery plan defines the recovery objectives, backup strategy, restore procedures, and failover processes for FeatureSignals. It applies to FeatureSignals Cloud and Dedicated Cloud, and provides guidance for self-hosted deployments.
Recovery Objectives (RPO / RTO)
Recovery objectives define how much data loss is acceptable (RPO) and how quickly service must be restored (RTO):
| Scenario | RPO | RTO | Applies To |
|---|---|---|---|
| Database corruption (single AZ) | < 5 minutes (WAL shipping) | < 30 minutes | FeatureSignals Cloud |
| Full region failure | < 1 hour (cross-region backup) | < 4 hours | FeatureSignals Cloud |
| Dedicated Cloud — instance failure | < 5 minutes (WAL shipping) | < 15 minutes (auto-failover) | Dedicated Cloud |
| Self-hosted — complete rebuild | Customer-defined backup schedule | Customer-driven | Self-Hosted |
Backup Strategy
FeatureSignals employs a layered backup strategy to meet the RPO targets:
PostgreSQL WAL Archiving
Continuous Write-Ahead Log (WAL) archiving to cloud object storage (S3-compatible). Point-in-time recovery with 5-minute granularity. WAL segments are shipped every 60 seconds or when they reach 16 MB.
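The archiving described above maps to a small postgresql.conf fragment. This is a minimal sketch, assuming an S3-compatible bucket named featuresignals-wal-archive (hypothetical) and the aws CLI available on the database host:

```ini
# postgresql.conf: continuous WAL archiving (bucket name is illustrative)
archive_mode = on
# Copy each completed segment to object storage; %p is the segment's path, %f its file name
archive_command = 'aws s3 cp %p s3://featuresignals-wal-archive/%f'
# Force a segment switch after 60 seconds even when write volume is low
archive_timeout = 60
```

The `archive_timeout` setting is what guarantees the 60-second shipping interval; without it, a quiet database would only archive when a 16 MB segment fills.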
Daily Full Backups
Full pg_dump backups taken daily at 03:00 UTC during low-traffic window. Encrypted at rest with AES-256. Retained for 30 days. Stored in a separate region from the primary database.
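One way to implement this schedule is a small script invoked by cron at 03:00 UTC, piping `pg_dump` through AES-256 encryption and straight to object storage. This is a sketch, not runnable as-is: the database name, key file path, and bucket are placeholders, and it assumes the `openssl` and `aws` CLIs are installed.

```shell
#!/bin/sh
# daily_backup.sh: invoked from cron at 03:00 UTC (names and paths are placeholders)
set -eu
pg_dump -Fc featuresignals \
  | openssl enc -aes-256-cbc -pbkdf2 -pass file:/etc/featuresignals/backup.key \
  | aws s3 cp - "s3://featuresignals-backups/daily/$(date +%F).dump.enc"
```

Piping directly to storage avoids staging a multi-gigabyte dump on local disk; `-Fc` produces the custom format that `pg_restore` expects.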
Cross-Region Replication
Backups replicated to a secondary cloud region within 1 hour. For Dedicated Cloud, customers can configure an additional replication target in their own object storage account.
Immutable Backups
Backups stored with object lock (WORM — write once, read many) for 7 days. This protects against ransomware and accidental deletion. Compliance mode prevents even root accounts from deleting locked backups.
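On S3, the 7-day COMPLIANCE window above corresponds to a default retention rule on the backup bucket, applied with `aws s3api put-object-lock-configuration`. A sketch of the configuration JSON (note that Object Lock typically has to be enabled when the bucket is created):

```json
{
  "ObjectLockEnabled": "Enabled",
  "Rule": {
    "DefaultRetention": {
      "Mode": "COMPLIANCE",
      "Days": 7
    }
  }
}
```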
Restore Procedures
1. Database Restore from Backup
- Provision a new PostgreSQL instance (same version as backup).
- Download the latest daily backup from object storage.
- Restore with `pg_restore` to the new instance.
- Apply WAL segments forward to the desired point in time.
- Update DNS or connection strings to point to the new instance.
- Verify flag evaluations return expected results from a test SDK.
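For the WAL replay step above, PostgreSQL 12+ reads recovery settings from postgresql.conf once an empty `recovery.signal` file exists in the data directory. A minimal sketch, with an illustrative bucket name and target timestamp:

```ini
# postgresql.conf on the replacement instance (bucket and timestamp are illustrative)
restore_command = 'aws s3 cp s3://featuresignals-wal-archive/%f %p'
# Stop replay at the chosen point in time, then promote to read/write
recovery_target_time = '2025-01-15 03:00:00+00'
recovery_target_action = 'promote'
```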
2. Full Stack Recovery
- Provision new compute instances in the target region.
- Restore PostgreSQL database (follow procedure above).
- Deploy the latest FeatureSignals release via CI/CD or Helm chart.
- Populate Redis cache by restarting the server (auto-warms from database).
- Verify health endpoints: `GET /health` and `GET /ready`.
- Run the integration test suite against the restored environment.
- Switch DNS or load balancer traffic to the new stack.
Regional Failover
FeatureSignals Cloud uses active-passive regional failover for disaster recovery:
- Primary region: All traffic served from the primary cloud region. Database is the source of truth.
- Standby region: Infrastructure pre-provisioned (compute, database instance, object storage). Database restored from the latest cross-region backup. Not serving traffic in normal operation.
- Failover trigger: Manual decision by the on-call engineer after confirming the primary region is unrecoverable within RTO. Failover is not automatic to prevent split-brain scenarios.
- DNS cutover: Update DNS records to point to the standby region. TTL is set to 60 seconds to allow fast propagation.
Testing Disaster Recovery
DR procedures are only as good as their last test. We run the following DR tests on a regular cadence:
| Test Type | Frequency | Scope |
|---|---|---|
| Backup verification | Weekly (automated) | Verify latest backup is restorable. Checksum validation. |
| Tabletop exercise | Monthly | Walk through DR procedures with the engineering team. No actual failover. |
| Database restore drill | Quarterly | Restore database from backup in an isolated environment. Run integration tests. |
| Full regional failover | Biannually | Complete failover to standby region. Serve production traffic for 24 hours. Fail back. |
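The weekly automated backup verification can be sketched as a checksum comparison against a manifest written at backup time. The manifest format assumed here (a single hex SHA-256 digest per backup file) is an illustration, not the actual FeatureSignals tooling:

```python
import hashlib
from pathlib import Path

def backup_checksum_ok(backup_path: Path, manifest_path: Path) -> bool:
    """Return True if the backup's SHA-256 digest matches the recorded manifest value."""
    h = hashlib.sha256()
    with backup_path.open("rb") as f:
        # Stream in 1 MiB chunks so multi-gigabyte dumps need not fit in memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    expected = manifest_path.read_text().strip()
    return h.hexdigest() == expected
```

A check like this catches truncated or corrupted uploads; the quarterly restore drill then confirms the backup is not just intact but actually restorable.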
Self-Hosted DR Guidance
If you're running FeatureSignals self-hosted, you are responsible for your own DR plan. Here's what we recommend:
- Automate PostgreSQL backups — Use `pg_dump` or your cloud provider's managed backup service. Schedule daily full backups with WAL archiving for PITR.
- Store backups off-site — Replicate backups to a different region, cloud provider, or on-premises location.
- Document your restore procedure — Write down the exact steps. The person restoring at 3 AM may not be the person who set it up.
- Test regularly — Restore from backup into a staging environment quarterly. A backup you haven't tested is not a backup.
- Monitor backup health — Alert if backups fail, if WAL shipping lags, or if backup storage is approaching capacity.