Still, we’d like to prevent these problems from happening in the future, and it turned out that resolving this was really easy. Even so, I couldn’t find much documentation, or many tutorials or how-tos, on configuring S3 multi-region failover.
This tutorial shows how to configure a URL that provides multi-region failover for S3 buckets, protecting against regional failures in S3. We use Route 53 for this. If Route 53 does not host your authoritative DNS, you can set up an additional domain in Route 53 and use CNAMEs to delegate specific records to it. This tutorial does not cover remediating failures of Route 53 itself.
The first thing we need to do is replicate our S3 data. Amazon offers a replication feature out of the box, but it only applies to new or updated objects, not to objects that already exist. This replication comes with several downsides:
For many websites, simple two-region replication will be sufficient, and objects will be small enough that versioning will not be an issue. However, it would be relatively easy to use a Lambda function triggered by S3 events instead, allowing non-versioned copies to multiple regions, as long as the data can be copied within the Lambda execution window (5 minutes at the time of writing).
At IOpipe, we chose to use the built-in replication.
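As a rough illustration of the Lambda alternative, here is a minimal sketch of a handler that copies each new object to several destination buckets. The bucket names and the `copy_to_replicas` helper are hypothetical, and this is not what we run at IOpipe. It assumes the function is subscribed to the source bucket's object-created notifications and that its role is allowed to read the source and write to the destinations.

```python
from urllib.parse import unquote_plus

# Hypothetical destination buckets, one per additional region.
DEST_BUCKETS = ["example-assets-us-west-2", "example-assets-eu-west-1"]


def copy_to_replicas(event, s3_client, dest_buckets=DEST_BUCKETS):
    """Copy every object named in an S3 event notification to each destination."""
    copied = []
    for record in event.get("Records", []):
        src_bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        for dest in dest_buckets:
            s3_client.copy_object(
                Bucket=dest,
                Key=key,
                CopySource={"Bucket": src_bucket, "Key": key},
            )
            copied.append((dest, key))
    return copied


def handler(event, context):
    """Lambda entry point."""
    import boto3  # imported lazily so copy_to_replicas can be tested without AWS

    return copy_to_replicas(event, boto3.client("s3"))
```

Passing the client in as an argument keeps the copy logic testable with a stub, and it makes it easy to swap in per-region clients later if needed.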
S3 Bucket Replication Configuration:
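For readers who prefer code to console screenshots, the same setup can be sketched with boto3. This is an approximation under stated assumptions rather than our exact configuration: the rule ID, the bucket names, and the IAM role ARN are placeholders, and the role must already exist with permission to replicate. Built-in replication requires versioning on both buckets, so the sketch enables it first.

```python
def replication_config(role_arn, dest_bucket):
    """Build a ReplicationConfiguration that replicates every object."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-everything",  # placeholder rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all keys
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }


def enable_replication(src_bucket, src_region, dest_bucket, dest_region, role_arn):
    """Turn on versioning for both buckets, then attach the replication rule."""
    import boto3  # imported lazily so the builder above runs without AWS access

    for bucket, region in ((src_bucket, src_region), (dest_bucket, dest_region)):
        boto3.client("s3", region_name=region).put_bucket_versioning(
            Bucket=bucket,
            VersioningConfiguration={"Status": "Enabled"},
        )
    boto3.client("s3", region_name=src_region).put_bucket_replication(
        Bucket=src_bucket,
        ReplicationConfiguration=replication_config(role_arn, dest_bucket),
    )
```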
Each S3 bucket sits behind CloudFront, which provides TLS termination, caching, and other resiliency features. This is optional, but we use it at IOpipe. Without CloudFront, you will simply need fewer health checks.
It is also possible to fail over between one bucket fronted by CloudFront and another accessed directly, which guards against CloudFront outages.
Route 53 Health Checks
This is where things get interesting. We set up three health checks per region: one checking the health of the S3 bucket, another checking the health of the CloudFront distribution, and a calculated health check requiring that both of the previous checks be green. This last health check is the one we monitor from Route 53.
We use simple “Basic” health checks of AWS endpoints. As configured, each health check costs $0.50/mo. With 6 checks, this comes to $3/mo per multi-region bucket. (Without CloudFront, only the two S3 checks remain, so this would be $1/mo.)
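The arithmetic is small enough to check by hand, but here it is spelled out. The $0.50 price is the figure quoted above, and `monthly_cost` is just an illustrative helper.

```python
PRICE_PER_CHECK = 0.50  # USD per month, the figure quoted above


def monthly_cost(regions, checks_per_region):
    """Monthly health check cost for one multi-region bucket."""
    return regions * checks_per_region * PRICE_PER_CHECK


# With CloudFront: S3 check + CloudFront check + calculated check per region.
print(monthly_cost(2, 3))  # 3.0
# Without CloudFront: a single S3 check per region.
print(monthly_cost(2, 1))  # 1.0
```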
Health Check Configuration:
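Here is a hedged boto3 sketch of the three checks for one region. The hostnames and the `/health.txt` path in the usage note are placeholders. The sketch assumes HTTPS checks against a publicly readable object, since Route 53 only treats 2xx and 3xx responses as healthy. The interval and threshold values are illustrative and not necessarily the ones we use.

```python
import uuid


def endpoint_check(fqdn, path):
    """HealthCheckConfig for a basic HTTPS check of a single endpoint."""
    return {
        "Type": "HTTPS",
        "FullyQualifiedDomainName": fqdn,
        "Port": 443,
        "ResourcePath": path,
        "RequestInterval": 30,  # seconds between checks
        "FailureThreshold": 3,  # consecutive failures before "unhealthy"
    }


def calculated_check(child_ids):
    """HealthCheckConfig that is healthy only if ALL child checks are healthy."""
    children = list(child_ids)
    return {
        "Type": "CALCULATED",
        "ChildHealthChecks": children,
        "HealthThreshold": len(children),
    }


def create_region_checks(bucket_fqdn, cloudfront_fqdn, path):
    """Create the S3, CloudFront, and calculated checks; return the calculated ID."""
    import boto3  # imported lazily so the builders above run without AWS access

    r53 = boto3.client("route53")
    child_ids = []
    for fqdn in (bucket_fqdn, cloudfront_fqdn):
        resp = r53.create_health_check(
            CallerReference=str(uuid.uuid4()),
            HealthCheckConfig=endpoint_check(fqdn, path),
        )
        child_ids.append(resp["HealthCheck"]["Id"])
    resp = r53.create_health_check(
        CallerReference=str(uuid.uuid4()),
        HealthCheckConfig=calculated_check(child_ids),
    )
    return resp["HealthCheck"]["Id"]
```

Calling `create_region_checks("example-bucket.s3.amazonaws.com", "d111111abcdef8.cloudfront.net", "/health.txt")` once per region would yield the two calculated check IDs that the failover records need.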
Route 53 Routing Policies
For the DNS name pointing to these S3 buckets, we created a CNAME record for each CloudFront distribution, then enabled a “Failover” routing policy. One CloudFront distribution (and its underlying bucket) became Primary, and the other Secondary.
For each CNAME, we enabled “Associate with Health Check”, specifying the appropriate calculated health check.
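The pair of failover records might look like this when built for boto3's `change_resource_record_sets` call. This is a sketch under assumptions: the record name, distribution hostnames, health check IDs, hosted zone ID, and the 60-second TTL are all placeholders to replace with your own values.

```python
def failover_record(name, target, role, health_check_id, ttl=60):
    """One CNAME record set in a failover pair; role is PRIMARY or SECONDARY."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "CNAME",
            "SetIdentifier": f"{name}-{role.lower()}",
            "Failover": role,
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
            "HealthCheckId": health_check_id,  # the calculated check for this region
        },
    }


def failover_change_batch(name, primary, secondary):
    """primary and secondary are (target_hostname, health_check_id) tuples."""
    return {
        "Changes": [
            failover_record(name, primary[0], "PRIMARY", primary[1]),
            failover_record(name, secondary[0], "SECONDARY", secondary[1]),
        ]
    }


def apply_failover(zone_id, name, primary, secondary):
    """Submit both records to Route 53 in a single change batch."""
    import boto3  # imported lazily so the builders above run without AWS access

    return boto3.client("route53").change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch=failover_change_batch(name, primary, secondary),
    )
```

A short TTL matters here: resolvers keep serving the cached primary answer until it expires, so the TTL puts a floor on how quickly clients notice a failover.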
This configuration might, honestly, be overkill for the frequency at which S3 has serious outages, but it’s also not too difficult or costly to configure and maintain. The $3/mo extra we now pay to AWS has bought us some peace of mind.