There are a few things that are required for a reasonable blue-green strategy.
- No downtime between releases.
- Atomic deployment. A visitor sees a whole working version from before or after the deployment. This can be tricky because a user doesn't request a site atomically, i.e. first the HTML is requested, and then some time later all the static resources.
- E2E tests run on a certification environment that has the bare minimum of changes between it and production.
API Gateway has a few features that help to achieve this.
- API Gateway has the concept of stages, where you deploy an API to a stage, and then, later, atomically update another stage to that deployment.
- API Gateway can proxy requests to S3, so the entire HTML of the site can be placed behind the API Gateway.
- API Gateway supports custom domains and the ability to map paths to specific stages.
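As a rough illustration of the S3 proxying mentioned above, the Python sketch below builds one path entry of a Swagger definition that forwards a GET request to an object in S3. The function name `s3_proxy_path`, the bucket name, the region, and the role ARN are all placeholders invented for this example, and only a minimal set of integration fields is shown, so treat it as a starting point rather than a complete definition.

```python
import json


def s3_proxy_path(bucket: str, key: str, region: str, role_arn: str) -> dict:
    """Return a Swagger path item that proxies GET requests to s3://bucket/key."""
    return {
        "get": {
            "produces": ["text/html"],
            "responses": {"200": {"description": "HTML page"}},
            # API Gateway's Swagger extension describing the backend integration.
            "x-amazon-apigateway-integration": {
                "type": "aws",
                "httpMethod": "GET",
                "uri": f"arn:aws:apigateway:{region}:s3:path/{bucket}/{key}",
                # Assumed: an IAM role that API Gateway can assume to read the bucket.
                "credentials": role_arn,
                "responses": {"default": {"statusCode": "200"}},
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    paths = {
        "/": s3_proxy_path(
            bucket="example-html-certification",
            key="index.html",
            region="eu-west-1",
            role_arn="arn:aws:iam::123456789012:role/example-apigateway-s3-read",
        )
    }
    print(json.dumps({"swagger": "2.0", "paths": paths}, indent=2))
```

Because the bucket name is an ordinary argument, the same function can generate the definition for either HTML bucket.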
With these features in mind, the overall architecture of the site can be as follows.
- You have all the HTML pages of the site defined as an API, with appropriate resources defined. A Swagger definition file is a reasonable choice for this.
- Two stages are set up for the API, each with a custom domain. The public domain of the site has an API mapping from the root path to the production stage, and another domain, used for certification, has an API mapping from the root path to the certification stage.
- A single S3 bucket for the assets, i.e. everything except the HTML. These must have versioned file names, for example by including a fragment of the MD5 hash of the contents. This S3 bucket can sit behind a CloudFront distribution if you would like.
- Two S3 buckets for the HTML: one for production and one for certification.
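One way to produce the versioned asset file names is sketched below in Python. The function name `versioned_name` and the choice of an 8-character hash fragment are assumptions made for this example; any scheme works as long as a change in content always yields a new name.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a fragment of the content's MD5 hash before the file extension."""
    digest = hashlib.md5(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    # "css/site.css" becomes "css/site.<digest>.css"
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


if __name__ == "__main__":
    print(versioned_name("css/site.css", b"body { margin: 0; }"))
```

The HTML then has to reference the generated names, which is why the assets are named before the HTML is uploaded.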
A strategy to deploy using this is below.
- Deploy static assets to the asset bucket, making sure that each has a file name versioned with an MD5 hash of its contents, and that existing assets are not deleted.
- Determine which bucket is in production. One option is to embed this as a mock route in the Swagger definition file and download that file in the build script, so you have an up-to-date copy.
- Upload all HTML to the bucket that isn't in production: the certification bucket.
- Modify the Swagger definition file so requests for HTML are proxied to the certification bucket. You could use your favourite template language for this.
- Deploy the API, as defined by the modified Swagger definition file, to the certification stage. Take note of the deploymentId.
- Run E2E tests.
- If they all pass, update the production stage so its deploymentId is set to that of the certification stage.
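The API Gateway part of the steps above could be scripted roughly as follows. This is a sketch under several assumptions: `apigw` is expected to behave like a boto3 `apigateway` client, the stage names `certification` and `production` are illustrative, and `idle_bucket`, `release`, and the `run_e2e_tests` callback are hypothetical names. Uploading the assets and the HTML to S3 is left out.

```python
def idle_bucket(live_bucket: str, bucket_a: str, bucket_b: str) -> str:
    """Return whichever of the two HTML buckets is not serving production."""
    if live_bucket == bucket_a:
        return bucket_b
    if live_bucket == bucket_b:
        return bucket_a
    raise ValueError(f"unknown bucket: {live_bucket}")


def release(apigw, rest_api_id: str, swagger_body: bytes, run_e2e_tests) -> bool:
    """Deploy to certification, test, then point production at that deployment."""
    # Replace the API definition with the one that proxies to the idle bucket.
    apigw.put_rest_api(restApiId=rest_api_id, mode="overwrite", body=swagger_body)

    # Deploy to the certification stage and take note of the deployment id.
    deployment_id = apigw.create_deployment(
        restApiId=rest_api_id, stageName="certification"
    )["id"]

    # Production is left untouched if any test fails.
    if not run_e2e_tests():
        return False

    # A single stage update switches production to the tested deployment.
    apigw.update_stage(
        restApiId=rest_api_id,
        stageName="production",
        patchOperations=[
            {"op": "replace", "path": "/deploymentId", "value": deployment_id}
        ],
    )
    return True
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.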
At every step above, a visitor to the public site will fetch either the pre-release HTML or the post-release HTML. In both cases, all the assets will load correctly, because the new assets are uploaded first and the existing ones are never deleted.
Note: To keep this post brief, this strategy does not cover the removal of assets that are no longer used.