The Blue/Green deployment strategy is used for zero-downtime deployments. What’s great about this method is that if something goes wrong during deployment, you can limit downtime and risk by switching which pool of servers, blue or green, is available.
One of the challenges with automating deployment is the cut-over itself, taking software from the final stage of testing to live production. You usually need to do this quickly in order to minimize downtime. The blue/green deployment approach does this by ensuring you have two production environments, as identical as possible. At any time one of them, let’s say blue for the example, is live. As you prepare a new release of your software you do your final stage of testing in the green environment. Once the software is working in the green environment, you switch the router so that all incoming requests go to the green environment – the blue one is now idle. – Martin Fowler
To review some load balancer concepts, please visit this page.
How it works:
For our example, let’s consider a load balancer with four servers.
Step 1 is to split the server pool into two pools, the blue one and the green one. Both pools are available for the moment.
Step 2 is to make the green pool unavailable, leaving only the blue pool available. To do this cleanly, stop sending new requests to the green pool and wait for its existing connections to drain.
Step 3 is to update the green pool with the new software. The final stage of this step is testing the green pool after the software update.
Step 4 is to switch pools: the green pool becomes available in the load balancer, and the blue pool becomes unavailable after connection draining.
At this step, you could wait some time to see the behaviour of the new software in production before continuing the deployment. It’s always a good idea to look at logs and monitoring charts.
If any issues come up that cause the system to fail, you can roll back by simply switching pools again: the green pool with the new software becomes unavailable and the blue pool with the old software becomes available. To complete the rollback, the green pool is updated with the old version of the software.
Step 5 is to install the new software on the blue pool as well, once everything is OK with the green pool.
Step 6 is to reconnect the blue pool to the load balancer.
After this step, if any errors occur and the system fails, the rollback is carried out as another Blue/Green deployment, this time of the old software.
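The six steps and the rollback path can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `LoadBalancer`, `Server`, `install`, `make_lb` and `blue_green_deploy` are hypothetical names, the load balancer is an in-memory toy, and "draining" is reduced to flipping a flag rather than waiting on real connections.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Server:
    name: str
    version: str


@dataclass
class LoadBalancer:
    # Hypothetical in-memory model: pool name -> servers in that pool.
    pools: Dict[str, List[Server]]
    # Names of the pools that currently receive new requests.
    available: Set[str] = field(default_factory=set)

    def enable(self, pool: str) -> None:
        self.available.add(pool)

    def drain_and_disable(self, pool: str) -> None:
        # A real load balancer would stop routing new requests to the pool
        # and wait for in-flight requests to finish; here we only flip the flag.
        self.available.discard(pool)

    def live_versions(self) -> Set[str]:
        return {s.version for p in self.available for s in self.pools[p]}


def install(lb: LoadBalancer, pool: str, version: str) -> None:
    for server in lb.pools[pool]:
        server.version = version


def blue_green_deploy(
    lb: LoadBalancer,
    new_version: str,
    looks_healthy: Callable[[LoadBalancer], bool],
) -> bool:
    old_version = lb.pools["blue"][0].version
    lb.drain_and_disable("green")        # Step 2: only blue serves traffic
    install(lb, "green", new_version)    # Step 3: update green (offline tests go here)
    lb.enable("green")                   # Step 4: both pools briefly available...
    lb.drain_and_disable("blue")         # ...then blue drains and goes idle
    if not looks_healthy(lb):            # watch logs and monitoring
        lb.enable("blue")                # rollback: switch pools again
        lb.drain_and_disable("green")
        install(lb, "green", old_version)
        return False
    install(lb, "blue", new_version)     # Step 5: update the idle blue pool
    lb.enable("blue")                    # Step 6: reconnect blue
    return True


def make_lb() -> LoadBalancer:
    # Step 1: four servers split into two pools, both available.
    return LoadBalancer(
        pools={
            "blue": [Server("s1", "1.0"), Server("s2", "1.0")],
            "green": [Server("s3", "1.0"), Server("s4", "1.0")],
        },
        available={"blue", "green"},
    )


lb = make_lb()
deployed = blue_green_deploy(lb, "2.0", lambda _lb: True)
print(deployed, lb.live_versions())
```

In this sketch the green pool stays out of the load balancer after a rollback. Whether and when to reconnect it is left open, since the procedure above does not specify it.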
When should you use this method:
The Blue/Green deployment strategy works if the infrastructure can support all traffic with half of its capacity during deployment.
If the infrastructure cannot handle all traffic with only half of the servers, then this method cannot be used.
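As a rough way to check this condition before planning a deployment, the arithmetic can be written down as follows. The function name `can_serve_on_half` and the numbers are hypothetical, and the sketch assumes the servers are split evenly into two pools and that capacity scales linearly with the number of servers.

```python
def can_serve_on_half(servers: int, rps_per_server: float, peak_rps: float) -> bool:
    """Return True if half of the servers can carry the expected peak traffic."""
    # During deployment only one pool (half of the servers) takes traffic.
    serving = servers // 2
    return serving * rps_per_server >= peak_rps


# Four servers at 500 requests/second each: one pool can carry 1000 requests/second.
print(can_serve_on_half(4, 500.0, 900.0))    # peak fits within one pool
print(can_serve_on_half(4, 500.0, 1200.0))   # peak exceeds one pool
```

With these example numbers, the first call returns `True` and the second returns `False`, so the second scenario would not be a candidate for this method.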
- During deployment, all traffic is handled by half of the capacity.
- Before switching from the blue servers to the green servers, in order to have zero downtime, the blue servers must finish any outstanding transactions/requests before they are removed from the load balancer. This implies that for a short period, the blue and green servers will be available at the same time.
- Rollback is easy: while the blue servers still have the old software, all you have to do is switch pools from green back to blue; if all servers were already updated, create another blue/green deployment.
- The green servers can be treated as a staging environment, so we can use them to run security tests, load tests and integration tests before making them available in the load balancer.
- Since the green servers are not getting any traffic, any service on them can be restarted without causing downtime.
- The database can be affected when the green servers are updated to the new version of the software, in which case rolling back to the old software version may fail. To solve this problem, you can use a single database for all servers (blue and green) and change the database structure and data for the green servers in a backward-compatible way, so that the old software still works. Once we are sure that the new system works as expected after the deployment, we can remove any database support left in place for rollback.
- There may be times when the old software cannot work with the changed database at all without small changes to the software itself. In that case, you can make a preliminary deployment of the old software, modified so that it keeps working in case of rollback. After this, the main deployment can start because all systems are now backward compatible.
- This database issue is a more general one: the same solution applies to any kind of shared resource used by both the new and the old version of the software. Make software changes with rollback in mind, and create as many preliminary deployment steps as your system needs to keep working in case of rollback.
- If any resource shared between the blue and green servers becomes unavailable during deployment (for example, because of a database migration or service restarts), you can create a maintenance page and configure the system to switch all incoming requests from the blue servers to the maintenance page during the deployment.
- A common issue is handling cache keys that hold information prepared by either the new or the old servers. If the same key holds values with different structures and is written by both types of instances, the system will fail, because each instance knows how to process only one type of structure. One solution is to version the cache key; this way the new servers will not conflict with the old servers.
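The cache key versioning mentioned in the last point can be illustrated with a small sketch. Everything here is hypothetical: a plain dictionary stands in for the shared cache, and the `v1`/`v2` prefixes, the function names and the user record layout are invented for the example.

```python
from typing import Any, Dict

# Stand-in for a cache shared by the blue and green servers.
cache: Dict[str, Any] = {}


def cache_key(schema_version: str, name: str) -> str:
    # The version prefix keeps old and new value structures apart.
    return f"{schema_version}:{name}"


# Old software: stores the user's name as a single string.
def old_server_write(user_id: int, full_name: str) -> None:
    cache[cache_key("v1", f"user:{user_id}")] = full_name


def old_server_read(user_id: int) -> str:
    return cache[cache_key("v1", f"user:{user_id}")].upper()


# New software: stores the name as a dictionary with separate fields.
def new_server_write(user_id: int, first: str, last: str) -> None:
    cache[cache_key("v2", f"user:{user_id}")] = {"first": first, "last": last}


def new_server_read(user_id: int) -> str:
    entry = cache[cache_key("v2", f"user:{user_id}")]
    return f"{entry['first']} {entry['last']}".upper()


old_server_write(7, "Ada Lovelace")
new_server_write(7, "Ada", "Lovelace")
print(old_server_read(7), "|", new_server_read(7))
print(sorted(cache))
```

Without the version prefix, both writers would share the key `user:7`, and `old_server_read` would call `.upper()` on a dictionary and fail. The cost of versioning is that each software version warms its own cache entries.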