Summary first (for those of you who don’t feel like reading through my musings): When deploying a new version of our web application, we stage it on a new set of EC2 instances first, leaving the old production intact, then switch the elastic IP and the (well-tested) staging becomes production. No downtime involved.
I find that the most enjoyable “eureka moments” happen when you didn’t think of a situation as a problem and weren’t even looking for a solution. You just do something in a new way without giving it too much thought, only then to realize that you stumbled into something great. That kind of thing happened to us a short while after we moved our web application to EC2.
The problem was deploying a new version of our web app. We run short development cycles, so that process takes place every 2-4 weeks. Occasionally it runs smoothly – that is to say, most of the time it doesn’t, so there’s downtime involved (not to mention stress).
The best practice is to have a staging environment where you set up the new version first and test it before you load it to the production environment. Ideally, the staging environment mimics the production environment one-to-one. Of course, that seldom happens.
In the bad old days, when you either had your own data center or (as in our case) paid a monthly fee for hosting, a perfect staging environment would mean paying twice. That just doesn’t feel like cash well spent in a start-up. So our staging environment was a mini setup in our office that didn’t mimic production all too well. As you can expect, new version deployment involved lots of downtime, blood, sweat and tears.
The first time we set up our version on EC2, a different mechanism came very naturally:
1. The “old” production system is running. Don’t touch it.
2. Reserve all the EC2 instances you need to set up a new system from scratch. They become the staging environment for the new version.
3. Set up the new version on those new instances.
4. Test. Fix. Repeat.
5. Test some more (truth be told, we never actually do that, and neither will you, but we all really should :( )
6. Switch the elastic IP address to the new web server instance.
7. The well-tested staging environment now becomes production. No downtime. No unexpected problems.
8. Leave the old system up for a while – just in case.
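Step 6 is the only moment where anything changes for users, so it is worth scripting. Here is a minimal sketch of how it might look, not the exact procedure we used. It assumes a client object that exposes `describe_addresses` and `associate_address` the way the boto3 EC2 client does, and an address identified by an allocation ID. The function name `switch_elastic_ip` and its parameters are made up for illustration.

```python
from typing import Any, Optional


def switch_elastic_ip(ec2: Any, allocation_id: str, new_instance_id: str) -> Optional[str]:
    """Point an elastic IP at new_instance_id.

    Returns the ID of the instance the address pointed at before the switch
    (or None if it was unattached), so the caller can switch back later.
    `ec2` is assumed to behave like a boto3 EC2 client.
    """
    # Remember where the address points now; this is the rollback target.
    addresses = ec2.describe_addresses(AllocationIds=[allocation_id])["Addresses"]
    previous_instance_id = addresses[0].get("InstanceId") if addresses else None

    # AllowReassociation lets the address move even though it is still
    # attached to the old production instance.
    ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=new_instance_id,
        AllowReassociation=True,
    )
    return previous_instance_id
```

With boto3 installed, `ec2` could be `boto3.client("ec2")`. Keeping the returned instance ID around is what makes step 8 useful: calling the same function again with that ID points the address back at the old system.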
You pay for a replica of the system for just a few hours, or maybe a couple of days: 15 – 100 times less expensive than keeping all the hardware for a permanent staging environment.
No stress – you have all the time you need – the world can keep on using the old version while you’re setting up the new one.
Set up only once – as the staging environment actually becomes production after you switch the elastic IP, there’s no need to repeat the process on production.
If you missed something, the old production system is still available for a while, so you can easily switch back.
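The cost saving mentioned above is easy to sanity-check with a bit of arithmetic. The sketch below is only an illustration: it assumes a permanent staging replica would be billed at the same hourly rate for the whole release cycle, and the function name and the sample numbers are made up, not figures from our bills.

```python
def staging_cost_ratio(cycle_days: float, staging_hours: float) -> float:
    """How many times cheaper a temporary replica is than a permanent one.

    Assumes both are billed at the same hourly rate, so the rate cancels out
    and only the running time matters.
    """
    if cycle_days <= 0 or staging_hours <= 0:
        raise ValueError("cycle_days and staging_hours must be positive")
    permanent_hours = cycle_days * 24  # replica kept up for the whole cycle
    return permanent_hours / staging_hours


# A two-week cycle with one full day of staging.
print(staging_cost_ratio(14, 24))  # 14.0
# A three-week cycle with half a day of staging.
print(staging_cost_ratio(21, 12))  # 42.0
```

Shorter staging windows or longer release cycles push the ratio higher.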
In short, it’s even better than what I’d have called a “perfect” staging environment in the past. And it all happened without any planning – we only realized what we’d done after the fact. Bliss.