High Availability and Disaster Recovery in the Cloud

There’s a difference between high availability and disaster recovery. The basic idea of high availability is that the infrastructure never goes down. If there’s a hardware failure, you seamlessly transition over to functioning hardware. Or if you’re doing a deployment, you can deploy into a new environment and switch over the traffic, so that as far as customers are concerned, you’re always up.

A lot of people with dedicated hardware on premises think in terms of disaster recovery: what would happen if the primary data center were lost? In that case, a business needs to be able to shift all load over to a secondary data center that’s geographically isolated.

“Say a backhoe cuts through the power cables at one data center. And because you’ve designed your architecture for a failover to support high availability, it’s now taken up transparently by other data centers,” said Keith Gregory, the AWS technical practice lead at Chariot Solutions. “Disaster recovery is still critically important within the Amazon world. Now, the simple case of all of the data centers in a region going down is not the thing you should worry about. That’s a case where somebody has nuked Washington and taken out Ashburn, Virginia. What’s more important though, is when you depend on specific AWS services. Several years ago, S3 had an outage. And all of the services in that region depend on S3. So you need to be able to shift over and switch to a different region where you have basically a mirror, and be prepared to recover to that. Now, that can be very expensive. In fact, one company that I worked with when that S3 outage happened, it caught us all by surprise, because S3 isn’t supposed to ever go down.

“We looked at what it would take to fully replicate to a different region and came to the decision that it is a lot easier to have a very basic app server running somewhere outside of Amazon, that just says the site is in maintenance mode. When S3 went down, it took down a significant part of the internet. Now if you’re at the scale of Netflix, you can’t even accept that, but for most companies, you have to make a cost-benefit analysis of true disaster recovery.”
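The "very basic app server" Keith describes could look something like the following. This is a minimal sketch using only the Python standard library, not the server that company actually ran. The page text, the port, the `Retry-After` value, and the names `maintenance_response`, `MaintenanceHandler`, and `make_server` are all assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content; replace with your own wording and status link.
MAINTENANCE_BODY = (
    b"<html><body><h1>We'll be back soon</h1>"
    b"<p>The site is temporarily in maintenance mode.</p></body></html>"
)


def maintenance_response():
    """Return (status, headers, body) used for every request to the fallback."""
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(MAINTENANCE_BODY)),
        # Assumed value: ask clients and crawlers to retry in ten minutes.
        "Retry-After": "600",
        "Cache-Control": "no-store",
    }
    # 503 Service Unavailable signals a temporary outage rather than a missing page.
    return 503, headers, MAINTENANCE_BODY


class MaintenanceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every path gets the same answer: the site is in maintenance mode.
        status, headers, body = maintenance_response()
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def make_server(host="0.0.0.0", port=8080):
    """Build the fallback server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), MaintenanceHandler)

# To run it: make_server().serve_forever()
```

The sketch only covers the server itself. Getting traffic to it during an outage (for example, through a DNS change) is a separate decision that depends on how your domain is hosted.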

Forgotten Password for AWS?

Disaster recovery also covers the case where you lose control of your AWS credentials. If you’re deploying on Amazon, the root credentials have complete control over the account, and you should never use them for day-to-day work.

Keith explained, “When you sign up for your Amazon account, make sure that you use an email address that goes to multiple people. You have a very complex password for your root credentials, and use multi-factor authentication. Print all of that out and put it in your safe deposit box and create users. The problem is if somebody ever gets access to those root credentials, they have complete control of your account. And there’s really nothing you can do. They can switch the email address. They can do whatever damage they want. And Amazon doesn’t actually know you. So Amazon can’t say, Oh, yeah, this shouldn’t have happened. So it is critically important to keep control of that. And critically important to be prepared if something happens. Even a set of developer credentials that might have too much power could be used to destroy all of your database systems and all of the backups.” Lock down your credentials and have fallback positions. This is at least as important in a cloud deployment as in a traditional data center, if not more so.
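One way to limit what an over-powered set of developer credentials can destroy is an explicit deny on deletion actions. The sketch below builds such an IAM policy document as a Python dictionary and prints it as JSON. It assumes the databases run on Amazon RDS, which the article does not say, and the function name and `Sid` are made up for the example. Treat it as a starting point to review, not a complete lockdown.

```python
import json


def deny_destructive_rds_policy():
    """Build an IAM policy document that denies deleting databases and snapshots.

    Assumption: the databases are on Amazon RDS. Other services have their own
    delete actions that would need to be listed separately.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDatabaseAndBackupDeletion",  # hypothetical label
                "Effect": "Deny",
                "Action": [
                    "rds:DeleteDBInstance",
                    "rds:DeleteDBCluster",
                    "rds:DeleteDBSnapshot",
                    "rds:DeleteDBClusterSnapshot",
                ],
                "Resource": "*",
            }
        ],
    }


# Print the JSON so it can be reviewed before being attached to a developer group.
print(json.dumps(deny_destructive_rds_policy(), indent=2))
```

An explicit deny takes precedence over any allow in IAM policy evaluation, which is why the sketch denies the dangerous actions rather than trying to enumerate every safe one.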

Risk in the Cloud vs. Data Centers

The first thing that gets people into trouble with cloud deployment is security. Security in the cloud is very different from security in a data center. In a data center, you have physical security of the machines, so that you can’t accidentally shut down a machine, but there could still be a power failure, and everything goes down.

Security in a data center is also really limited, quite frankly. You have physical security and you have your passwords, but you don’t have the kind of granular control over security that you might have in a cloud deployment. Take a recent example of a well-known banking company that had a breach, where their data storage was exposed to the outside world in a data center.

Another big and unexpected challenge is network latency. If you’re running on a local area network, you have response times in the single-digit milliseconds. If you’re doing a hybrid cloud approach, where some applications live in a data center and some live in the cloud, you’re looking at potentially tens of milliseconds for every interaction. “I recently worked with a company that was doing a hybrid cloud strategy,” said Keith. “Their application for every single user interaction made several dozen requests to their database. When they first tried moving that database to the cloud, which is normally a good starting point, things that were taking 20 milliseconds to execute were suddenly taking close to a second, and the entire website slowed down to a crawl. So you have to think about how your components in your architecture communicate with each other. And once you’ve moved everything to the cloud, you have fairly fast communication times.”
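The arithmetic behind that slowdown is simple to check. The sketch below assumes the requests run one after another, and it picks illustrative numbers that are not from the interview: 36 requests for "several dozen", half a millisecond per round trip on a LAN, and 25 milliseconds per round trip between the data center and the cloud.

```python
def interaction_time_ms(requests_per_interaction, round_trip_ms):
    """Total time for one user interaction when requests run sequentially."""
    return requests_per_interaction * round_trip_ms


# Illustrative assumptions, not measurements from the article.
REQUESTS = 36        # "several dozen" database requests per interaction
LAN_RTT_MS = 0.5     # database on the same local network
HYBRID_RTT_MS = 25.0 # database in the cloud, application in the data center

on_lan = interaction_time_ms(REQUESTS, LAN_RTT_MS)
hybrid = interaction_time_ms(REQUESTS, HYBRID_RTT_MS)

print(f"LAN:    {on_lan:.0f} ms per interaction")
print(f"Hybrid: {hybrid:.0f} ms per interaction")
print(f"Slowdown factor: {hybrid / on_lan:.0f}x")
```

With these numbers the same interaction goes from about 18 ms to about 900 ms, in line with the "20 milliseconds" to "close to a second" that Keith describes. The per-request latency is multiplied by the number of round trips, so reducing the request count helps as much as reducing the latency.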

Check how close web servers are to clients. It takes a certain amount of time for network packets to go across the United States. If you’re running in an East Coast data center, that slows down everything from the West Coast, Europe or Australia. One of the benefits of the cloud over a traditional data center is the ability to move applications closer to clients.
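A rough lower bound on that delay comes from distance alone. The sketch below assumes signals travel through fiber at about 200,000 km per second (roughly two-thirds the speed of light) and uses approximate straight-line distances chosen for illustration. Real routes are longer and add routing and processing delays, so measured round trips will be higher.

```python
# Approximate signal speed in optical fiber, about two-thirds of c.
FIBER_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km):
    """Best-case round-trip time over fiber for a given one-way distance."""
    return 2 * distance_km * 1000 / FIBER_SPEED_KM_PER_S


# Rough one-way distances from a US East Coast data center (assumed values).
ROUTES_KM = {
    "US West Coast": 4_000,
    "Western Europe": 6_000,
    "Australia": 15_700,
}

for destination, km in ROUTES_KM.items():
    print(f"{destination}: at least {min_round_trip_ms(km):.0f} ms round trip")
```

Even in the best case, a client on the opposite coast pays tens of milliseconds per round trip, and a page that needs several round trips pays that several times over.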

Hidden Costs of the Cloud

One thing that surprises people about the Amazon cloud is data transfer costs. Most people thinking of deploying to the cloud are very worried about instances that are going to cost 10 cents an hour, which is $2.40 a day and about $875 a year. They feel they have to be careful that they’re not running too many of those, and focus entirely on right-sizing. “Meanwhile, Amazon will charge you nine cents per gigabyte for all data that comes out of those EC2 instances and goes to actual clients,” explained Keith.

“Now, if your web app has a lot of large images, or worse, videos, that adds up really quickly. And even for communications within the Amazon network, you’re charged approximately two cents per gigabyte to cross availability zones. So if you’re making a lot of database calls, moving data back and forth very frequently, that’s going to run up a bill. I don’t think that most people take all these pieces into account. When they think of the cloud, they think, you know, here’s something that somebody else will maintain. But now there are a lot of pieces that require expertise in how to set it up and maintain it.”
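To see how the pieces compare, it helps to put them in one estimate. The sketch below uses the rates quoted in this article (10 cents per instance-hour, nine cents per gigabyte out to clients, about two cents per gigabyte across availability zones). These are the article's figures, not current AWS pricing. The workload numbers and the 730-hour month are assumptions for illustration.

```python
# Rates as quoted in the article; check current AWS pricing before relying on them.
INSTANCE_RATE_PER_HOUR = 0.10
EGRESS_RATE_PER_GB = 0.09
CROSS_AZ_RATE_PER_GB = 0.02

HOURS_PER_MONTH = 730  # assumed average month


def monthly_cost(instances, egress_gb, cross_az_gb):
    """Estimate monthly spend, split into compute and data transfer."""
    compute = instances * INSTANCE_RATE_PER_HOUR * HOURS_PER_MONTH
    egress = egress_gb * EGRESS_RATE_PER_GB
    cross_az = cross_az_gb * CROSS_AZ_RATE_PER_GB
    return {
        "compute": round(compute, 2),
        "egress": round(egress, 2),
        "cross_az": round(cross_az, 2),
        "total": round(compute + egress + cross_az, 2),
    }


# Hypothetical workload: two instances serving image-heavy pages.
estimate = monthly_cost(instances=2, egress_gb=5_000, cross_az_gb=2_000)
for item, dollars in estimate.items():
    print(f"{item:>9}: ${dollars:,.2f}")
```

In this hypothetical workload the two instances cost about $146 a month while data sent to clients costs about $450, so trimming an instance saves far less than shrinking the images would.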

Moving to the cloud is a process. It can be a very short process if you’re a startup with a great idea and want to get your minimum viable product out as soon as possible. Or it can be a longer process if you’re a medium-sized company looking to migrate your existing applications to the cloud. Either way, it’s a process that never ends.

Once you’re running in the cloud, there’s always room for improvement. Amazon provides an enormous number of tools to make your cloud deployments work better. Getting to the cloud is a process, but so is what follows: improving your cloud deployments and constantly asking what you can do better. “That, for me, is one of the great things about the cloud. You’re not locked into a specific configuration. You can change. You can experiment,” said Keith.

Read the first part of this blog: How to Make the Most of an AWS Deployment.