Over the last year I’ve had the opportunity to gain hands-on experience building software to support multiple clouds. While most cloud consumers can live within the gated community of a single cloud, a few of us need to venture into the multi-cloud world. Below are the top 5 challenges my engineering team and I have encountered in going multi-cloud.

#1 - Feature Parity

While the convergence toward feature parity has made supporting multiple clouds substantially easier, deviations between the clouds continue to exist. As a reference, I define a public cloud as consisting of heterogeneous compute with variable instance types, attached storage with medium durability, support for the creation and publishing of custom images, global data centers with multiple availability zones per region, APIs, and a highly durable object store.

The first step in managing the lack of feature parity is to determine the minimum features required to support your applications. Some gaps can be worked around using off-the-shelf software running on top of IaaS compute (e.g. replacing Amazon RDS with a self-managed MySQL cluster, or ElastiCache with clustered memcached servers). Other gaps simply cannot be worked around without substantially changing the underlying software architecture. For example, Microsoft does not offer attached storage, under the assertion that you do not need to attach storage to compute in cloud applications (to which I say in my best Dr. Phil voice: “And how’s that working for you?”). IBM currently lacks multiple AZs per region, Rackspace does not yet have global data centers; and so on.
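The minimum-features exercise above can be captured as a simple matrix check. The sketch below is illustrative only: the provider names are real, but the feature sets are hypothetical placeholders (real capabilities change frequently and should be verified against each provider's current offering).

```python
# Illustrative feature sets only -- NOT an authoritative capability matrix.
PROVIDERS = {
    "amazon":    {"attached_storage", "multi_az", "object_store", "custom_images"},
    "rackspace": {"attached_storage", "object_store", "custom_images"},
}

# Hypothetical minimum feature set for an application.
REQUIRED = {"attached_storage", "object_store"}

def supported(providers, required):
    """Return the providers whose feature set covers every required feature."""
    return sorted(name for name, feats in providers.items()
                  if required <= feats)  # set containment: required is a subset

print(supported(PROVIDERS, REQUIRED))  # ['amazon', 'rackspace']
```

Features that no candidate provider covers are the ones that force either a self-managed workaround on IaaS compute or an architectural change.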

#2 - Instance Types

Cloud providers offer different instance types in their compute services. Amazon offers the most flexible range of instances, with families focused on different types of workloads such as compute or memory. IBM is a close second, with a broad range of both 32 and 64-bit options. While Rackspace offers a much smaller list of instances, they cover most of the common configurations required by cloud consumers.

Understanding which instance types are right for your software requires that you gather data on the performance characteristics of your application. For example, some functional clusters may require fixed resources, e.g. 4 virtual cores and 16GB of memory. Other functional clusters will be driven by ratios that maximize price and/or performance, e.g. one virtual core to one terabyte of attached storage.

In the best case, you will find instance types with the right amount of memory, local disk, and virtual cores. But in some cases, you will simply have to compromise. To give you a quick sense of the variability, here is a glimpse across four public clouds:

  • Compute – The smallest instance type available on Amazon, Rackspace, and Microsoft has a shared core; IBM’s smallest has 2 cores. The largest has 16 cores on IBM and Amazon, and 4 on both Rackspace and Microsoft.
  • Memory – The smallest instance type available has 2GB for IBM, 1.7GB for Amazon, 1GB for Rackspace, and 0.75GB for Microsoft. The largest contains 68GB for Amazon, 30GB for both IBM and Rackspace, and 8GB for Microsoft.
  • Local disk – The smallest instance type available has 60GB for IBM, 10GB for Rackspace, 0.02GB for Microsoft, and 0 for Amazon. The largest has 3TB for Amazon, 2TB for IBM, 1TB for Rackspace and 0.0005TB for Microsoft.

While there is high variability in the available configurations and ratios across cloud providers, you can typically work within the constraints to find a reasonable instance type (albeit sometimes not as cost-effective as you might desire).
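Once you know the minimum resources a functional cluster needs, finding the cheapest instance type that satisfies them is a straightforward filter. The catalog below is hypothetical (names, specs, and hourly rates are made up for illustration); a real version would be populated from each provider's published instance list.

```python
# Hypothetical instance catalog; names, specs, and prices are illustrative only.
CATALOG = [
    {"name": "small",  "cores": 1, "mem_gb": 1.7,  "hourly": 0.08},
    {"name": "large",  "cores": 4, "mem_gb": 15.0, "hourly": 0.32},
    {"name": "xlarge", "cores": 8, "mem_gb": 30.0, "hourly": 0.64},
]

def best_fit(catalog, min_cores, min_mem_gb):
    """Return the cheapest instance type meeting the core and memory minimums."""
    candidates = [i for i in catalog
                  if i["cores"] >= min_cores and i["mem_gb"] >= min_mem_gb]
    if not candidates:
        return None  # no type satisfies the constraints; a compromise is required
    return min(candidates, key=lambda i: i["hourly"])

print(best_fit(CATALOG, 4, 12)["name"])  # cheapest type with >= 4 cores, 12GB
```

Running the same requirements against each provider's catalog makes the compromises, and their cost, explicit before you commit.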

#3 - Performance

Unfortunately, wide variability in performance still exists across clouds. For example, provisioning an instance on Amazon will yield less performance than an equivalent node on Rackspace. Rackspace seems to offer the best per-instance performance, due to what appears to be a combination of CPU bursting and higher-quality disks.

I recommend gathering metrics to understand the operating constraints of different workloads within your application. You can then use standard benchmark tools to quantify the performance variability of the different clouds. When gathering data, allow a sufficiently large sample to compensate for the variability inherent in a shared environment. Some differences can be managed by adopting a different instance type, e.g. choosing one with more virtual cores and/or more memory to compensate for the lower performance of another provider’s equivalent instance.
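The "sufficiently large sample" point deserves a concrete sketch: in a shared environment, a single benchmark run tells you little, so compare providers on summary statistics over repeated runs. The throughput numbers below are invented for illustration, not real measurements.

```python
import statistics

def summarize(samples):
    """Mean and sample standard deviation for repeated benchmark runs."""
    return statistics.mean(samples), statistics.stdev(samples)  # needs >= 2 runs

# Illustrative throughput samples (ops/sec); not real provider measurements.
cloud_a = [950, 1010, 870, 990, 1005, 930]
cloud_b = [1190, 1250, 1210, 1180, 1240, 1205]

mean_a, sd_a = summarize(cloud_a)
mean_b, sd_b = summarize(cloud_b)

# Treat a difference as meaningful only if it clearly exceeds run-to-run noise.
print(mean_b - mean_a > 2 * max(sd_a, sd_b))  # True
```

A crude noise threshold like this is no substitute for a proper significance test, but it keeps you from re-architecting around a difference that disappears on the next run.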

One area of substantial difference is in block storage, due to very different underlying hardware infrastructures. I am hoping to see this change as we see new block storage services come to market from HP and Rackspace.

#4 - Security

Most clouds offer some basic security that includes locked-down images, VLANs, and in some cases VPCs. Unfortunately for Amazon customers looking to go multi-cloud, no other provider offers a level of security comparable to AWS. In addition to running the only FISMA-certified cloud, Amazon offers features such as security groups, Identity and Access Management (IAM), Multi-Factor Authentication (MFA), Virtual Private Cloud (VPC), and more.

Few other clouds can match the out-of-the-box security of Amazon, so expect some effort to achieve the security requirements for your application.

#5 - Pricing

While all clouds sell their services through consumption-based pricing models, the incentives and the mechanisms for discounts are very different. IBM and Amazon are the only providers offering reserved pricing, and only Amazon has support today for auction-based pricing for compute (a.k.a. spot instances). In addition, each cloud has different areas that offer higher value than others. For example, Rackspace offers some of the best price-to-value for lower-performance instances, Microsoft for hosted SQL Server, IBM for 32-bit instances, and Amazon for its highly durable object store. The key in moving between clouds is to first understand the opportunities presented in the new price list, and drive decisions that meet the needs of your application while staying within your cost constraints.
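When weighing reserved against on-demand pricing, the core arithmetic is a break-even calculation: how many hours of usage before the upfront fee pays for itself? The rates below are hypothetical; real price lists vary by provider, region, and reservation term.

```python
def breakeven_hours(on_demand_hourly, upfront, reserved_hourly):
    """Hours of usage after which a reserved instance beats on-demand.

    All rates are hypothetical examples, not any provider's actual prices.
    """
    savings_per_hour = on_demand_hourly - reserved_hourly
    if savings_per_hour <= 0:
        return None  # the reservation never pays off
    return upfront / savings_per_hour

# e.g. $0.10/hr on-demand vs. $250 upfront plus $0.04/hr reserved
hours = breakeven_hours(0.10, 250.0, 0.04)
print(round(hours))  # 4167 hours, i.e. under six months of 24x7 use
```

For steady 24x7 workloads the reservation usually wins; for bursty workloads the same arithmetic often favors on-demand or spot capacity, which is why the price list must be re-run for each cloud you evaluate.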

Conclusions

The effort to migrate to a new cloud will be proportional to the complexity of your application and its coupling to your current cloud. While supporting multiple clouds in 2012 is not necessarily easy, it can be managed with deliberate planning and execution. Take the time to understand the needs of your application, and the characteristics of your current and future clouds. Expect some challenges in moving from one cloud to another. Over time, standardization in features, performance, and price will make portability between clouds easier.


Related Posts: The Heterogeneous Cloud