Thursday, March 27, 2008

Storage Jobs @ Startups

Recently, Nathan Kaiser of nPost contacted me about his new widget that displays Startup Jobs on blogs. As the sidebar on my blog is already too long, I decided to include his widget in this blog post. Try it out and let me know your feedback (positive and negative).

P.S. If you are using an RSS reader like Google Reader and don't see the widget, please visit my blog. As I write this post, I am not sure whether the widget will show up in the blog post either. If it doesn't, please visit the nPost Startups Jobs site to check out the startup jobs. Use the keyword "storage" to find storage jobs at startups.

Sunday, March 23, 2008

Is the number of objects a true indicator of Amazon S3 growth?

In my last blog post, I estimated the data stored on Amazon S3 to be in the exabyte range, using the 18 billion stored objects reported by Amazon CTO Werner Vogels in his blog post.

In retrospect, it was an overestimate by several orders of magnitude (my bad), which MikeDoug promptly corrected using another data point: AWS revenue. MikeDoug estimated (comment excerpts below) the stored data to be in the 20PB (petabyte) range, far short of my estimate and probably closer to reality.

No doubt, it is still a very large number for a service that is only a few years old. But S3's rapid growth may not be as evident from the growth in stored objects as Vogels would like us to believe.
A recent report puts ALL of AWS at 50 to 70 million in revenue for the year.

Let us pretend that, of the 70 million, 40 million in revenue was attributed to S3 alone for last year. That would be $3,333,333 a month for S3. This converts to 22,222,222 gigabytes, or 0.02 exabytes.
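The comment skips the step from monthly revenue to gigabytes. Here is a minimal sketch of that arithmetic, assuming a flat storage price of $0.15 per GB-month (S3's list price at the time, and the rate that reproduces the 22,222,222 figure) and assuming the entire hypothetical $40 million is storage revenue. The constant and function names are my own, for illustration only.

```python
# Back-of-envelope: infer S3 stored data from an assumed S3 revenue figure.
# ASSUMPTIONS (not published by Amazon): $40M/year of AWS revenue is S3, and
# all of it is storage charges at $0.15 per GB-month (no transfer/request fees).
ANNUAL_S3_REVENUE_USD = 40_000_000
PRICE_PER_GB_MONTH_USD = 0.15


def stored_gb_from_revenue(annual_revenue: float, price_per_gb_month: float) -> float:
    """Gigabytes that one month of revenue pays for at a flat per-GB-month price."""
    monthly_revenue = annual_revenue / 12
    return monthly_revenue / price_per_gb_month


gb = stored_gb_from_revenue(ANNUAL_S3_REVENUE_USD, PRICE_PER_GB_MONTH_USD)
print(f"monthly revenue: ${ANNUAL_S3_REVENUE_USD / 12:,.0f}")
# Decimal units: 1 PB = 10**6 GB, 1 EB = 10**9 GB.
print(f"stored: {gb:,.0f} GB = {gb / 1e6:.1f} PB = {gb / 1e9:.3f} EB")
```

Since part of S3 revenue really comes from bandwidth and request charges, counting all of it as storage pushes this estimate up rather than down.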
Other interesting tidbits, assuming S3 has 20PB of stored data, 18 billion objects and 330,000 registered developers:

On average, each object stores only about a megabyte of data. This number seems quite low, so either deleted objects are being included in the published object count or developers are keeping objects small to avoid transfer timeouts.

On average, each developer stores only about 60GB of data. Considering that some services like SmugMug are storing terabytes of data on S3, most likely a lot of registered developers are either not actively using S3 to store data or have services still under development.
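Both averages are simple divisions. A quick sketch, assuming decimal units (1PB = 10^15 bytes) and the three figures listed above; the variable names are illustrative only.

```python
# Averages implied by 20PB stored, 18 billion objects, 330,000 developers.
# ASSUMPTION: decimal units, i.e. 1 PB = 10**15 bytes, 1 GB = 10**9 bytes.
STORED_BYTES = 20 * 10**15  # 20PB, MikeDoug's revenue-based estimate
OBJECTS = 18 * 10**9        # 18 billion objects
DEVELOPERS = 330_000        # registered developers

bytes_per_object = STORED_BYTES / OBJECTS
bytes_per_developer = STORED_BYTES / DEVELOPERS
objects_per_developer = OBJECTS / DEVELOPERS

print(f"per object:    {bytes_per_object / 1e6:.2f} MB")
print(f"per developer: {bytes_per_developer / 1e9:.1f} GB")
print(f"objects/dev:   {objects_per_developer:,.0f}")
```

Dividing 20PB directly gives about 61GB per developer; rounding the object size down to 1MB first and multiplying by the roughly 54,500 objects per developer gives a slightly lower figure of about 54GB. Either way, the point about inactive developers stands.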

Wednesday, March 19, 2008

How much data is in Amazon S3?

Today, in his blog post Happy Birthday, Amazon S3!, Werner Vogels marked the second birthday of Amazon S3 and shared that, as of January 2008, S3 was storing 14 billion objects. I am not sure why Werner and others at Amazon are so cagey about sharing the actual storage capacity used in AWS. In the past, my own inquiries have been met with either silence or a "trade secret" or "competitive advantage" response.

In my opinion, this secrecy only invites speculation, like the kind I am going to do in this post. So, how much data is stored on S3?

My initial guesstimate for the stored data volume is between 14 and 70EB (yes, EB is exabyte), based on published information putting the size of an individual object at one to five GB. Doesn't that seem very high? At first, it did to me. I have been trying to come up with alternate ways to estimate the stored data volume, such as looking at the typical size and type of data stored by the various services that use S3. Even with an average of 100MB per object, the stored data volume comes out to 1.4 exabytes, still a huge number for such a young service.
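The guesstimate is just the object count multiplied by an assumed average object size. A minimal sketch in decimal units, where the three sizes are the assumptions from the paragraph above and the helper name is hypothetical:

```python
# Total volume = object count x assumed average object size (decimal units).
OBJECTS = 14 * 10**9  # 14 billion objects as of January 2008

GB = 10**9
EB = 10**18


def total_exabytes(objects: int, avg_object_bytes: int) -> float:
    """Stored volume in exabytes for a given count and assumed average size."""
    return objects * avg_object_bytes / EB


for label, size in [("1 GB", 1 * GB), ("5 GB", 5 * GB), ("100 MB", 100 * 10**6)]:
    print(f"{label:>6} per object -> {total_exabytes(OBJECTS, size):g} EB")
```

The result is only as good as the average-size assumption, and that assumption is the weakest link in this method.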

What is your estimate? Do you have any suggestions for an estimation method that would arrive at a more accurate number for the data volume stored on S3?

Considering that S3 may be hosting an exabyte or more of data within two years of its launch, no wonder established vendors like EMC, IBM, HP and Dell are salivating over a piece of the "Cloud Storage" pie.

Sunday, March 16, 2008

Bandwidth, one hurdle in adopting Cloud Storage

This weekend, I read the NY Times article Video Road Hogs Stir Fear of Internet Traffic Jam.
Last year, by one estimate, the video site YouTube, owned by Google, consumed as much bandwidth as the entire Internet did in 2000. …

In a widely cited report published last November, a research firm projected that user demand for the Internet could outpace network capacity by 2011. …

Moving images, far more than words or sounds, are hefty rivers of digital bits as they traverse the Internet’s pipes and gateways, requiring, in industry parlance, more bandwidth.
While reading the article, it occurred to me that bandwidth is going to be the main hurdle in the adoption of storage in the cloud. If clients are not happy with 10/100/1000Mbps connections to their applications, servers and data centers, how can they be happy with DSL, cable, T1 or T3 connections to the cloud? I am sure everyone has felt the pain of trying to transfer large datasets over the Internet.
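To put numbers on that pain, here is a rough sketch of how long a hypothetical 1TB dataset takes to move at the nominal line rate of each link. T1 (1.544Mbps) and T3 (44.736Mbps) are the standard carrier rates; the 1Mbps DSL/cable upload speed is my own assumption, since consumer upload rates vary widely. It ignores protocol overhead and congestion, so real transfers will be slower.

```python
# Ideal transfer time for a dataset at nominal line rates.
# Ignores protocol overhead, congestion and retransmits, so real
# transfers take longer than these figures.
DATASET_BYTES = 10**12  # hypothetical 1 TB dataset (decimal units)

LINK_MBPS = {
    "Gigabit LAN": 1000.0,
    "Fast Ethernet": 100.0,
    "T3": 44.736,
    "T1": 1.544,
    "DSL/cable upload (assumed)": 1.0,  # ASSUMPTION: typical upload speed
}


def transfer_hours(num_bytes: int, mbps: float) -> float:
    """Hours needed to push num_bytes through a link of mbps megabits/second."""
    bits = num_bytes * 8
    seconds = bits / (mbps * 1_000_000)
    return seconds / 3600


for name, mbps in LINK_MBPS.items():
    hours = transfer_hours(DATASET_BYTES, mbps)
    print(f"{name:<28} {hours:10.1f} h  ({hours / 24:.1f} days)")
```

What takes a couple of hours on a gigabit LAN takes about two days over a T3 and around two months over a T1.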

If you review the introduction and growth of the various Amazon Web Services (AWS), a comparatively established cloud player, you will notice very few use cases of Simple Storage Service (S3) on its own with clients outside the cloud. Most S3 usage is fronted by another AWS service in the cloud, such as Elastic Compute Cloud (EC2). Such combinations avoid the challenge of transferring large amounts of data over the Internet between the storage cloud and an application or server outside the cloud. For cloud storage to be successful, it needs to be in the same cloud as the application/server, or connected to the application/server cloud by a high-speed link.

Any technology that can reduce data transfer between cloud services and clients outside the cloud will be a big beneficiary of this trend. Caching, Compression, and Data De-duplication will most likely benefit in the near term. And the future looks very much like the past (aka the mainframe): Desktop Virtualization, Streaming, and On-the-Fly Visualization.
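To show why de-duplication cuts transfer, here is a toy sketch of chunk-level de-duplication before upload. It assumes fixed-size 4KB chunks and SHA-256 fingerprints, and a plain Python set stands in for whatever index the cloud side keeps of chunks it already holds; none of this reflects any particular vendor's product.

```python
# Toy chunk-level de-duplication: only chunks the remote side has not
# seen are sent. ASSUMPTIONS: fixed 4 KB chunks, SHA-256 fingerprints,
# and a local set standing in for the remote chunk index.
import hashlib

CHUNK_SIZE = 4096


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield consecutive fixed-size chunks (the last may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def bytes_to_send(data: bytes, remote_index: set) -> int:
    """Return bytes that must cross the network; updates remote_index."""
    sent = 0
    for chunk in chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in remote_index:  # only unseen chunks are uploaded
            remote_index.add(digest)
            sent += len(chunk)
    return sent


remote = set()
first = bytes_to_send(b"A" * 4096 * 10 + b"B" * 4096, remote)   # 44 KB of data
second = bytes_to_send(b"A" * 4096 * 10 + b"C" * 4096, remote)  # edited copy
print(first, second)
```

The first upload sends 8192 of its 45056 bytes and the edited copy sends only the one changed chunk. Real products often use variable-size, content-defined chunks so that inserting a byte does not shift every later chunk; fixed-size chunks just keep the sketch short.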

So, how will new cloud players like Nirvanix, EMC Mozy and Rackspace differentiate themselves?