Wednesday, March 19, 2008

How much data is in Amazon S3?

Today, Werner Vogels marked the second birthday of Amazon S3 in his blog post "Happy Birthday, Amazon S3!" and shared that, as of January 2008, S3 was storing 14 billion objects. I am not sure why Werner and others at Amazon are so cagey about sharing the actual storage capacity used in AWS. In the past, my inquiries have been met with either silence or a "trade secret" or "competitive advantage" response.

In my opinion, this secrecy only creates room for speculation, which is exactly what I am going to do in this post. So, how much data is stored on S3?

My initial guesstimate for the stored data volume is between 14 and 70 EB (yes, EB is exabytes), based on the published information about individual objects being one to five GB in size. Doesn't that seem very high? At first, it did to me. I have been trying to come up with alternative estimation methods, such as looking at the typical size and type of data stored by the various services that use S3. Even with an average of 100 MB per object, the stored data volume comes out to 1.4 exabytes, still a huge number for such a young service.
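To make the arithmetic easy to replay with other assumptions, here is a minimal sketch of the estimate above. The object count is the one from Werner's post; the average object sizes are my guesses, not published figures, and the function name is just something I made up for illustration. Decimal units are assumed (1 GB = 10^9 bytes).

```python
# Back-of-envelope estimate: object count x ASSUMED average object size.
OBJECTS = 14_000_000_000  # 14 billion objects, the figure quoted in the post

MB = 10**6   # decimal units (assumption)
GB = 10**9
EB = 10**18


def estimate_exabytes(avg_object_bytes, objects=OBJECTS):
    """Total stored volume in exabytes for a given average object size."""
    return objects * avg_object_bytes / EB


# The average sizes below are guesses, not measured values.
for label, size in [("1 GB", 1 * GB), ("5 GB", 5 * GB), ("100 MB", 100 * MB)]:
    print(f"{label:>6} per object -> {estimate_exabytes(size):.1f} EB")
```

The result scales linearly with the assumed average size, so the whole estimate is only as good as that one guess.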

What is your estimate? Any suggestions for an estimation method that would arrive at a more accurate number for the data volume stored on S3?

Considering that S3 may be hosting an exabyte or more of data within two years of existence, it is no wonder that all the established vendors (EMC, IBM, HP and Dell) are salivating over getting a piece of the "Cloud Storage" pie.

4 comments:

  1. This comment has been removed by the author.

  2. (Deleted previous comment for editorial purposes.)

    Simple economics point to the failure of your computation:

    At 70 exabytes of data, Amazon S3 would be making $11,274,289,152 a month (at $0.15 per GB). The more conservative (heh) estimate of 14 exabytes puts the monthly revenues at $2,254,857,830.

    A recent report puts ALL of AWS at 50 to 70 million in revenue for the year.

    Let us pretend that, of the 70 million, 40 million in revenue was attributed to S3 alone for last year. That would be $3,333,333 a month for S3. This converts to 22,222,222 gigabytes, or 0.02 exabytes.

    The reality is that they do keep it secret and quiet, as they probably should. Further, the best way to estimate the size is based on revenue numbers, not the number of objects stored.

  3. Anil,

    I think your object size estimate is way off (too large) by several orders of magnitude. Speaking with no experience using S3, I have no reason to say this, other than that the final computation looks insanely huge.

  4. 22PB sounds about right, but that doesn't account for replication.

    I have never seen any information about backup or replication of S3 storage, but Google uses 3 copies of everything in GFS. The number of copies can then be increased for data which is accessed frequently.

    This would put the figure up to a respectable 66PB. Not bad for two years.

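The revenue-based cross-check from the comments can be sketched the same way. This is a rough sketch under the commenters' assumptions: a flat $0.15 per GB per month, all of the attributed revenue coming from storage fees (transfer and request charges are ignored), and the hypothetical 40 million a year for S3. The function names are made up for illustration. Binary units (2^30 GB per EB) are used for the revenue figures because that is what reproduces the numbers quoted in comment 2.

```python
PRICE_PER_GB_MONTH = 0.15  # USD, storage price quoted in the comments
GB_PER_EB = 2**30          # binary units, matching the commenter's figures


def monthly_revenue_usd(exabytes):
    """Monthly storage revenue implied by a given stored volume."""
    return exabytes * GB_PER_EB * PRICE_PER_GB_MONTH


def implied_storage_gb(annual_revenue_usd):
    """Stored volume in GB implied by annual storage revenue (flat price assumed)."""
    return annual_revenue_usd / 12 / PRICE_PER_GB_MONTH


print(f"70 EB -> ${monthly_revenue_usd(70):,.0f} per month")
print(f"14 EB -> ${monthly_revenue_usd(14):,.0f} per month")

# Hypothetical split from the comments: 40 million USD a year attributed to S3.
stored_gb = implied_storage_gb(40_000_000)
stored_pb = stored_gb / 10**6  # decimal GB -> PB
print(f"Implied storage: {stored_gb:,.0f} GB (about {stored_pb:.0f} PB)")

# Raw capacity if every object were kept in 3 copies (an assumption borrowed
# from the GFS comparison in the comments, not a published S3 figure).
print(f"With 3 copies: about {3 * stored_pb:.0f} PB")
```

Working from the unrounded 22.2 PB, three copies come to roughly 67 PB rather than the 66 PB obtained by tripling the rounded figure.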