Sunday, March 25, 2007

Data Deluge - Storage software needs to step up!

From the EMC-sponsored IDC study, The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010 [PDF]:
In 2006, the amount of digital information created, captured, and replicated was 161 exabytes (EB) = 161,000 petabytes (PB) = 161,000,000 terabytes (TB). That is three million times the information ever written in books.
From a storage perspective, storing all that digital data without any RAID protection would have required 403 million hard drives of 400 GB average capacity each, or 93% of all hard drives produced in 2006.
In 2010, the information added annually to the digital universe will increase sixfold to 988 EB.
To store the 2010 volume on the same number of hard drives (403 million) as in 2006, average disk capacity would have to increase to about 2.4 TB.
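The drive-count arithmetic above is easy to verify; here is a minimal sketch using the study's decimal units (1 EB = 1,000,000 TB):

```python
# Reproducing the IDC drive-count arithmetic from the figures above.
EB_2006 = 161           # exabytes created/captured/replicated in 2006
EB_2010 = 988           # projected annual volume for 2010
DRIVE_TB_2006 = 0.4     # average drive capacity in 2006 (400 GB)
TB_PER_EB = 1_000_000   # decimal units, as used in the study

# Drives needed to hold the 2006 digital universe (no RAID overhead)
drives_2006 = EB_2006 * TB_PER_EB / DRIVE_TB_2006
print(round(drives_2006 / 1e6, 1), "million drives")   # ~402.5 million

# Average capacity needed to hold 2010's volume on the same drive count
capacity_2010 = EB_2010 * TB_PER_EB / drives_2006
print(round(capacity_2010, 2), "TB per drive")         # ~2.45 TB
```

Both results match the post's figures: roughly 403 million drives in 2006, and an average capacity of about 2.4 TB needed by 2010.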

The most telling statement I found in the IDC study is:
In 2007 the amount of information created will surpass, for the first time, the storage capacity available.
This only goes to show that we cannot rely on the capacity of storage media alone to store all the information created. Software has to step up to make sure that we can fit all the information created onto the storage available to us. And that is why I believe technologies like data de-duplication and compression will finally become appealing in the primary storage arena. See my previous post Data De-duplication for Primary Storage.
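To make the de-duplication idea concrete, here is a minimal sketch of fixed-block de-duplication via content hashing. The function names and 4 KB block size are illustrative, not from any particular product: each unique block is stored once, keyed by its SHA-256 digest, and a "recipe" of digests lets the original data be reassembled.

```python
import hashlib

def dedup_store(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep only unique ones,
    keyed by SHA-256 digest. Returns (store, recipe): the store maps
    digest -> block, and the recipe lists digests in original order."""
    store, recipe = {}, []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # store each unique block once
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe):
    """Reassemble the original bytes from the recipe."""
    return b"".join(store[d] for d in recipe)

# Highly redundant input: 100 copies of the same 4 KB block.
data = (b"A" * 4096) * 100
store, recipe = dedup_store(data)
print(len(recipe), "logical blocks,", len(store), "unique block(s)")
assert rebuild(store, recipe) == data
```

On redundant data like this, 100 logical blocks collapse to a single stored block; real systems add variable-size chunking and collision handling, but the space-saving principle is the same.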

Also, check out my new favorite storage startup Storewiz, which focuses on compression for primary storage.

Sunday, March 11, 2007

Update: Canceled - Seattle Storage Event: Service-Oriented Architectures (SOA)


Check out the local storage event organized by the Puget Sound Storage Networking User Group (PS-SNUG) on Service-Oriented Architectures (SOA) later this month.

DATE: Thursday, March 29, 2007

TIME: 12:30 PM - 2:30 PM

LOCATION: SANZ Inc., 150 Nickerson Street, Suite 206, Seattle, WA 98109. Free parking available!

TOPIC: Service-Oriented Architectures (SOA): Designing Storage Environments to Match Business Needs

SPEAKER: Abbott Schindler, HP StorageWorks. Abbott Schindler spent 11 years in storage development and has been a storage technologist with technical marketing experience for the past 15 years. He is currently involved with storage grids, virtualization, and other leading-edge strategies for HP. He has delivered extensive training and public presentations internationally, both inside and outside HP.

AGENDA: 12:30 PM: Pizza, Soda, and Networking. 1:00 PM: Presentation. 2:00 PM: Q&A. 2:30 PM: Conclusion and $50 Best Buy Gift Card Drawing

RSVP: To register, please visit the SNUG website.

There is no charge to attend; however, we ask that you register for refreshment planning purposes. Meetings are open to anyone interested in discussing data storage in a vendor-neutral, education-focused environment.

Friday, March 02, 2007

Distributing Desperate Housewives to Ten Million

Now that the title and image have caught your attention, the big letdown is that this post has no housewives to offer! It is about distributing episodes of the ABC television show “Desperate Housewives” over the Internet … or maybe not even that!!

After my previous post P2P powered Devices … coming soon?, Newell Edmond, co-founder of GridNetworks, forwarded me an interesting paper on Video Internet. That paper led me to the March 2, 2006 column by Robert Cringely, Peering into the Future: Why P2P is the Future of Media Distribution even if ISPs have yet to Figure that out.
"Desperate Housewives," in its puny 320-by-240 iTunes incarnation, occupies an average of 210 megabytes per episode. A full-resolution version would be larger still. In theory, it would be four times as big, but practically it would probably come in at double the size or 420 megabytes. But let's stick with the little iTunes version for this example.

Twenty million viewers, on average, watch "Desperate Housewives" each week in about 10 million U.S. households. That's 210 megabytes times 10 million downloads, or 2.1 petabytes of data to be downloaded per episode. Fortunately for the download business model, not everyone is trying to watch the show at the same time or in real time, so iTunes, in this example, has some time to do all those downloads. Let's give them three days. The question on the table is what size Internet pipe would it take to transfer 2.1 petabytes in 72 hours? I did the math, and it requires 64 gigabits-per-second, which would require an OC-768 fiber link and two OC-256s to fulfill.
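Cringely's bandwidth figure checks out; here is the arithmetic redone in a few lines, using decimal megabytes as in his example:

```python
# Re-doing Cringely's arithmetic: 2.1 PB delivered over 72 hours.
episode_MB = 210                 # iTunes episode size, megabytes
households = 10_000_000          # downloads per episode
total_bytes = episode_MB * 1_000_000 * households   # 2.1e15 B = 2.1 PB

seconds = 72 * 3600              # the three-day download window
gbps = total_bytes * 8 / seconds / 1e9

print(round(total_bytes / 1e15, 1), "PB")   # 2.1 PB
print(round(gbps, 1), "Gbps")               # ~64.8 Gbps sustained
```

A sustained ~65 Gbps for a single episode of a single show, before any retransmissions or peak-hour skew, is what makes the centralized-download model so daunting.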
Even though Cringely was discussing the bandwidth challenges of transferring one episode of Desperate Housewives, my mind wandered off to the storage infrastructure side of the equation.

What type of storage infrastructure ecosystem would someone need to fulfill ten million download requests for one episode of Desperate Housewives?

In my opinion, a storage infrastructure built around monolithic centralized storage most probably wouldn’t be practical. But this post is not about my opinion. It is about yours, so chime in with your thoughts on a potential solution to this problem.

Show your design prowess or extol the virtues of your favorite storage vendors with your own storage ecosystem design. All responses are welcome.