Thursday, December 11, 2008

Any Vendor Strategy, why not?

Initially, I was going to post a comment on Chris Evans's recent post 2V or Not 2V (vendors this is). As the comment grew longer, I decided to turn it into a blog post of my own. Chris succinctly covered the operational aspects and challenges of a multi-vendor strategy.

The challenge is deciding how deep in your environment to go with multiple vendors. Do you want multiple vendors for:
  • only large items like storage subsystems?
  • smaller stuff like HBAs and switches too?
  • commodity type stuff that has little differentiation among vendors?
  • specialized products?
Having multiple vendors doesn't necessarily give you $ bargaining power. Bargaining power comes with transaction volume, transaction size, transaction frequency and your value to the vendor.

At the smaller end, though you can achieve better operational efficiency by standardizing on a single vendor, you don't have the volume and size for a single vendor to take you seriously. Unless consolidating all your purchases gives you the volume and size to be valuable to a vendor, why not just buy the best-of-breed solutions?

How much operational efficiency are you going to gain by buying three Clariions versus one Clariion, one 3PAR and one Compellent?

At the high end, a single-vendor strategy hinders your ability to adopt innovation and new technologies while yielding minimal gains in operational efficiency (remember, large teams can be split among multiple vendors if needed), though you may be valuable to the vendor and get better pricing. How much operational efficiency are you going to lose by adding three 3PARs to the couple of dozen AMS arrays you already have?

I have seen, heard and experienced enough horror stories to doubt that either a single-vendor or a multiple-vendor strategy is the right strategy for any one organization. I favor an Any Vendor strategy, where your decisions are driven by the best solution that meets your needs and not by a solution from a pre-selected vendor that only somewhat meets them.

Tuesday, November 25, 2008

Adaptec Advisors are Back!

Adaptec's PR firm sent a note mentioning that Adaptec Storage Advisor's blog is back! Check it out.

I am also trying to get back to updating my blog after a long hiatus. Hopefully, with some small and quick blog posts on a regular basis, my writing habit will be re-established. In the meantime, enjoy the sights from my various trips.

How do you overcome a writing drought?

Monday, July 14, 2008

Online Backup Services - Six Questions

During my visit to Denver a few weeks ago, I had the opportunity to talk with folks working with online backup and archive cloud services. Some of my impressions from these discussions are interesting and worth sharing. They are based on what I heard from professionals working for, or providing services to, online backup service providers. They are not the result of a full-blown survey and are at best anecdotal. You are welcome to respond to these questions via comments, emails or your own blog post.

Q1: Who are the primary adopters of Online Backup Services?

Individuals and small businesses.
Entities with fewer than a dozen workstations.
Few with a centralized server.

Q2: What was the primary backup method before adopting online backup?

A USB key or USB attached disk drive.
Few with a share on another workstation.

Q3: What was the offsite backup strategy before adopting online backup?

A Floppy, USB key or CD with important files.
Few with a mobile HD.

Q4: What are the subscription and retention rates for online backup services?

High subscription rate.
Very low retention rate.
Most abandoned the service within a few weeks.

Q5: What are the primary reasons provided for discontinuing use of online backup service?

Excessive use of Internet connection.
Backup takes too long.
Poor experience during primary use of workstation.

Q6: What was the backup method after discontinuing online backup?

A USB attached disk drive.
A NAS device on network.
Few with no backup method.


Overall, online backup services seem to be a great way to introduce backups to people with no prior backup method, as only a few reverted to no backups after discontinuing the online backup service. Tape is non-existent in the environments that find online backup attractive. Despite heightened awareness of online backup services, the low-bandwidth connection to the Internet continues to be the main hurdle in retaining subscribers, along with a preference for spending limited resources on services that improve sales or reduce costs over a fear-based buying decision. One comment I heard was,
I prefer to allocate 50% of Internet bandwidth to VoIP services that reduce my telecommunication cost instead of to offsite backup.

Sunday, June 22, 2008

Denver Visit, New Piñata & Scalability Videos

Week in Denver

I will be in Denver this week through Friday, June 27th. Unfortunately, I will miss the nPost Golf 2.0 event in Seattle.

Despite a busy workday schedule in Denver, I am also looking forward to seeing some friends and colleagues. If you are a fellow storage blogger or reader, or are working on a cool storage technology, and are located in the Denver area, ping me and we can meet one evening during my visit.

New Piñata for EMC & IBM

Recently, a reader alerted me to the new Data Domain blog Dedupe Matters, written by Brian Biles. Welcome, Brian, to the world of bloggers. Let's see how quickly EMC and IBM bloggers make you the new piñata like they did to HDS bloggers. ;-)

In any case, it’s a nice change from their rumored no blogging policy. Hopefully, blogging at Data Domain will go beyond people in Ivory Towers.

Video of Presentations from Google Scalability Conference

Google has already uploaded videos of the presentations from last week's Google Scalability Conference. I also plan to discuss some of the presentation topics in further detail as time permits.

Welcome Remarks by Brian Bershad

GIGA+ by Swapnil Patil

HPC with NetworkSpaces for R by David Hendersen

Chapel by Brad Chamberlain

SMP via Transactional Memory by Vijay Menon

Communicating Like Nemo by Jennifer Wong

Maidsafe by David Irvine

CARMEN by Paul Watson

Scalable Wikipedia by Thorsten Schuett

Monday, June 16, 2008

Google Conference on Scalability - First Impression

As expected from the conference schedule, the Google conference turned out to be a technical event primarily focused on parallel programming and infrastructure scalability. At the last minute, Google decided to merge the two tracks into one. Though I got to attend all the sessions, they felt time-compressed and rushed. I was surprised to see a lot of attendees who came from outside Seattle. I met quite a few people from the Bay Area, Canada and Europe. I enjoyed the sessions, though some audience members commented on the very technical nature of the conference compared to the previous year. As Brian Bershad of Google commented in his welcome speech, the challenge is to find technologies and solutions to scale the handling of search queries from 600 million to 6 billion. I came away better informed about the different challenges and potential solutions we may see down the road.

I also sat down and chatted with Robin Harris. We decided to forgo making a video of our conversation. I am not a big fan of talking-head videos or podcasts unless they leverage the unique value of these formats that is not available through written words or pictures. And who wants to listen to two storage bloggers chatting about nothing? I find them miserable myself, so why put others through the same misery?

In my opinion, three sessions, CARMEN: a Scalable Science Cloud [PDF], GIGA+: Scalable Directories for Shared File Systems [PDF] and maidsafe, stood out at the conference from an infrastructure scalability perspective. Communicating Like Nemo was very entertaining. The common theme in audience questions on most infrastructure presentations was the reliability, availability, scalability, and security of the offered solution. It is a good indication of what is on people's minds when evaluating new infrastructure offerings. With the popularity of hashing in data storage, speeding up hash lookups is becoming an interesting scalability problem.

David Irvine's session on maidsafe was the only session where the speaker white-boarded most of the presentation. His confidence and knowledge were commendable. Not many speakers can pull off white-boarding 80% of a presentation with hundreds in the audience. Comparing maidsafe with an ant colony was an interesting way to show the scalability and simplicity of the solution. The maidsafe solution seems to be in the same category as RevStor, Seanodes, Cleversafe, Oceanstor, Farsite and several others that are trying to leverage storage across hundreds and thousands of distributed nodes in a peer-to-peer or quid pro quo network, a solution most likely attractive to players in the cloud and web distribution market.

Thursday, June 12, 2008

Heading to Google Scalability Conference

Like Robin Harris, I am also attending the Google Scalability Conference in Seattle on Friday and Saturday. Hopefully, I will see him at the event. All sessions at the conference look good. Unfortunately, I will miss half of them as two tracks are running in parallel. I am looking forward to hearing about maidsafe, CARMEN, GIGA+, and Google Maps scale down.

Recently, several readers inquired whether I had lost interest in blogging as I am posting very irregularly. Nothing could be further from the truth. I am still as excited about blogging as I was almost five years ago when I first delved into blogs. The lack of frequent updates is due to my attention being somewhere else (new dig, rig and gig). I will elaborate some other time.

BTW, check out the Storage Optimization blog by Carter George. His startup Ocarina has an interesting story with its data footprint reduction technology. I am looking forward to learning more once I refocus on new exciting stuff in storage. Hopefully, Ocarina can change KPCB's luck in the storage space.

Tuesday, May 06, 2008

Do You Got Talent?

After a long day of work, this clip made me laugh. It has nothing to do with data storage, just a performance from the British TV show Britain's Got Talent.

Thursday, April 17, 2008

Online Backup: 100% Install

My last post, Online Backup any different from Traditional Backup for Laptop/Desktop?, was quickly turned into an us vs. them argument by Beth Pariseau in her blog post Blog dialogue: Online vs. traditional backup. I guess my curiosity about the slow adoption of online backup, meant as a conversation starter, didn't come across clearly.
… Gupta probably has “too much” experience with backup clients to necessarily see things from the SMB customer’s point of view. For him, installing a backup client isn’t a big deal–for some, it might be enough of a reason to let somebody else deal with it.
Initially, I thought about pulling Tony on her. On a side note, I wonder why Tony spills coffee every time Hu sneezes.

The more I analyzed her statements, the more I realized her opinions most likely resulted from what she heard as a storage news writer, and from whom, instead of her own experiences. Keywords like SMB are a good giveaway of whom she is listening to. Not many practitioners try to segment customers with a mile-wide brush. ;-)

Let's start by addressing her installation-related concerns. Do online backup services magically appear and start working on your laptop/desktop by themselves? No, someone has to download and install them. The only backup clients that don't require installation are the ones that come pre-installed on your system. As I understand it, there are two main backup clients that don't require installation and are readily available to users: one provided by Microsoft with Windows XP (Windows Backup) and the other provided by Apple with Leopard (Time Machine).

Let's add configuration of the backup client to the "difficult to install" equation. Configuration guides for Mozy Pro [PDF 46 pages] and Windows Backup [Web page - 6 pages if you decide to print] are available online for your review and comparison. Of course, Time Machine is so simple to configure that even someone like me, who misunderstands the backup needs of SMBs according to a marketer, implemented it on a MacBook without instructions. BTW, the AppleInsider article Road to Mac OS X Leopard: Time Machine is a good overview of Time Machine.

You be the judge how difficult each one is to install and configure.

As I wrote in my comment on Beth's blog, my intention is not to promote one method over another, just to show similarities and question the current implementations. Hopefully, these posts are setting the stage for future opinions and conversations that will help improve current BaaS offerings and develop new ones.

More to come.

Thursday, April 10, 2008

Online Backup any different from Traditional Backup for Laptop/Desktop?

Recently, Beth Pariseau wrote in her blog post HP unveils unlimited online storage for SOHO market that bandwidth is one of the hurdles in adoption of online backup services.
Like most online storage offerings to date, this offering is small in scale and limited in its features when compared with on-premise products. Most analysts and vendors say online storage will be limited by bandwidth constraints and security concerns to the low end of the market, with most services on the market looking a lot like HP Upline.
Though it validates my thoughts expressed in the blog post Bandwidth, one hurdle in adopting Cloud Storage, I am not totally convinced that bandwidth is the root cause of limited adoption. There may be something else hindering the adoption of online backup services.

Recently, Scott Waterhouse, an EMC blogger, has also been discussing the virtues of Mozy, an online backup service acquired by EMC. I agree with his argument about the challenges of traditional backup clients in his post Mozy as the Future of Backup.
Big business has a lot of data on laptops and desktops. Traditionally, installing backup clients on these systems has been costly, full of headaches, and generally causes more problems than it solves. The consequence of this is that most folks just don't protect them.
Is the Mozy client any different? Is there any difference in installing, configuring, using and maintaining a traditional backup client versus the Mozy client on a laptop/desktop? None that I noticed after reading his posts.

My intention is not to pick on Mozy or Scott, but there is nothing unique in most online backup services that couldn't be in traditional backup for laptops/desktops. At least traditional backup also comes with the peace of mind that all backups are stored on the company's own infrastructure. In the last few years, I tried over a dozen online backup services in addition to putting up with traditional backup clients for laptops/desktops, and I don't see much difference between the two.

IMO, most online backup services are just taking existing on-premise backup strategy for laptops/desktops and repackaging it to run backups to somebody else's infrastructure instead of your own. What do you think?

Thursday, March 27, 2008

Storage Jobs @ Startups

Recently, Nathan Kaiser at nPost contacted me regarding his new widget that displays Startup Jobs on blogs. As the sidebar on my blog is already too long, I decided to include his widget in a blog post. Try it out and let me know your feedback (positive and negative).

P.S. If you are using an RSS reader like Google Reader and don't see the widget, please visit my blog. As I write this post, I am not sure if the widget will show up in the blog post either. In case it doesn't, please visit the nPost Startups Jobs site to check out the startup jobs. Use the keyword "storage" to find storage jobs at startups.

Sunday, March 23, 2008

Is number of objects true indicator of Amazon S3 growth?

In my last blog post, I estimated the data stored on Amazon S3 to be in the exabyte range, using the 18 billion stored objects reported by Amazon CTO Werner Vogels in his blog post.

In retrospect, it was an over-estimation by several orders of magnitude (my bad) that was promptly corrected by MikeDoug using another data point, AWS revenue. MikeDoug estimated (comment excerpts below) the data stored to be in the 20PB (petabyte) range, way short of my estimates and probably closer to reality.

No doubt, it is still a significantly large number for a service that is only a few years old. But S3 growing up fast may not be as obvious from the growth in stored objects as Vogels would like us to believe.
A recent report puts ALL of AWS at the 50 to 70 million in revenue for the year.

Let us pretend that, of the 70 million, 40 million in revenue was attributed to S3 alone for last year. That would be $3,333,333 a month for S3. This converts to 22,222,222 gigabytes, or 0.02 exabytes.
Other interesting tidbits if S3 has 20PB of stored data, 18 billion objects and 330,000 registered developers:

On average, each object is only storing about a megabyte of data. This number seems quite low so either deleted objects are being included in the published number of objects or developers are keeping object size low to prevent transfer timeouts.

On average, each developer is only storing 54GB of data. Considering some services like SmugMug are storing terabytes of data on S3, most probably there are a lot of registered developers who are either not actively using S3 for storing data or have services under development.
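
The arithmetic behind these estimates can be reproduced with a short Python sketch. Two assumptions are mine and not stated above: the price of $0.15 per GB-month is simply the rate implied by MikeDoug's figures ($3,333,333 divided by 22,222,222 GB), and decimal units are used throughout (1 PB = 1,000,000 GB).

```python
# Back-of-the-envelope check of the S3 estimates discussed above.
# Assumptions: decimal units, and a flat price of $0.15 per GB-month,
# which is only the rate implied by the quoted revenue and capacity figures.

ANNUAL_S3_REVENUE_USD = 40_000_000   # pretend share of AWS revenue, from the comment
PRICE_PER_GB_MONTH_USD = 0.15        # implied by the comment, not a quoted price
OBJECTS = 18_000_000_000             # stored objects used in this post
DEVELOPERS = 330_000                 # registered developers used in this post


def stored_gb_from_revenue(annual_revenue_usd, price_per_gb_month_usd):
    """Return the GB of storage that the monthly revenue would pay for."""
    monthly_revenue = annual_revenue_usd / 12
    return monthly_revenue / price_per_gb_month_usd


stored_gb = stored_gb_from_revenue(ANNUAL_S3_REVENUE_USD, PRICE_PER_GB_MONTH_USD)
stored_pb = stored_gb / 1_000_000
stored_eb = stored_gb / 1_000_000_000

avg_object_mb = stored_gb * 1_000 / OBJECTS       # average size per object
objects_per_developer = OBJECTS / DEVELOPERS      # average object count per developer
avg_developer_gb = stored_gb / DEVELOPERS         # average data per developer

print(f"Stored data: {stored_gb:,.0f} GB (about {stored_pb:.1f} PB or {stored_eb:.3f} EB)")
print(f"Average object size: {avg_object_mb:.2f} MB")
print(f"Objects per developer: {objects_per_developer:,.0f}")
print(f"Data per developer: {avg_developer_gb:.1f} GB")
```

The per-developer result depends on rounding: starting from a rounded 20PB gives roughly 60GB per developer, while the 54GB figure corresponds to about 54,500 objects per developer at a flat 1MB each.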

Wednesday, March 19, 2008

How much data is in Amazon S3?

Today, Werner Vogels marked the second birthday of Amazon S3 in his blog post Happy Birthday, Amazon S3! and also shared that as of Jan 2008, S3 was storing 14 billion objects. I am not sure why Werner and others at Amazon are so cagey about sharing the actual storage capacity used in AWS. In the past, my inquiries have also been met with either silence or a "trade secret" or "competitive advantage" response.

In my opinion, it only creates room for speculation as I am going to do with this post. So, how much data is stored on S3?

My initial guesstimate for stored data volume is between 14 and 70EB (Yes, EB is Exabyte) based on the published information about the size of individual object being one to five GB. Doesn't it seem very high? At first, it did to me. I have been trying to come up with alternate methods to estimate stored data volume like the typical size and type of data being stored by various services that are using S3. Even with an average value of 100MB per object, the stored data volume comes out to be 1.4 Exabyte, still a huge number for such a young service.
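
The guesstimate is simply object count multiplied by an assumed average object size. A minimal Python sketch of that calculation, assuming decimal units (1 EB = 10^18 bytes) and treating each size as a flat average across all objects:

```python
# Object count times assumed average object size, in decimal units.
# The three sizes are the scenarios discussed above, not measured averages.

OBJECTS = 14_000_000_000  # objects reported for January 2008

MB = 10**6
GB = 10**9
EB = 10**18


def stored_exabytes(object_count, avg_object_bytes):
    """Return total stored volume in exabytes for a given average object size."""
    return object_count * avg_object_bytes / EB


scenarios = {
    "1 GB per object": 1 * GB,
    "5 GB per object": 5 * GB,
    "100 MB per object": 100 * MB,
}

estimates = {label: stored_exabytes(OBJECTS, size) for label, size in scenarios.items()}

for label, exabytes in estimates.items():
    print(f"{label}: {exabytes:.1f} EB")
```

The result scales linearly with the assumed average, so the estimate is only as good as that one number.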

What is your estimate? Any suggestions on an estimation method to arrive at a more accurate number for the data volume stored on S3?

Considering that S3 may be hosting an exabyte or more of data within two years of existence, no wonder all the established vendors, EMC, IBM, HP and Dell, are salivating over getting a piece of the "Cloud Storage" pie.

Sunday, March 16, 2008

Bandwidth, one hurdle in adopting Cloud Storage

This weekend, I read the NY Times article Video Road Hogs Stir Fear of Internet Traffic Jam.
Last year, by one estimate, the video site YouTube, owned by Google, consumed as much bandwidth as the entire Internet did in 2000. …

In a widely cited report published last November, a research firm projected that user demand for the Internet could outpace network capacity by 2011. …

Moving images, far more than words or sounds, are hefty rivers of digital bits as they traverse the Internet’s pipes and gateways, requiring, in industry parlance, more bandwidth.
While reading the article, it occurred to me: isn't bandwidth going to be the main hurdle in the adoption of storage in the cloud? When clients are not happy with a 10/100/1000Mbps connection to an application/server/data center, how can they be happy with a DSL/Cable/T1/T3 connection to the cloud? I am sure everyone has felt the pain of trying to transfer large datasets over the Internet.
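
To put rough numbers on that gap, here is a minimal Python sketch that estimates how long a dataset takes to move over each kind of link. The T1 and T3 figures are the standard 1.544 and 44.736 Mbps line rates; the DSL and cable uplink speeds and the 100 GB dataset are illustrative assumptions of mine, and protocol overhead and contention are ignored, so real transfers would be slower.

```python
# Idealized transfer time for a dataset over different links.
# Assumptions: decimal units (1 GB = 8,000 megabits), full line rate,
# no protocol overhead. DSL and cable speeds are illustrative guesses.

LINK_SPEEDS_MBPS = {
    "DSL uplink (assumed)": 1.0,
    "Cable uplink (assumed)": 2.0,
    "T1": 1.544,
    "T3": 44.736,
    "100 Mbps LAN": 100.0,
    "1000 Mbps LAN": 1000.0,
}

DATASET_GB = 100  # hypothetical dataset size


def transfer_hours(dataset_gb, link_mbps):
    """Return hours needed to move dataset_gb over a link of link_mbps."""
    megabits = dataset_gb * 8 * 1000
    seconds = megabits / link_mbps
    return seconds / 3600


for name, mbps in LINK_SPEEDS_MBPS.items():
    print(f"{name:>24}: {transfer_hours(DATASET_GB, mbps):8.1f} hours")
```

Even under these generous assumptions, the same dataset that moves in minutes on a gigabit LAN takes days over a T1.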

If you review the introduction and growth of various Amazon Web Services (AWS), a comparatively established cloud player, you will notice very limited use cases of Simple Storage Service (S3) on its own with clients outside the cloud. Most S3 usage is fronted by another AWS service in the cloud such as Elastic Compute Cloud (EC2). Such combinations overcome the challenge of transferring large amounts of data between the storage cloud and an application/server outside the cloud over the Internet. For cloud storage to be successful, it needs to be in the same cloud as the application/server or connected to the application/server cloud by a high-speed link.

Any technology that can reduce the data transfer between the cloud services and clients outside the cloud will be the big beneficiary in this trend. Caching, Compression, and Data De-duplication will most likely benefit in the near term. And, the future seems to be very much like the past aka mainframe - Desktop Virtualization, Streaming, and On-the-Fly Visualization.

So, how will new cloud players like Nirvanix, EMC Mozy and Rackspace differentiate?

Sunday, February 10, 2008

Are you using online storage services and how?

Last week, Ethan Oberman alerted me to his online storage service SpiderOak after coming across my post Online Backup Services - What's Next?. Since my post last year, I was contacted by several online backup and storage service providers.

Ethan highlighted differentiation of his service primarily in the area of file versioning, delta transfer, secure sharing across machines and users, and zero knowledge security.
Our approach to online backup and storage varies greatly from our competitors - creating a personalized network concept as opposed to simply online backup. …
Similarly, last year, Marcus Hartwell introduced me to Diino service that also focuses in the area of online backup, storage and sharing.

Most online storage services, since late 90's, are mainly focused on serving one or more activities in data management:
  1. Backup,

  2. Sharing, and

  3. Access.
These services primarily target consumers and small businesses, a bottom-up approach with hopes that over time mass adoption will result in acceptance by enterprise IT departments. Strangely, none have been able to make a significant impact and gain widespread momentum. As previously mentioned, dozens of them have come and gone, and I am sure you noticed this trend too.

Though I did try out several services for a short time, I just couldn't see any becoming part of my daily online routine. The main adoption challenge seems to be that I need something that either operates "invisibly" or integrates with my current tools and online activities.

Are you using online storage services and how?

Monday, January 28, 2008

Isilon rebounding this year?

Earlier this month, Seattle PI blogger John Cook asked local VCs about the company that will have a breakout year in 2008. Answers by two local VCs, Bill Bryant of DFJ and Jon Staenberg of RCP, caught my attention as both suggested Isilon Systems would have a breakout year. I don't know their reasons but I hope these VCs are right. As Isilon is one of the few local storage companies in Seattle, I would like to see it succeed too.

BTW, how do you define and measure a "breakout" year for a company?

There is no doubt about the desirability of Isilon's clustering technology and the growth of the targeted media and entertainment segment. But in the short term, Isilon needs to clean up the mess to benefit from this market opportunity, the sooner the better. There are three main hurdles to Isilon's rebound: financial concerns, the impact of lost revenue from key customers and product quality/service issues.

Financial Concern

The financial concerns about Isilon primarily result from the delay in its 10-Q filing, the bane of being a public company. Also, Isilon didn't do a great job of managing market expectations, for example by releasing the bad news slowly (financial restatements, executive changes, revenue shortfalls, delay in 10-Q filing). Typically, public companies release all the bad news in one shot, take a big hit in the market and then move on, instead of suffering a slow bleed in the market.

I wonder, like DGM, how many future customers are sitting on the fence concerned about long-term financial viability of Isilon.
And then there's the other problem - googling for technical information I came across a whole set of entries suggesting that there might be some financial problems in the parent.
Loss of Key Customer

As mentioned in the 8-K filed Nov 4, 2007, it seems Isilon has some issues accounting for sales to certain resellers and customers. Also, as mentioned by Seeking Alpha, one of Isilon's largest customers, Kodak, accounted for no revenue in Q3 of 2007.
Company blamed one of its largest customers - Kodak (EK) (17% of revenues last quarter went to 0% this quarter - ZILCH) amongst weakness in Europe for the short-fall.
How much impact will these two events have on the bottom line going forward? I decided to take a look at Isilon's revenue with and without two of its largest customers, Eastman Kodak and Comcast, for the last few years.

A quick note about the assumptions and trends in the above charts:
  • No revenue from Kodak in Q3'07

  • No change in % quarterly revenue contribution by Comcast in Q3'07 from previous quarter and same % as in Q3'06.

  • For the quarters, where % revenue contribution by Comcast or Kodak were not available, estimates are made based on % annual revenue contribution.

  • The closer the "- Kodak" or "- Comcast" curve is to the Total curve, the lower the contribution by that customer to total revenue.
Since the IPO, Kodak has contributed a large slice of Isilon's total revenue. But past history also indicates that in the second half of the year Kodak typically contributed less to total revenue compared to the first half. In my opinion, Kodak and Comcast enabled Isilon to go public at least two quarters earlier than it should have. Does the disclosure of no revenue from Kodak in Q3'07 indicate that future revenues from Kodak are likely to be significantly lower? Let's hope that the independent review of some sales is not going to impact revenues from Comcast.

Service/Product Quality

Any product company selling to large enterprises learns sooner or later that an internal post-sales service organization complementing third-party service providers is needed to provide exceptional service. It seems Isilon has recently been building up its internal service capabilities.

The change in contract manufacturer seems reasonable considering the agreement was close to expiration, though I expect there will be short-term hardware quality pain during the transition and ramp-up. I couldn't find any smoking gun to support the talk about product operation or compatibility issues.


My take is that addressing financial concern should be Isilon's top priority. Clearing the financial picture will be the main hurdle in their rebound. It is also great to see Isilon returning to its roots in technology and product innovation instead of trying process innovation like established companies.

Monday, January 21, 2008

Data Storage Companies in Seattle

Recently, during a review of the Seattle tech ecosystem (most companies don’t target the infrastructure segment), it was interesting to note the diverse, emerging and unique areas in data storage being targeted by local tech companies: Isilon Systems, F5 Networks, Bocada, Amazon and Microsoft.

Whom did I miss on this list of local data storage companies? Can Illumita, a local “virtualization over Internet” startup, be considered as a potential influencer in data storage space?

This year, I plan to write more often about these and other local Seattle data storage companies.