Tuesday, July 24, 2007

Tales of Storage IPOs: What happened to Quality?

A comparison of the upcoming IPOs of Netezza, BladeLogic, Voltaire and Compellent with recent storage IPOs. If VMware were shown on this chart, it would fall outside the upper-right quadrant.

Monday, July 16, 2007

Power Consumption by Google Services

Even though Google doesn’t share many details of its infrastructure, the limited published information shows that the company is obsessed with continuously monitoring, managing and improving the efficiency of that infrastructure.

Recently, Robin Harris attended the Google conference on scalability and then mused on How Yahoo can beat Google. A few months ago, Google published the results of its work on disk drive failure in the paper Failure Trends in a Large Disk Drive Population [PDF]. It was covered extensively in the blogosphere, including by me in the blog entries SMART not so smart in predicting disk drive failure and Google Findings of Disk Failures Rates and Implications, and by Robin Harris in his blog entry Google’s Disk Failure Experience.

Google has done it again, presenting the results of its work on power consumption and provisioning in the paper Power Provisioning for a Warehouse-sized Computer [PDF] at the ACM International Symposium on Computer Architecture, San Diego CA, June 9 – 13, 2007. In this work, Google researchers Xiaobo Fan, Wolf-Dietrich Weber and Luiz Andre Barroso monitored 15,000 servers running three different applications (Websearch, Webmail and Mapreduce) for six months to determine the power usage characteristics at the rack, PDU and datacenter levels.

Google Services

Websearch: A service with high request throughput and large data processing for each request.

Webmail: A disk I/O intensive service. Machines are configured with a large number of disk drives. Each request involves a relatively small number of servers.

Mapreduce: A cluster dedicated to running large offline batch jobs. Jobs involve processing terabytes of data using thousands of machines.

Key Findings

The key findings from this work are:
  1. The difference between the maximum power actually used by a large number of computing devices, cumulatively, and their theoretical peak usage can be as much as 40% in datacenters.

  2. It may be more efficient to leverage power management techniques at datacenter level than at rack level.

  3. Nameplate ratings are of little use in power provisioning as they significantly overestimate actual maximum usage.

  4. CPU utilization as a measure of machine-level activity produces accurate results for dynamic power usage, especially with large groups of machines. The dynamic power range is less than 30% for disks and negligible for motherboards.

  5. Provisioning the datacenter using the maximum power draw of individual machines leaves some capacity stranded.

  6. A mix of diverse workloads reduces the difference between average and peak power, an argument in favor of mixed deployment.

  7. Idle power is significantly lower than the actual peak power, but generally never below 50% of peak.

  8. CPU dynamic voltage/frequency scaling may yield moderate energy savings (up to 23%) at datacenter levels.

  9. Peak power consumption at the datacenter level could be reduced by 30% and energy usage could be halved if systems were designed so that lower activity levels meant correspondingly lower power usage profiles.

I believe there is a value proposition in this study waiting to be discovered by Copan Systems: their MAID technology could be applied at PDU and datacenter scales, or even along the lines of Sun's Project Blackbox.
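To make findings 4, 5, 7 and 9 a little more concrete, here is a minimal Python sketch. It assumes a simple linear relation between CPU utilization and power, with idle power at 50% of peak. The function names, the 200 W peak and the utilization values are all made up for illustration and are not taken from the paper.

```python
def machine_power(utilization, peak_watts, idle_fraction=0.5):
    """Assumed linear model: idle power plus a dynamic part that grows with CPU utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    idle_watts = idle_fraction * peak_watts
    return idle_watts + (peak_watts - idle_watts) * utilization


def proportional_power(utilization, peak_watts):
    """Hypothetical energy-proportional machine: power falls to zero with activity."""
    return peak_watts * utilization


def group_power(utilizations, peak_watts, model):
    """Total power of a group of identical machines under the given model."""
    return sum(model(u, peak_watts) for u in utilizations)


# Hypothetical numbers, for illustration only.
PEAK_WATTS = 200.0
utilizations = [0.2, 0.4, 0.6, 0.8]

actual = group_power(utilizations, PEAK_WATTS, machine_power)
ideal = group_power(utilizations, PEAK_WATTS, proportional_power)
provisioned = PEAK_WATTS * len(utilizations)  # every machine budgeted at its own peak

print(f"provisioned for {provisioned:.0f} W, drawing {actual:.0f} W")
print(f"stranded capacity: {provisioned - actual:.0f} W")
print(f"energy-proportional machines would draw {ideal:.0f} W")
```

The gap between the provisioned figure and the actual draw is the stranded capacity of finding 5, and the gap between the two models is the kind of saving finding 9 points at. The real numbers depend entirely on the measured workloads.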

More details from this study later.

Sunday, July 15, 2007

Tales of Storage IPOs: Hardware or Software and Components

Recently, an interesting question about the preferred packaging of storage products by startups came up during my conversation with a few private equity acquaintances. The opinions were somewhat biased and split along the lines of their professional backgrounds.

The finance-focused guy's opinion was that most software products require less initial investment in infrastructure and less working capital for distribution, service and support. Software startups tend to become cash flow positive and profitable a lot more quickly with early market successes. The sales-focused guy's opinion was to have a product in physical form that a potential customer can see, feel and touch. A physical product tends to raise fewer investigative queries about its inner workings and invites less of the inquisitive knob-turning that tends to happen with a software product.

I am of the opinion that, irrespective of the storage product packaging, the differentiation tends to come, most times, from the software, whether running under the covers or as a standalone product. One lesson I learned from my last entrepreneurial storage adventure was that during the early rounds of financing, the pressure for a "working" prototype is a lot greater with a storage software product than with a hardware product.

I decided to look further into the question of whether startups with a physical storage product are perceived better by looking at the value placed on them in the marketplace. Unfortunately, the earliest available public financial data for a startup appears when it files for IPO registration.

This is an attempt at finding patterns for startups in the public financial data of recent IPOs by Isilon, Data Domain, Riverbed, Mellanox, Commvault and Double-Take. A visual representation of revenue, income/loss, market capitalization and change in market cap after 12 days of trading for these six recent IPOs is shown below.


Be mindful of the limitations and the assumptions made to simplify the analysis: few data points, the number of outstanding shares remaining unchanged after the IPO, market cap as the indicator of a company's value, price change equated with change in market cap over a short period of trading after the IPO, and revenue and income adjusted to make them consistent across all IPOs.
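The simplification of equating price change with change in market cap follows directly from holding the share count fixed. A small Python sketch of the arithmetic, using made-up figures that do not correspond to any of the six IPOs:

```python
def market_cap(shares_outstanding, price):
    """Market capitalization = shares outstanding x share price."""
    return shares_outstanding * price


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Hypothetical figures, for illustration only.
SHARES = 60_000_000      # assumed unchanged over the period
ipo_price = 13.0
price_day_12 = 19.5

cap_at_ipo = market_cap(SHARES, ipo_price)
cap_day_12 = market_cap(SHARES, price_day_12)

price_change = pct_change(ipo_price, price_day_12)
cap_change = pct_change(cap_at_ipo, cap_day_12)

# With a constant share count the two percentages are the same.
print(f"price change: {price_change:.1f}%, market cap change: {cap_change:.1f}%")
```

If the share count changes during the period, for example through a secondary offering, the two percentages diverge and the shortcut no longer holds.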

Some of my observations from this analysis are:

  • Storage software and component vendors were valued less at IPO than storage hardware vendors, despite being profitable and having revenues on par or higher.

  • During early trading, public markets didn't reward software and component vendors for better financial strength either. The gains in market cap during the first 12 days of trading were significantly higher for hardware IPOs.

  • Being profitable doesn't necessarily get rewarded. The extent of profit or loss had little bearing on the performance of storage IPOs.

  • Maturity of business results in lower rewards.

  • Double-Take was seriously undervalued and under-appreciated in all aspects.

Last Friday, Gary also wrote about the price performance of storage IPOs during the past six months in his post IPO Class of 2006. He mentioned that Riverbed and Double-Take had the highest percentage price gains, while Isilon is swooning and Commvault is in the doldrums. Maybe the efficiency of the marketplace is finally being realized.

As I was using IPO data as a proxy for startup value, I decided not to extend the time period of the analysis (though I may do so anyway, now). The primary reason was to simplify the analysis and keep the data mining and normalization workload manageable: several vendors had secondary offerings, the number of outstanding shares may have changed, the IPO event and its proceeds caused temporary changes in financial data, and lock-up periods expired.

Share your thoughts and opinions on the success/failure of startups and storage IPOs.

Monday, July 09, 2007

3PAR: System Design. Part I

One of the core features of 3PAR, in my opinion, is a system design that facilitates scalability, availability, performance and ease of use. As posted in my previous blog entry, Craig Nunes mentioned two key attributes of their system design: scalability and a clustered modular architecture.
… our customers have taken great advantage of the scalability of the array. The clustered, modular architecture eliminates the price premiums of the monolithic arrays and scaling complexity of modular array architectures.
Another reader mentioned two key benefits of the 3PAR system design, performance and ease of use, similar to what I heard from several users and evaluators of 3PAR at SNW. Data is spread across every spindle, up to 2,560 drives, and the workload is spread across all active/active controller nodes. LUNs are built from available raw disks with little need for pre-planning, pre-configuring disk/parity groups or deciding which disk spindles to use.
The 3PAR Utility Storage product brief also gives some insight into the system. I know the brief is marketing spin, but I am working on getting more technical details from 3PAR.
Central to the design is a high bandwidth, low-latency backplane that unifies …, … modular and upgradeable components into a highly available, … automatically load-balanced cluster.

A full-mesh, … passive backplane provides a dedicated … data path between each and every 3PAR ASIC, one of which resides in every 3PAR Controller Node.
Physical disks … are divided into uniform 256-MB chunklets. Chunklets from across the system are then automatically selected and grouped to meet user-defined levels of performance, cost and availability (varying such parameters as RAID type, drive type, radial placement and stripe width).

Customers can non-disruptively alter a volume’s underlying RAID protection level, drive type, stripe width and/or radial placement.
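To picture the chunklet idea from the brief, here is a toy Python sketch. It carves disks into 256-MB chunklets and picks chunklets round-robin so a volume ends up spread across spindles. This is purely my illustration and not 3PAR's actual selection logic. The disk names, sizes and the `allocate` helper are made up, and RAID type, drive type and radial placement are ignored.

```python
from itertools import cycle

CHUNKLET_MB = 256  # chunklet size quoted in the product brief


def chunklets_for_disk(disk_id, disk_mb):
    """Return (disk_id, index) pairs for every whole chunklet that fits on a disk."""
    return [(disk_id, i) for i in range(disk_mb // CHUNKLET_MB)]


def allocate(free_by_disk, count):
    """Pick `count` free chunklets round-robin across disks (hypothetical policy)."""
    total_free = sum(len(chunks) for chunks in free_by_disk.values())
    if count > total_free:
        raise ValueError("not enough free chunklets")
    picked = []
    disks = cycle(sorted(free_by_disk))
    while len(picked) < count:
        disk = next(disks)
        if free_by_disk[disk]:          # skip disks that are already exhausted
            picked.append(free_by_disk[disk].pop(0))
    return picked


# Made-up disks, tiny sizes so the result is easy to read.
disks_mb = {"disk0": 1024, "disk1": 1024, "disk2": 768}
free = {d: chunklets_for_disk(d, mb) for d, mb in disks_mb.items()}

volume = allocate(free, 6)
print(volume)
```

With these inputs the six chunklets come two from each disk, which is the "spread across every spindle" behavior in miniature.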
In my next post, I will dig deeper into the 3PAR system design, hopefully with some help from 3PAR and their customers so that I don’t need to make stuff up.

Sunday, July 01, 2007

3PAR: The Follow-up

Sometimes a blog post touches a nerve, and that was the case with my last blog post, 3PAR: Reversal of Fortune. I heard from numerous readers, including people from Bluearc and 3PAR. Craig Nunes, VP Marketing at 3PAR, sent me a detailed email highlighting the reasons why customers are choosing 3PAR.
The success of our products, as you mentioned hearing from several customers, has a lot to do with ease of use. Those who have chosen 3PAR have been able to provision capacity in seconds without preplanning. They have been able to eliminate performance tuning activities because all volumes have access to all resources within this massively parallel system. They have also been able to move data from one storage tier to another or redistribute existing volumes across new resources as they are added to the system in a single command, completely non-disruptively.

That being said, ease of use is only part of it -- efficiency is also a big part of why customers have deployed 3PAR in their data centers. As you mentioned, Thin Provisioning is getting a great deal of attention and is now moving into large-scale deployments within very conservative companies. But efficient local and remote copy capabilities -- both built on thin copy technology -- are another major reason customers have chosen 3PAR. Organizations have taken substantial advantage of these products, deploying hourly (writeable) snapshots for granular and rapid recovery, and cost-effective tier one remote data replication solutions without the need for vendor professional services.

Finally, our customers have taken great advantage of the scalability of the array. The clustered, modular architecture eliminates the price premiums of the monolithic arrays and scaling complexity of modular array architectures. Our customers start small, with as few as two controllers, yet grow non-disruptively and massively within a single, fully-tiered system. Organizations, especially those with high growth requirements, have chosen to grow within their 3PAR array instead of facing the complexity of managing 5 or 10 or more arrays from other vendors.
Another reader put things more succinctly.
Thin provisioning is great, and is definitely a killer feature, but it's not THE killer feature. I'd say it's one of four things that 3PAR does really well. The combination of those four things in one box, is what differentiates 3PAR from other vendors in the market.
The key claims Craig made were:
  • Ease of use - Provision capacity without preplanning. Eliminate performance tuning.

  • Efficiency - Thin provisioning. Efficient local and remote copy capabilities.

  • Scalability of array - Clustered modular architecture. Grow non-disruptively and massively within a single fully-tiered system.
I don't have any first-hand experience with 3PAR products. Do you have any experience with them? What is your opinion of 3PAR's products and technology? This month, I would like to look further into 3PAR technology. Any documents, comments, emails and phone calls that help me understand 3PAR better are most welcome.