Pages

Tuesday, December 25, 2007

EMC Academic Alliance - A Good Storage Education Initiative

… Continuation from last post, Are We Educating Future Storage Professionals?


I subscribe to the ACM Digital Library and enjoy reading about the latest storage research happening in academia and research organizations. Recently, I came across an interesting paper on the EMC Academic Alliance program in the Proceedings of the 8th ACM SIGITE Conference on Information Technology Education (see Storage Technologies: An Education Opportunity, Ed Van Sickle et al., SIGITE '07, October 18-20, 2007, Destin, Florida, USA).

In the paper, Ed discusses how EMC realized during its hiring process that very few recent graduates had any knowledge of storage technologies. Initially, EMC tried a boot-camp approach, but then concluded that greater benefits could be achieved by creating courses focused on storage technologies at the university level.

This gave birth to the EMC Academic Alliance Program [PDF], with the goals of educating CS/IT students on storage and supporting knowledge transfer, guest lectures, and site visits. In my opinion, it is an impressive initiative, and kudos to EMC for recognizing the shortage of storage skills and taking the lead with potential solutions. Why are EMC bloggers not highlighting and promoting such a positive initiative?

Why is an industry association like SNIA not leading academic alliance initiatives? It doesn't look like education is part of their new mission.

The paper also showcases the implementation of this initiative at four universities and provides an overview of the courses held and plans for future classes. The following quote is a great compliment to the course offering under this program at Penn State University (PSU):
Subsequent offerings of the course filled to room capacity based on the positive word-of-mouth that the course generated. In fact, students from other PSU colleges (engineering, computer science) have requested to be added to the course.
Of course, the program also has its challenges, as encountered at the University of Massachusetts Dartmouth, where the course material was unsuitable for the targeted student segment and no suitable textbook was available, and at North Carolina A&T University, where student interest and course enrollment were low.

The paper also mentions the interest of Dr. Cameron (one of the co-authors) at Penn State University in developing a three-course storage track, though he is constrained by a lack of teaching resources. Hopefully, he can attract other storage vendors to fulfill the vision of a storage track and overcome the shortage of teaching resources through guest lectures by industry professionals.

I wish the paper had further explored the student/instructor survey results and the challenges facing the program.

Will EMC collaborate with its customers, partners, and other storage vendors in growing this initiative? How can the rest of the storage industry help grow the program? How can storage bloggers help?

Sunday, December 23, 2007

Are We Educating Future Storage Professionals?

Happy Holidays to all readers.

During my visit to India over Thanksgiving, I met some friends and family members whom I hadn't seen for a long time. They inquired what I do, and when I responded that I work in data storage, the invariable follow-up was "What kind of database?" ;-)

I am always interested in learning more about various storage technologies, irrespective of their relevance to my job at the time. So, when I moved to Seattle, I decided to check out the course offerings in storage by local universities and colleges. To my surprise, not a single class on data storage was offered at any of the local educational institutions, including the University of Washington.

What the above two examples have in common is a lack of awareness of data storage. Despite the criticality of IT infrastructure and data storage to corporations, there is a lack of knowledge of, and focus on, these topics among most IT professionals, whether experienced or recent graduates. Most storage knowledge seems to be gained through on-the-job or vendor training that typically focuses only on working with specific products. Educational institutions also seem oblivious to the need to teach storage technologies to their CS/IT students.

Most storage vendors offer training on their own product portfolio with little or no focus on the underlying storage technologies that make up their products. SNIA has tried to bridge the gap through its education tutorials at SNW and vendor-neutral storage certifications. If my experience at spring SNW is any indication, most attendees of these tutorials are already storage professionals themselves. The certifications are also targeted at validating the skills of IT professionals already working in storage rather than attracting experienced IT professionals from non-storage domains.

Is this apathy by the storage community toward storage education resulting in a storage skills gap? Is the storage community doing anything to bridge the gap in the storage knowledge of experienced IT professionals as well as recent CS/IT graduates?

In my follow-up post, I will discuss an interesting initiative taking place to address the gap in storage education at academic institutions.

Wednesday, December 12, 2007

Is your encrypted data also RIP with NeoScale?

There has been a lot of talk in trade publications and blogs about the demise of NeoScale. Jon Toigo asked interesting questions about the go-forward strategy of nCipher after it picked up the remains of NeoScale. Storagezilla also air-raided the encryption appliances by poking holes in Decru, along with his reluctant acceptance of the appliance approach to some "data at rest" encryption problems (I wonder what problems he is referring to).

After hearing of the demise of NeoScale, my second reaction was:
"Hmm ... I wonder if NeoScale customers will think about decrypting the terabytes of vaulted data that was encrypted using Cryptostor before their appliance fails and there is no chance of finding a replacement."
I am sure Decru and other encryption vendors are salivating over the opportunity to sell to NeoScale customers, BUT
Can their encryption solution decrypt the data encrypted through Cryptostor?
That is the question NeoScale customers should be asking when talking to encryption vendors about replacing Cryptostor.

I expressed my concerns to some people who are using encryption products. None had considered and/or planned for decrypting the data upon losing access to the tool (product) or method (algorithm) that was used to encrypt it. It is a real scenario for encrypted data on any kind of removable media, even with the correct encryption keys available.

Just imagine what you will do if, seven years from now, a government agency requests financial data that was encrypted and archived on removable media vaulted offsite, and you realize that you can't read the data because you no longer have the original system capable of reading and/or decrypting it. I experienced the same challenge in a customer environment a few years ago, though with unencrypted data.

Unlike Mark, I am not very enthusiastic about encrypting "data at rest," specifically where the encrypted data is stored separately from the system that wrote the data or is capable of reading it. The demise of NeoScale may be just the wake-up call about the trouble you can get into if you encrypt "data at rest" and then have no way to decrypt it because you lost the method, tool, or keys.
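
To make the dependency concrete, here is a minimal sketch, using Python's cryptography library purely for illustration (it has nothing to do with CryptoStor or any vendor's appliance): the ciphertext stays recoverable only as long as both the exact scheme and the key survive.

```python
# A minimal sketch (not specific to any vendor's appliance) of why encrypting
# "data at rest" creates a long-term dependency: ciphertext is useless unless
# you retain the exact scheme (tool/algorithm) *and* the key for its lifetime.
from cryptography.fernet import Fernet  # assumes the 'cryptography' package is installed

key = Fernet.generate_key()             # must survive as long as the data does
ciphertext = Fernet(key).encrypt(b"seven-year-old financial records")

print(Fernet(key).decrypt(ciphertext))  # works: scheme and key both retained
try:
    Fernet(Fernet.generate_key()).decrypt(ciphertext)  # wrong key: data is gone
except Exception as exc:
    print("unreadable without the original key and tool:", type(exc).__name__)
```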

Saturday, December 01, 2007

Is NetApp the only player in File Systems & Storage?

If you read September/October 2007 issue of ACM Queue, you may get that impression.


Recently, I read this issue, and interestingly, all three articles in the Q that focus on File Systems and Storage, covering pNFS, hard disk drives, and storage virtualization, are authored by people from Network Appliance.

Though these articles are worth the read, it most probably is an oversight of the ACM Queue editors that a focus on File Systems and Storage became a focus on Network Appliance.


The only saving grace for the editors is the interview with the SUN engineers who created ZFS; otherwise, NetApp marketing would be distributing reprints of the whole issue.

Thursday, November 29, 2007

Storage Challenge for Media Firms: High Definition

My last blog post asking for help designing a suitable storage solution for a creative media firm is generating some quality comments from lonerock, Paul Clifford, and others. Unfortunately, due to a few other commitments, I wasn't able to communicate further with the executive at the creative media firm. Below is our last communication, which highlights further details of his environment and the challenges faced by his firm.
Why are you considering moving from NAS boxes to SAN solution? Are you using NAS boxes from a specific vendor?
JD: Whenever we have needed to increase our storage we have usually ended up buying more NAS heads – which means we need to manage more equipment. Our equipment is purchased from Dell and is our preferred hardware vendor. Earlier this year we had anticipated growth and increased our storage at that time by 200%. We purchased 2x 2950 Dell Storage Servers with Microsoft SS 2003 giving us about 2.5TB of space on each. We used one of the servers for a live backup. Each of these machines are dual-quad core intel. Middle of the year – we are running low! So instead of buying a NAS, we are looking at the Celerra (though we will use it as a NAS head) as it seems to have a better scalability path. We would then use our current 2950 as render servers.
Are you able to share more details on what specific storage hiccups and access speed issues you are facing?
JD: We do a lot more HD animation work than before which is leading to larger render files, also some of our client projects now tend to go on for a year or longer. And we need to keep them active on our systems. Our reluctance to remove old render files tends to increase size dramatically over the lifespan of a project. Over long weekends, when our render farm is churning away files, we need to make sure we have a clear 200-250GB of space – but usually we are struggling and end up spending a lot of time pruning projects to clear space. Speed wise we don’t think we are in a bad shape – though we have 40 users hitting our NAS box, and the monitoring device does show 100% utilization and queuing of requests. We were thinking of bringing in more NAS devices and breaking up user groups based on that – however again we end up with more equipment and management issues.
How are workstations, rendering servers and NAS boxes currently connected? Can you provide further details of your current infrastructure?
JD: All connections are via a gigabit Ethernet, we have Cisco catalyst switches. Our current storage server is connected via a dual fiber link to the switch. Users typically open 3D files which contain 100 or more linked texture and material files, they work on these files and will typically send them over to our renderfarm using a render manager. The render servers pick up the request and start the rendering process, depending on the scene it could take anywhere between 20 – 40 minutes for the frame to be created and written to disk. Each frame can be 2-3mb in size. We usually do multiple passes, that is 5+ frames make up 1 frame. 30 frames = 1 second of animation, so it adds up!
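
A quick back-of-envelope calculation from JD's numbers shows how fast this adds up; the per-frame size, pass count, and frame rate are his figures, while the one-minute animation length is my own assumption for illustration.

```python
# Back-of-envelope storage estimate from JD's figures (2-3 MB per frame,
# 5+ passes per final frame, 30 fps). The 60-second length is assumed.
frame_mb = 2.5      # average rendered frame size in MB
passes   = 5        # render passes composited into one final frame
fps      = 30       # frames per second of animation
seconds  = 60       # assumed length of one delivered animation

per_second_mb = frame_mb * passes * fps          # ~375 MB per second of animation
total_gb      = per_second_mb * seconds / 1024   # ~22 GB for a one-minute piece
print(f"{per_second_mb:.0f} MB per second of animation, {total_gb:.1f} GB per minute")
```
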
What are you finding attractive about NS20 Celerra (another NAS head)?
JD: Scalability mainly and the ability to easily manage storage and dynamically carve space depending on requirements, we will initially go with 8TB of usable space and add more drive trays as and when we need them in future. Also, the backup system for Celerra with their snapshot option looks pretty impressive.
What is a typical day to day workflow that utilizes and strains the current infrastructure?
JD: As our 3D visualization and rendering is our core, our biggest issue has always been render capacity and storage capacity. We are in good shape with our render capacity about 20+ servers, our storage is the current issue, we can render more in a shorter time frame, also our HD frames are twice or thrice the size they used to be. So there is a constant pruning (which means we may delete files we really shouldn’t be!) and a trying to archive projects to tape the moment they are done only to bring them back on again when a client comes back in a few months with changes.

Tuesday, October 09, 2007

Design a Storage Solution for a Creative Media Firm

One thing I have enjoyed the most over the last four years of writing this storage blog is the interaction with professionals from diverse backgrounds discussing a wide variety of issues. From time to time, I have been approached by readers looking for new opportunities or to fill open positions in their organizations. No, I am not setting the stage for launching a classified service on my blog.

The reader communications that attract me most are the ones where I learn something new, can extend help to others, or can introduce two people to each other. And such contacts didn't stop despite my absence for the past few months.

A product development executive at a creative media firm contacted me a month ago requesting advice on a storage solution for his firm. Unfortunately, I wasn't able to communicate regularly after the initial contact. In our last conversation, he agreed to let me post his initial email (with identifying information redacted) on my blog.

Can you advise on a suitable storage solution for him? What questions will you ask to get additional information? What suggestions will you make?
Dear Anil,

I was reading through storage solutions and came across your very informative blog. I am not sure if you can help me on this; we are looking to deploy a SAN solution in our company and since my knowledge on this topic is only building up now I am a bit shaky about going down this path.

Just a quick overview – we are a creative media company with about 40 users and growing fast, our work involves 3D visualization, video production and interactive media development targeting architecture and pharma companies. We are continually fighting a battle with our storage requirements, we have generally been adding a NAS box as we needed more and using the older NAS boxes for admin usage etc. Our main hiccup is storage and to a lesser degree speed. We don’t have any serious application servers, just workstations and render servers in our production pipeline. We are looking at the NS20 Celerra from EMC, but am a bit unsure of this path and its applicability in our environment.

Thanks

JD

Monday, October 01, 2007

Thanks for sticking around!

It has been quite a while since I last blogged. This summer had highs and lows in store for me at a personal level. The summer started out great with a family get-together and celebration but ended with a death in the family.

Blogging and storage weren't something I thought about much in the last two months. As someone said, life goes on; it is time for me to move on from the personal tragedy. Thank you, everyone, for inquiring about my absence from blogging and my whereabouts. I hope to slowly ease back into blogging again.

If you are reading this post, thanks for sticking around!

Friday, August 17, 2007

Just Say "Thanks for Feedback"

For almost a month, blogging and blog reading took a back seat to hosting friends and family from four different countries, playing golf and Wii (addictive!), rigors of the day job and enjoying the summertime.

I most probably would still be on blogging hiatus for a few more weeks if not for the battle of words among storage bloggers Robin Harris, Barry Burke, and others. I guess I got embroiled by making an observational comment on Robin's blog post, resulting in a pen-lashing from Storagezilla.

Overall, I am very surprised by the emotional responses and the offense both bloggers and commenters are taking. Be nice, boys! There is some great feedback in these blog posts and comments for everyone to take home. Whether you like it or not, just saying thanks for the feedback will go a long way.

After storage blogging for almost four years, I would be saying thanks for the feedback if I were receiving such thoughts about my blog from someone. Maybe I am not cut out for EMC DNA mutation. BTW, thanks, Chuck, for acknowledging the storage trenchtrolls.

This war of words also reminded me of an incident I was involved in about two years ago. At that time, Hu was a class act in responding to my criticism of his blogging. In retrospect, I was out of line. Despite this incident, I was invited to and hosted at the HDS Executive Briefing Center by Jeremiah, previously the HDS community evangelist. For this very reason, I admire and have great respect for both Hu and Jeremiah.

My offer to storage bloggers, readers, industry insiders, and outsiders remains the same. If you ever visit the Seattle area, get in touch. I would love to pick your brain over a beer/coffee/lunch/dinner. I will do the same when I am in your town; just send me your contact info.

Of course, any feedback through comments, email and phone is always welcome too.

Tuesday, July 24, 2007

Tales of Storage IPOs: What happened to Quality?

A comparison of upcoming IPOs, Netezza, BladeLogic, Voltaire, and Compellent, with recent storage IPOs. If VMware were to be shown on this chart, it would be outside the upper-right quadrant.

Monday, July 16, 2007

Power Consumption by Google Services

Even though Google doesn't share a lot of details of its infrastructure, as we have seen from the limited published information, it is obsessed with continuously monitoring, managing, and improving the efficiency of that infrastructure.

Recently, Robin Harris attended the Google conference on scalability and then mused How Yahoo can beat Google. A few months ago, Google published the results of its work on disk drive failures in the paper Failure Trends in a Large Disk Drive Population [PDF]. It was extensively covered in the blogosphere, including by me in the blog entries SMART not so smart in predicting disk drive failure and Google Findings of Disk Failures Rates and Implications, and by Robin Harris in his blog entry Google’s Disk Failure Experience.

Google has done it again and presented the results of its work on power consumption and provisioning in the paper Power Provisioning for a Warehouse-sized Computer [PDF] at the ACM International Symposium on Computer Architecture, San Diego, CA, June 9-13, 2007. In this work, Google researchers Xiaobo Fan, Wolf-Dietrich Weber, and Luiz Andre Barroso looked into 15,000 servers running three different applications (Websearch, Webmail, and Mapreduce) for six months to determine the power usage characteristics at the rack, PDU, and datacenter levels.

Google Services

Websearch: A service with high request throughput and large data processing for each request.

Webmail: A disk I/O intensive service. Machines are configured with a large number of disk drives. Each request involves a relatively small number of servers.

Mapreduce: A cluster dedicated to running large offline batch jobs. Jobs involve processing terabytes of data using thousands of machines.

Key Findings

The key findings from this work are:
  1. The difference between the maximum power used by a large number of computing devices, cumulatively, and their theoretical peak usage can be as much as 40% in datacenters.

  2. It may be more efficient to leverage power management techniques at the datacenter level than at the rack level.

  3. Nameplate ratings are of little use in power provisioning, as they significantly overestimate actual maximum usage.

  4. CPU utilization as a measure of machine-level activity produces accurate results for dynamic power usage, especially with large groups of machines. The dynamic power range is less than 30% for disks and negligible for motherboards.

  5. Using the maximum power draw of individual machines to provision the datacenter leaves some capacity stranded (see the back-of-envelope sketch after this list).

  6. A mix of diverse workloads reduces the difference between average and peak power, an argument in favor of mixed deployment.

  7. Idle power is significantly lower than actual peak power, but generally never below 50% of it.

  8. CPU dynamic voltage/frequency scaling may yield moderate energy savings (up to 23%) at the datacenter level.

  9. Peak power consumption at the datacenter level could be reduced by 30%, and energy usage could be halved, if systems were designed so that lower activity levels meant correspondingly lower power usage profiles.
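
As a back-of-envelope illustration of findings 3 and 5, consider provisioning a facility by nameplate ratings versus measured peak draw; the wattages and facility size below are my own assumed figures, not numbers from the paper.

```python
# Illustrative only: how nameplate-based provisioning strands datacenter capacity.
# The wattages and facility size are assumed figures, not from the Google paper.
facility_kw     = 1000     # provisioned datacenter power budget
nameplate_w     = 500      # per-server nameplate rating
observed_peak_w = 300      # measured peak draw of a busy server

servers_by_nameplate = facility_kw * 1000 // nameplate_w      # 2,000 servers
servers_by_observed  = facility_kw * 1000 // observed_peak_w  # 3,333 servers

stranded = 1 - servers_by_nameplate / servers_by_observed
print(f"{servers_by_nameplate} vs {servers_by_observed} servers; "
      f"{stranded:.0%} of usable capacity stranded by nameplate provisioning")
```
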
I believe there is a value proposition in this study waiting to be discovered by Copan Systems, for applying their MAID technology at PDU and datacenter scales, or even along the lines of SUN Project Blackbox.

More details from this study later.

Sunday, July 15, 2007

Tales of Storage IPOs: Hardware or Software and Components

Recently, an interesting question about the preferred packaging of a storage product by startups came up during my conversation with a few private equity acquaintances. The opinions were somewhat biased and fell along the lines of their past professional backgrounds.

The opinion from the finance-focused guy was that most software products require less initial investment in infrastructure and working capital for distribution, service, and support. Software startups tend to become cash-flow positive and profitable a lot more quickly with early market successes. The opinion from the sales-focused guy was to have a product in physical form that a potential customer can visualize, feel, and touch. A physical product tends to raise fewer investigative queries about the inner workings and less of the inquisitive knob-turning that tends to happen with a software product.

I am of the opinion that, irrespective of the storage product packaging, the differentiation tends to come, most of the time, from the software running under the covers or as a standalone product. One lesson I learned from my last entrepreneurial storage adventure was that during early rounds of financing, the pressure for a "working" prototype is a lot greater with a storage software product than with a hardware product.

I decided to look further into the question of whether startups with a physical storage product are perceived better by looking at the value placed on them in the marketplace. Unfortunately, the earliest available public financial data for a startup is from when it files for IPO registration.

This is an attempt at finding patterns for startups from the public financial data of recent IPOs by Isilon, Data Domain, Riverbed, Mellanox, Commvault, and Double-Take. A visual representation of revenue, income/loss, market capitalization, and change in market cap after 12 days of trading for these six recent IPOs is shown below.


Be mindful of the limitations and assumptions made to simplify the analysis: few data points, the number of outstanding shares assumed unchanged since the IPO, market cap used as an indicator of company value, price change equated with change in market cap over a short period of trading after the IPO, and revenue and income adjusted to make them consistent across all IPOs.
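
For what it's worth, the mechanics behind the chart are simple; here is a sketch of the calculation under the stated fixed-share-count assumption (the figures below are placeholders, not the actual IPO data).

```python
# Sketch of the market-cap math behind the chart, under the simplifying
# assumption that shares outstanding stay fixed after the IPO.
# The figures below are placeholders, not actual IPO data.
shares_outstanding = 60_000_000
ipo_price          = 13.50        # offer price per share
price_day_12       = 18.20        # closing price after 12 trading days

cap_at_ipo = shares_outstanding * ipo_price
cap_day_12 = shares_outstanding * price_day_12
change_pct = (cap_day_12 - cap_at_ipo) / cap_at_ipo

print(f"market cap at IPO: ${cap_at_ipo / 1e6:.0f}M, "
      f"after 12 days: ${cap_day_12 / 1e6:.0f}M ({change_pct:+.0%})")
```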

Some of my observations from this analysis are:

  • Storage software and component vendors were valued less at IPO than storage hardware vendors, despite being profitable and having revenues at par or higher.
  • During early trading, public markets didn't reward software and component vendors for better financial strength either. The gains in market cap during the first 12 days of trading were significantly higher for hardware IPOs.
  • Being profitable doesn't necessarily get rewarded. The extent of profit/loss had little bearing on the performance of storage IPOs.
  • Maturity of the business results in lower rewards. Double-Take was seriously undervalued and under-appreciated in all aspects.

Last Friday, Gary also wrote about the price performance of storage IPOs during the past six months in his post IPO Class of 2006. He mentioned that Riverbed and Double-Take had the highest percentage price gains, while Isilon is swooning and Commvault is in the doldrums. Maybe the efficiency of the marketplace is finally being realized.

As I was using IPO data as a proxy for startup value, I decided not to extend the time period of the analysis (I may do it anyway, now). The primary reason was to simplify the analysis and keep the data mining and normalization workload manageable: several vendors had secondary offerings, the number of outstanding shares may have changed, and there were temporary changes in financial data resulting from the IPO event, its proceeds, and lock-up expirations.

Share your thoughts and opinions on the success/failure of startups and storage IPOs.

Monday, July 09, 2007

3PAR: System Design. Part I

One of the core features of 3PAR, in my opinion, is a system design that facilitates scalability, availability, performance, and ease of use. As quoted in my previous blog entry, Craig Nunes mentioned two key attributes of their system design: scalability and a clustered, modular architecture.
… our customers have taken great advantage of the scalability of the array. The clustered, modular architecture eliminates the price premiums of the monolithic arrays and scaling complexity of modular array architectures.
Another reader mentioned two key benefits of the 3PAR system design, performance and ease of use, similar to what I heard from several users and evaluators of 3PAR at SNW. Data is spread across every spindle, up to 2,560 drives, and the workload is spread across all active/active controller nodes. LUNs are built from the available raw disks with little need for pre-planning, pre-configuring disk/parity groups, or deciding which disk spindles to use.
The 3PAR Utility Storage product brief also gives some insight into the system. I know the brief is marketing spin, but I am working on getting more technical details from 3PAR.
Central to the design is a high bandwidth, low-latency backplane that unifies …, … modular and upgradeable components into a highly available, … automatically load-balanced cluster.

A full-mesh, … passive backplane provides a dedicated … data path between each and every 3PAR ASIC, one of which resides in every 3PAR Controller Node.
Physical disks … are divided into uniform 256-MB chunklets. Chunklets from across the system are then automatically selected and grouped to meet user-defined levels of performance, cost and availability (varying such parameters as RAID type, drive type, radial placement and stripe width).

Customers can non-disruptively alter a volume’s underlying RAID protection level, drive type, stripe width and/or radial placement.
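
To visualize the chunklet idea, here is a toy sketch of wide striping only; it is my own illustration, not 3PAR's actual allocator, and it ignores RAID level, radial placement, and failure handling.

```python
# Toy sketch of chunklet-style wide striping: carve every disk into fixed-size
# chunklets and build a volume by drawing chunklets round-robin from all disks,
# so data and workload spread across every spindle. Not 3PAR's real allocator.
from itertools import cycle

CHUNKLET_MB = 256
disks = {f"disk{i}": [] for i in range(8)}        # assumed 8-disk system

def allocate_volume(name, size_gb):
    """Return (disk, chunklet-id) pairs backing the volume, spread evenly."""
    needed = (size_gb * 1024) // CHUNKLET_MB
    placement = []
    for n, disk in zip(range(needed), cycle(disks)):
        chunklet = f"{name}-c{n}"
        disks[disk].append(chunklet)
        placement.append((disk, chunklet))
    return placement

vol = allocate_volume("lun0", size_gb=4)          # 4 GB -> 16 chunklets
print(len(vol), "chunklets spread over", len({d for d, _ in vol}), "disks")
```
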
In my next post, I will dig deeper into the 3PAR system design, hopefully with some help from 3PAR and their customers so that I don't need to make things up.

Sunday, July 01, 2007

3PAR: The Follow-up

Sometimes a blog post touches a nerve, and that was the case with my last blog post, 3PAR: Reversal of Fortune. I heard from numerous readers, including people from BlueArc and 3PAR. Craig Nunes, VP of Marketing at 3PAR, sent me a detailed email highlighting the reasons why customers are choosing 3PAR.
The success of our products, as you mentioned hearing from several customers, has a lot to do with ease of use. Those who have chosen 3PAR have been able to provision capacity in seconds without preplanning. They have been able to eliminate performance tuning activities because all volumes have access to all resources within this massively parallel system. They have also been able to move data from one storage tier to another or redistribute existing volumes across new resources as they are added to the system in a single command, completely non-disruptively.

That being said, ease of use is only part of it -- efficiency is a also a big part of why customers have deployed 3PAR in their data centers. As you mentioned, Thin Provisioning is getting a great deal of attention and is now moving into large-scale deployments within very conservative companies. But efficient local and remote copy capabilities -- both built on thin copy technology -- are another major reason customers have chosen 3PAR. Organizations have taken substantial advantage of these products, deploying hourly (writeable) snapshots for granular and rapid recovery, and cost-effective tier one remote data replication solutions without the need for vendor professional services.

Finally, our customers have taken great advantage of the scalability of the array. The clustered, modular architecture eliminates the price premiums of the monolithic arrays and scaling complexity of modular array architectures. Our customers start small, with as few as two controllers, yet grow non-disruptively and massively within a single, fully-tiered system. Organizations, especially those with high growth requirements, have chosen to grown within their 3PAR array instead of facing the complexity of managing 5 or 10 or more arrays from other vendors.
Another reader put things more succinctly.
Thin provisioning is great, and is definitely a killer feature, but it's not THE killer feature. I'd say it's one of four things that 3PAR does really well. The combination of those four things in one box, is what differentiates 3PAR from other vendors in the market.
The key claims, Craig made were:
  • Ease of use - Provision capacity without preplanning. Eliminate performance tuning.

  • Efficiency - Thin provisioning. Efficient local and remote copy capabilities.

  • Scalability of array - Clustered modular architecture. Grow non-disruptively and massively within a single fully-tiered system.
I don't have any first-hand experience with the 3PAR product. Do you have any experience with it? What is your opinion of 3PAR's product and technology? This month, I would like to look further into 3PAR technology. Any documents, comments, emails, and phone calls that help me understand 3PAR better are most welcome.

Monday, June 25, 2007

3PAR: Reversal of Fortune

Last weekend, while browsing my archive on the mobile drive, I came across a text file with an interesting quote. I don't know who said this or where it was published, but the timestamp on the text file shows August 2002.
3PAR won't make it as far as BlueArc did, but will mostly likely fail for many of the same reasons. A box is a box is a box...."What's the price per megabyte?"....This is unfortunate, but what customers really want and need is faster, better, cheaper storage WITH integration with all the other pieces of the SAN and applications. Building a bigger, faster box takes a while. Certifying and integrating useful applications, host support, switch support and the ENDLESS list of combinations with this and HBAs takes a LOT longer. Not to mention costly and engineering-intensive real-world performance testing to prove it really is better.

As with many things, if it's not faster, cheaper and better, there isn't much motivation for large customers to take the risk; especially in this climate.
Write a comment or send me an email if you know the origin of the above quote. Update: An anonymous comment pointed to a B&S Message Board as the source of the above quote.

Five years later, I don't know about BlueArc but 3PAR seems to be on its way to becoming a successful established subsystem vendor.

At SNW, I heard praise for 3PAR from several customers who were using 3PAR subsystems and also from prospects who were in the evaluation phase. What was surprising was that not one of them mentioned the much-hyped thin provisioning as the primary reason for selecting 3PAR. All pointed to the 3PAR volume manager and the striping of data across available disk resources as the primary reason, with comments like "HP EVA-like capabilities in 3PAR go far beyond EVA."

Both 3PAR marketing at SNW and CEO David Scott in Byte & Switch seem to be highlighting thin provisioning as the main reason for their success in a highly competitive subsystem market with very conservative large enterprise customers. Is it really so?

What are your reasons for selecting or not considering 3PAR subsystem?

Wednesday, June 20, 2007

Gear6 trailblazing Network Caching

Earlier this week, I had a great conversation with Gary Orenstein and Jack O'Brien at Gear6. Here are excerpts from our conversation.

How is Gear6 doing?

Gear6 seems to be doing well. Several units are currently in the field being evaluated by various customers. No specific number of units was provided, just a wide range between 10 and 100. The company has over thirty employees and is financially all set for the near term. It has started to focus CACHEfx on the financial analytics, energy, and animation segments and will expand that focus by the end of the year.

What are the benefits of network based caching?

Network caching enables increased cache utilization, flexibility and scalability. Caching is moving from end devices to network and becoming a network resource.

What one factor is attracting customers to your caching solution?

By the nature of caching, the obvious benefit to the customer is performance. Most customers who come to Gear6 have performance problems and variable workloads, and they demand a certain quality of service. The success rate with customer evaluations is very good, as the CACHEfx appliance doesn't require a forklift replacement.

How is Gear6 doing caching?

The CACHEfx appliance doesn't use any conventional mechanical disk storage internally; it is a 100% RAM cache and a pass-through to persistent storage. It is a robust, single-purpose appliance designed to do one job and do that job very well.

The caching is performed intelligently. The intelligence focuses on how and where data is placed within the appliance. There are extensive built-in statistics; most customers are impressed by the network-sniffer-like capability.

In the past, cache was a constrained resource. Now, the focus is on right-sizing the cache. CACHEfx expands from a quarter TB to multiple TB, can be preloaded with data from persistent storage, and adjusts to variable I/O profiles.

What are the reliability, availability and scalability features of CACHEfx appliance?

It is a clustered appliance, scalable from a quarter TB to multiple TB, and it can be expanded on the fly. Also, the appliance acknowledges writes only when the persistent storage sends an acknowledgment.
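
That last point describes classic write-through behavior; here is a minimal sketch of the idea, my own illustration rather than anything Gear6 shared.

```python
# Minimal write-through cache sketch: reads are served from RAM when possible,
# and a write is acknowledged only after the persistent backend confirms it.
# My own illustration of the behavior described above, not Gear6 code.
class WriteThroughCache:
    def __init__(self, backend):
        self.backend = backend        # stands in for persistent NFS storage
        self.cache = {}               # stands in for the RAM cache pool

    def read(self, key):
        if key in self.cache:         # cache hit: no trip to the backend
            return self.cache[key]
        value = self.backend[key]     # cache miss: fetch and populate
        self.cache[key] = value
        return value

    def write(self, key, value):
        self.backend[key] = value     # persist first (backend "acknowledges")
        self.cache[key] = value       # then update the cache
        return True                   # ack the client only after both steps

store = {}
cache = WriteThroughCache(store)
cache.write("/vol/data/frame001", b"rendered frame")
assert cache.read("/vol/data/frame001") == store["/vol/data/frame001"]
```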

Is the CACHEfx installed at D.E. Shaw working with Solaris cluster?

Gary declined to comment on the infrastructure details of the customer. He claimed the customer is pleased with the solution.

Any plans to introduce network caching for block-level traffic? The present product seems to focus on NFS only.

The present focus is on NFS; the market is large enough. The sweet spot is a customer with 100+ concurrent clients accessing a single dataset, and most of those tend to be NFS. There are no firm plans for addressing CIFS or block-level traffic. The primary industry focus is on financial analytics, energy and exploration, electronic design, animation, biotechnology, and media, primarily HPC-oriented tasks.

How does network caching stack up with parallel file systems and clustered storage?

Caching addresses I/O-constrained systems rather than processing-constrained ones. Parallel file systems and clustered storage solutions are capacity-centric, not performance-centric, providing a global namespace for ever-expanding storage capacity. They are not low-latency solutions. Network caching is a complementary solution: capacity complemented by performance. The Gear6 solution complements NetApp ONTAP GX, IBRIX, Isilon, and Acopia.

Do you have any thoughts on potential application of CACHEfx in a Wide Area Filer Network environment?

CACHEfx has enormous potential in a variety of environments. But we are currently very focused on solving customer problems within the data center. We are open to partnerships in other areas.

Sunday, June 17, 2007

Bountiful Bandwidth Lagging Latency

Recently, I came across an interesting article published in 2004 comparing the growth of bandwidth and latency, the reasons for the imbalance between them, and ways of coping with it. The excerpts below are from Latency Lags Bandwidth: Recognizing the chronic imbalance between bandwidth and latency, and how to cope with it, by David A. Patterson, Communications of the ACM, October 2004, Vol. 47, No. 10.
In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4.
Reasons for Bountiful Bandwidth
“There is an old network saying: Bandwidth problems can be cured with money. Latency problems are harder because the speed of light is fixed – you can’t bribe God” – Anonymous.

Moore’s Law helps bandwidth more than latency.
Distance limits latency.
Bandwidth is generally easier to sell.
Latency helps bandwidth.
Bandwidth hurts latency.
Operating system overhead hurts latency.
Coping with Lagging Latency
Caching: Leveraging capacity to help latency.
Replication: Leveraging capacity to again help latency.
Prediction: Leveraging bandwidth to again help latency.
Marketing Latency Innovations
The difficulty of marketing latency innovations is one of the reasons latency has received less attention thus far.

Perhaps, we can draw inspiration from the more mature automotive industry, which advertises time to accelerate from 0-to-60 miles per hour in addition to peak horsepower and top speed.
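
To see how quickly the imbalance compounds, here is a small calculation using the 1.2-1.4x figure from the article; the number of doublings is an arbitrary horizon I picked for illustration.

```python
# How the gap compounds: each time bandwidth doubles, latency improves by only
# about 1.2-1.4x (per the Patterson article). The horizon below is arbitrary.
doublings = 6
bandwidth_gain = 2 ** doublings        # 64x
latency_gain_low = 1.2 ** doublings    # ~3x
latency_gain_high = 1.4 ** doublings   # ~7.5x

print(f"after {doublings} bandwidth doublings: bandwidth {bandwidth_gain}x better, "
      f"latency only {latency_gain_low:.1f}x to {latency_gain_high:.1f}x better")
```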

Tuesday, June 12, 2007

Where do you focus, Bandwidth or Latency?

Since my first post about Gear6, Gary Orenstein and I have been exchanging emails discussing various aspects of storage caching and Gear6. Recently, he commented in response to my request for pointers on storage caching market and implementations:
When I find interesting items related to caching I usually post on our blog. The thing is, there really hasn't been anyone promoting network-based caching until Gear6.
With rising interest in flash memory and SSDs, I am finding storage caching quite intriguing. I decided to start from basics.

What problems does caching solve?

The major benefit of caching is in reducing latency, whether the caching is part of the web, network, file system, storage device, processor, or memory. What is latency? Any delay in responding to a request.

Bandwidth Bias

One consistent theme that struck me as odd as I started studying caching is how often we suggest more bandwidth as a solution to slow performance and how little focus we give to the latency side of the problem. What is bandwidth? The amount of data carried from one point to another in a given time.

Even in iSCSI world, we all hear how 10GbE will be the inflection point, indirectly giving the impression that bandwidth is the bottleneck in iSCSI adoption. What is the real bottleneck in iSCSI? Is it bandwidth or latency?

I guess it sounds more impressive to say, "With 10GbE, the bandwidth will increase 10x, so you will be able to push ten times the data, but latency will only be reduced by (approximately) half."

From the productivity standpoint of users and applications, a predictable and quick response to a request seems considerably more important than the amount of data being transferred over a specified period. What good does more bandwidth do if data needs to wait for processing? A balance between bandwidth and latency needs to be considered when designing solutions.
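
A back-of-envelope example of why more bandwidth alone may not help: the time to complete one I/O is roughly the round-trip latency plus the transfer time. The link speeds below are nominal; the round-trip latencies are assumed values purely for illustration.

```python
# Time for one I/O = round-trip latency + transfer time (decimal units for simplicity).
# The round-trip latencies below are assumed, illustrative values.
def io_time_us(size_kb, link_gbps, rtt_us):
    transfer_us = (size_kb * 8) / link_gbps   # kilobits over Gbps gives microseconds
    return rtt_us + transfer_us

for label, gbps, rtt in [("1GbE", 1, 100.0), ("10GbE", 10, 50.0)]:
    print(label,
          f"4KB I/O: {io_time_us(4, gbps, rtt):.0f} us,",
          f"1MB I/O: {io_time_us(1024, gbps, rtt):.0f} us")
# Small I/Os barely improve: the round trip, not the pipe, dominates their time.
```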

In the end, my impression is that most of us tend to focus too much on bandwidth and too little on latency.

Wednesday, June 06, 2007

Wikibon, The Improvements Needed

As I mentioned previously, the Wikibon project is very interesting, and schedule permitting, I plan to monitor its progress. I see the value of collective intelligence in bringing down the barriers in the market research and industry analysis segment. If the approach succeeds, it will revolutionize this industry the way Wikipedia did the encyclopedia business.

The intent of this post is not to dismiss the initiative as more hype of the social networking era. All new experiments go through a phase of trial and error before finding their footing and niche. I feel Wikibon is currently in that early phase, where Dave and his team are trying various things to see what sticks, what doesn't, and what will help them realize their vision.

The objective of this post is to help them during this early phase by making two very specific suggestions for improvements.

Lead with Content

Overall, Wikibon started with a good web presence. My only design suggestion would be to lead the presence with content and a cleaner interface; otherwise, it just takes away from the community and participatory feel of the initiative. Some annoyances:
  • Too many choices and too much information crammed into the home page.

  • Unnecessary and excessive use of text boxes, fonts with different colors and sizes, and slide-style boxes and graphics.
As Chris Evans commented, and I agree, to gain any kind of mindshare, Wikibon needs to highlight the content, not the people.

Do you really need Wiki format?

It is great to see 340 articles on a variety of topics already posted on Wikibon. Most articles seem to be "independent" in nature, written by individual authors and containing only their opinions, with very little scope for others to modify the content. I found the content to be a better fit for a blog format than a wiki.

This is a typical challenge in most wiki projects. Are the topic and content conducive to modification by others? If the content invites comments from readers rather than modification, then it is a better fit for a blog format. I am sure you will also be able to tell which content is likely to be modified and extended with new information, and which content is likely to just receive comments.

Compare the Wikipedia Backup page with the following Wikibon pages for backup articles. It doesn't take long to identify which pages are more likely to be modified, or have content added, by someone other than the original author/creator.
Implementing fail proof backup and recovery
Backup and recovery options
Backup and recovery techniques
Sizing up backup and recovery options
Data de-duplication and the low-end backup/restore choice
Check out the storage market prediction trading feature at Wikibon. It is an excellent feature that has the potential to leverage the power and knowledge of the community.

Time permitting, I may review Wikibon further.

Tuesday, June 05, 2007

Wikibon, An experiment in Collective Intelligence

A few weeks ago, David Vellante contacted me about his new project, Wikibon, and invited me to attend its Peer Incite research meetings. Wikibon is a project where he is trying to harvest and share the collective intelligence of the IT community for market research, industry analysis, and insights. Having previously founded the storage research group at IDC, Dave unsurprisingly picked enterprise storage as the first industry segment to target with the Wikibon project.

What piqued my interest?

Considering that the industry analyst world is a walled garden, with entry allowed only to the chosen few who can pay a hefty entrance fee, Wikibon is an interesting experiment. Any cracks in the garden walls are a welcome change for an average Joe like me. But my interest in Wikibon extends beyond just an open-source experiment in IT market research. I am more excited about the harvesting and sharing of collective intelligence in this 'public' experiment.

How beneficial will it be for an organization to make decisions based on this collective intelligence instead of listening to a chosen few with the loudest voices or political connections? Unfortunately, considering the competitive advantage such an approach offers, few organizations that have experimented with collective intelligence internally are willing to share and discuss their methods and findings publicly. I believe that blogs and wikis are not just external-facing marketing communication tools for enterprises. They also make excellent methods for harvesting the collective intelligence of everyone within an organization, especially in knowledge-intensive industries.

Unfortunately, Dave couldn't make me get up early enough in the morning to attend a meeting at 9:00am ET (6:00am on my coast), which later turned out to be just a typo and time zone confusion. Finally, this morning I attended the Peer Incite research meeting on the data de-duplication topic. Even though this topic doesn't excite me anymore [more in a later post; I have moved on to other new and exciting topics], the affiliations of the vocal participants and the dynamics among them were interesting to observe.

So, what is my impression and feedback on Wikibon project, community web presence and Peer Incite meeting?

As mentioned before, the Wikibon project definitely has piqued my interest, whether or not my reasons align with Dave's vision. I am planning to monitor its progress, share my opinions, and participate and report as time permits.

[Too late in the night] I will try to continue my feedback on this project in another post.

Sunday, June 03, 2007

Blogging Hiatus

For the last couple of weeks, I was absent from blogging due to back-to-back trips to Anchorage and Princeton. Unlike Storagezilla going off the grid, my blogging hiatus was unintentional, driven by the demands of the day job and personal life. The highlights of the trips were experiencing the scenic beauty of Alaska for the first time, the opportunity to play 18 holes at Bunker Hill Golf Course, visiting Princeton University, and talking to a couple of very smart people.

Note to the readers: Blog posts will be sporadic from June through August due to demands of few other personal initiatives.


Saturday, May 19, 2007

SaaS Panel Discussion Recap

As mentioned before, last Wednesday I moderated a panel discussion on Software-as-a-Service (SaaS) for IIT-PNW at the Google Kirkland campus, an amazing experience. Our panel guests represented a broad spectrum of the SaaS ecosystem, and the audience liberally peppered them with questions on a wide variety of topics.

Interestingly, the only consensus between panelists and audience was that SaaS will grow further and will have a significant impact on various business and consumer activities.
  • Defining SaaS. Web 2.0 vs. SaaS. Consumer vs. business focus. SaaS meant different things to different panelists and audience members.

  • Start small. Target small. Improve quickly and frequently. Generate demand quickly. Scale as you grow. Enable experimentation.

  • SaaS strengthens and grows further as web access becomes ubiquitous and available on various devices, specifically growth with internet access through mobile devices.

  • Migration from Software-as-a-Product (SaaP) to SaaS. Benefits of frequent feature enhancements and quick customer feedback. Concerns about accessing data and services offline. Migration from pure web complemented by desktop client option.

  • Most SaaS growth in application area and little in infrastructure area. But greater and quicker adoption in infrastructure area.

  • Main benefit of application SaaS in collaborative namespace. Main benefit of infrastructure SaaS in someone else responsible for muck.

  • Tools and platforms for SaaS development. Doubts about reaching a stage where operating system as SaaS in near future. Concerns on how the evolution in API access will impact the existing integrations in place.

  • Concerns about Security, scalability and vendor lock-in. High switching cost with infrastructure SaaS.

  • Business model - subscription vs. ad supported. Differentiation through service, experience and collaboration.
One of the panelists recounted how he sold his house, from retouching images and creating the flyer to the final sale, using only online tools. This reminded me of a recent blog post by Phil Wainewright about how Appirio runs its business completely on an on-demand integrated application infrastructure.

Panelists believed that SaaS margins will not be as high as SaaP margins and may even decline, contrary to McKinsey's expectation of profitability improving as the market grows.

Overall a great panel discussion event, thanks to great panelists and engaging audience.

Amazon Simple Storage Service (S3) is considered a trailblazer and is in an enviable position in the infrastructure SaaS market. The EMC/Unisys initiative shows product vendors caught off guard by S3's growth.

How will SaaP vendors adapt to SaaS?

Monday, May 14, 2007

Cost vs. Benefit for Caching Appliances

Gary Orenstein at Gear6 sent me preview material with his press release for CACHEfx appliance launch.

I guess writing nice things about vendors once in a while has some benefits. Maybe I will be crawling into the doghouse by the time Gary finishes reading this post. :-( But somebody has got to ask the hard questions.
Gear6 Unveils Industry’s First Terabyte-Scale Caching Appliances to Accelerate Data Intensive Applications

CACHEfx appliances support a baseline of 250,000 I/O Operations per Second (IOPS), 16 Gigabits per second of throughput, microsecond response time and scale linearly to handle millions of IOPS. The Reflex OS™ virtualizes appliance memory into a scalable coherent cache pool, optimizes data delivery through parallel I/O channels, and provides robust intelligent cache services.

CACHEfx centralized storage caching solutions are available now starting at $400,000.
My jaw dropped after looking at the starting price tag of $400,000. My first reaction was "Damn, this thing is expensive!" The performance stats like 250K IOPS and 16Gbps of throughput are impressive, but let's be realistic: how many customers, beyond hedge funds and stock/options traders, can afford to pay $1.60 per IOPS to speed up applications? I am looking forward to an ROI/TCO justification in the near future.
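
The $1.60 figure is just the list price divided by the baseline IOPS; the per-GB number below additionally assumes that the $400,000 starting price buys a quarter-TB entry configuration, which is purely my assumption rather than anything in the press release.

```python
# Arithmetic behind the sticker shock. Price and IOPS are from the press release;
# pairing the starting price with a quarter-TB entry configuration is my assumption.
price    = 400_000
iops     = 250_000
entry_gb = 256            # "quarter TB" entry configuration (assumed pairing)

print(f"${price / iops:.2f} per IOPS, about ${price / entry_gb:,.0f} per GB of cache")
```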

How big is the market for caching appliances anyway? I hazard a guess that $1.60 per IOPS eliminates most web companies with intensive data access performance needs, as well as a large portion of the HPC market.

When and where can caching appliance threaten the parallel and clustered storage solutions?

Sunday, May 13, 2007

SaaS, Opportunity for Innovation

I will be moderating a panel discussion, SaaS, Opportunity for Innovation, at the Google Kirkland campus on Wednesday, May 16th. The panel discussion is part of the May monthly meeting of our alumni organization, IIT-PNW. I am looking forward to facilitating a great discussion among our panel guests and the audience. Our esteemed panel hails from a wide spectrum of the SaaS (Software-as-a-Service) ecosystem.
  • Russ Arun, General Manager, Windows Live Communications, Microsoft. His current role includes Mail, Messenger, Manageability, Storage and back-end services for Windows Live.
  • Charlie Bell, Vice President, Utility Computing, Amazon. His current role includes the Elastic Compute Cloud (EC2) and the Simple Storage Service (S3).
  • Kevin Marcus, Chief Technology Officer, Intelius. His current role includes technology ecosystem of a SaaS offering.
  • Peeyush Ranjan, Engineering Manager, Google. His current role includes web search related projects with prime area of interest in building large scalable systems.
Would you like to attend the event? Please send an RSVP to IIT-PNW President, Mohan Venkataramana at mohan_13 [at] msvi [dot] org. Please mention this blog post in your message. The admission to this event may be limited due to the capacity restrictions of facility and IIT-PNW's primary responsibility to IIT graduates.

Would you like to be the "mystery" panelist? If you are opinionated, hail from the SaaS ecosystem, and don't belong to an organization already represented on the panel, send me an email with your contact info, interest, background, and introduction.

It is going to be a very interactive event: slide-free, prop-free, whiteboard maybe. All panel guests will have 5 minutes to introduce SaaS (what, why, where, who, how) from their viewpoints, followed by 40-60 minutes of audience Q&A.

Do you have a question on the SaaS topic for our panel guests? Send me an email. Please indicate if you would like your name and affiliation to be withheld. All responses will be posted on this blog.

Our last panel discussion, Mobile Advertising - Technical Challenges and Business Opportunities, moderated and blogged by Chetan Sharma, had an overwhelming response and a very engaged audience. Unfortunately, being at Storage Networking World (SNW) in San Diego, I missed this great evening.

Look for a recap of the event sometime later in the week.

Wednesday, May 09, 2007

SVW: Storewiz. What do I like? Compression.

There are three key factors (the compression process, the unpredictability of other data reduction techniques, and allaying the fear of buying from a startup) that make Storewiz attractive. It doesn't appear that Storewiz is highlighting these factors publicly; I don't know why. My cynical side cautions you to take these virtues with a grain of salt. Of course, there is no doubt about the ultimate benefit of the Storewiz appliance: the data footprint reduction on a storage device.

As you may realize (see the previous posts Storage Vendors to Watch: Storewiz. I and SVW: Storewiz. Q&A. and the resulting comments), compression doesn't seem to get much love in the storage industry, with the primary concerns being CPU utilization and performance impact. How does Storewiz implement compression?

There is not enough information available from Storewiz on compression methodology and implementation. Most of the information below comes from Storewiz patent applications, specifically Method and System for Compression of Files for Storage and Operation on Compressed Files [US 2006/0184505 A1].
ABSTRACT. A method and system for creating, reading and writing compressed files for use with a file access storage. The compressed data of a raw file are packed into a plurality of compressed units and stored as compressed files. One or more corresponding compressed units may be read and/or updated with no need for restoring the entire file whilst maintaining de-fragmented structure of the compressed file.
The segments of an original file are compressed sequentially, segment by segment, into a series of compression logical units (CLUs). The metadata for each compressed section and its corresponding CLUs is stored in a separate table.

Reading data stored in a compressed file requires identifying the relevant compressed segment and then the CLUs belonging to that segment. The applicable CLUs are then restored until all the data that needs to be read has been restored.

Updating data stored in a compressed file follows a similar process to reading, but it involves a little more complexity, as the number of CLUs that need to be written after the update can differ from the original number of CLUs restored.
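
Here is a toy sketch of the segment/CLU bookkeeping as I read the patent language, greatly simplified: zlib stands in for whatever codec Storewiz actually uses, and fragmentation, alignment, and concurrency are ignored.

```python
# Toy model of segment-based compression with a CLU table, per my reading of the
# patent language above. Greatly simplified; zlib stands in for the real codec.
import zlib

SEGMENT = 64 * 1024                       # uncompressed segment size (assumed)

def compress_file(raw):
    """Compress per segment; return (clu_table, clus)."""
    clu_table, clus = [], []
    for off in range(0, len(raw), SEGMENT):
        clu = zlib.compress(raw[off:off + SEGMENT])
        clu_table.append({"offset": off, "clu": len(clus), "size": len(clu)})
        clus.append(clu)
    return clu_table, clus

def read_range(clu_table, clus, offset, length):
    """Random access: restore only the CLUs covering [offset, offset + length)."""
    out = b""
    for entry in clu_table:
        seg_start = entry["offset"]
        if seg_start + SEGMENT <= offset or seg_start >= offset + length:
            continue                      # segment outside the requested range
        segment = zlib.decompress(clus[entry["clu"]])
        lo = max(0, offset - seg_start)
        hi = min(SEGMENT, offset + length - seg_start)
        out += segment[lo:hi]
    return out

table, clus = compress_file(b"abcd" * 100_000)            # ~400 KB of sample data
assert read_range(table, clus, 70_000, 8) == b"abcd" * 2  # no full-file restore
```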

Based on the patent document, the uniqueness in Storewiz compression implementation probably comes from:
  • Random access to data in compressed stored files
  • Operations on the compressed data without decompressing entire file
  • Compression/decompression operations transparent to users
  • User unawareness of the storage location of the compressed data
Compression in an appliance is the easiest, quickest, and most flexible approach from an initial product development and adoption perspective. Once this method matures for block devices and communication, I don't see anything preventing the functionality from being merged into storage and networking hardware as SBCs.

Monday, May 07, 2007

SVW: Storewiz. Q&A.

Continuing from Storage Vendors to Watch: Storewiz. I

First, the disclaimer for those with a fertile imagination: I don't speak for Storewiz. Most of the information was obtained from public sources along with discussions at SNW with executives and others.

Let's address questions from Storagezilla, before I continue with my thoughts on Storewiz. He objected to the phrase "with better predictability than data de-duplication product."
What struck me about that is just like de-dup the data you're working with will dictate what savings you'll get. Image files or movies? Damn all or close to damn all. 1:1. … Databases or text files? Hell, you could get 5:1 compression, perhaps even more.
The data type is only one factor that impacts savings from data de-duplication. The others are the "duplicity" of data within the dataset targeted for de-duplication, the "duplicity" of that dataset within the stored data, the targeted saving type, the internal dedupe design, and the implementation of the de-duplication solution in the end-user environment.

Real-life data de-duplication ratios vary a lot, from 2:1 to 100:1. Data type on its own is not enough to predict with certainty the achievable data reduction with de-duplication. Beyond data type, data duplicity and the variation in duplicity over time are the main reasons for the wide range in data reduction.

Why, when, and where does "predictability" of the data footprint reduction target matter more than the "highest achievable" data footprint reduction? Let's hear your thoughts first!
I've known about StoreWiz for a while now but I've always wondered where the FC/iSCSI compression boxes were?
Based on what I was told by Storewiz executives at SNW, the expected release is Q3/Q4 2007. Like most people with some experience dealing with storage vendors, I am, as usual, skeptical of "Q3/Q4" claims. Most vendors say Q3/Q4 when asked in Q1/Q2 for the time frame of the next or a new release. So, I wait too!
The sheer computational grunt required for such compression is an issue, …
I wrote in the last post: The strategy is similar to hardware compression in storage devices but with a twist that makes the Storewiz implementation very resource efficient. Further explanation will come when I discuss the three things that make Storewiz stand out in the data reduction market. As usual, my opinion only.

All opinions expressed on this blog are my own, whether sink or swim. Your dissents, corrections and attempts to influence my opinions are always welcome.

Thursday, May 03, 2007

Storage Vendors to Watch: Storewiz. I

At SNW, I enjoyed briefings from two vendors the most. The enthusiasm of IBRIX executives for their product was contagious. And the simplicity of the Storewiz product had me jumping at the first opportunity to meet with company executives, JF Van Kerckhove and Jon Ash.

The Storewiz product fits into two of the three themes I observed at SNW: global data reduction and special-purpose appliances. It is a single-purpose appliance that helps you reduce the data footprint on storage devices with better predictability than what you get from data de-duplication products. It provides real-time, on-the-fly compression that is transparent to the end user, is easily added in front of existing storage devices, and in some cases may even improve performance.

What! That was my reaction when I first heard the performance improvement claims. Over the years, we have all been programmed to believe that compression slows things down and takes too many CPU cycles, so my reaction was: no way, compression is an overhead, most probably slowing everything down. What is the first question that pops into your mind when you hear compression?

The compression/decompression activity is performed by CPUs in the Storewiz appliance, eliminating the need to run a compression process on the storage devices or hosts. The strategy is similar to hardware compression in storage devices but with a twist (explained later in another post) that makes the Storewiz implementation very resource efficient.

Storewiz execs also claimed that their appliance doesn't increase latency by more than 50 to 100 microseconds. As long as the latency added by the compression process doesn't exceed the net decrease in the time needed to write the "compressed" data to disk, you could potentially see performance improvements during write operations. The same holds for reading data from disk and decompressing it, which is further helped by a read cache. Storewiz's patented, resource-efficient compression implementation also reduces how much data needs to be compressed or decompressed in the first place.
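A rough back-of-the-envelope version of that break-even argument, with entirely made-up numbers (the I/O size, throughput, compression ratio and added latency below are illustrative assumptions, not Storewiz figures):

```python
# Break-even sketch for in-path compression; all numbers are hypothetical.
io_size = 64 * 1024          # bytes per write
disk_throughput = 80e6       # effective write bandwidth, bytes/second
compression_ratio = 2.0      # raw size / compressed size
added_latency = 100e-6       # appliance-added latency per I/O, seconds

time_uncompressed = io_size / disk_throughput
time_compressed = (io_size / compression_ratio) / disk_throughput + added_latency

print(f"uncompressed write: {time_uncompressed * 1e6:.0f} us")  # ~819 us
print(f"compressed write:   {time_compressed * 1e6:.0f} us")    # ~510 us

# Compression helps whenever the write time saved, io_size * (1 - 1/ratio) / throughput,
# exceeds the added latency: here roughly 410 us saved against 100 us added.
```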

Check out the image below showing NetApp read/write performance and CPU utilization with and without the Storewiz appliance, taken from one of their case studies.

To be continued …

Monday, April 30, 2007

Storage Vendors to Watch: Cleversafe

The second company with buzz at SNW, though not present at the show, wasn't a surprise, unlike Gear6, a storage vendor to watch in my opinion. As Clark wrote earlier, the Cleversafe approach of security through obscurity was being considered a shift from the traditional approach of encryption, where encryption keys are a single point of failure for data that needs to be stored for a reasonable length of time. As previously mentioned, Cleversafe is also one of my favorite new companies.

Paraphrasing below from Cleversafe patent application [11/241,555], Digital Data Storage System, the concept is simply to provide security through information dispersal and integrity through replication and hashing.
A distributed storage system for storing slices of original data on multiple storage devices in one or more locations. The individual data slices on each storage device are unrecognizable and unusable except when combined with data slices from other storage devices. The data slices are selected by information dispersal algorithms so that even if there is a failure of one or more storage devices, the original data can be reconstructed.

The "aha!" moment for me came when an end-user in a session at SNW asked a speaker for an opinion on the Cleversafe grid strategy. Earlier, the same end-user had mentioned to me that he had been evaluating solutions based on clustered and distributed file systems. It also reminded me of the last startup I was involved with, where we were trying to utilize unused storage on untrusted and unreliable nodes within an enterprise. Our vision was more along the lines of FarSite than Cleversafe. We often encountered two questions that are primarily addressed by the Cleversafe approach (see the sketch after the questions below):
  1. How will you make sure that data stored on untrusted nodes cannot be accessed directly by users at that node?

  2. How will you make sure that data stored on unreliable nodes is available even though one or more nodes may be offline?
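Here is a toy sketch of the dispersal idea, aimed only at the first question: each slice on its own is indistinguishable from random noise, and a hash verifies integrity on reconstruction. It is my own all-or-nothing XOR illustration, not Cleversafe's information dispersal algorithm; a real IDA or erasure-coding scheme would also answer the second question by letting any k of n slices reconstruct the data even when some nodes are offline.

```python
import hashlib
import os
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def disperse(data: bytes, n: int):
    """Split data into n slices; any n-1 slices alone are random noise (one-time-pad
    style), so a node holding a single slice learns nothing. Returns the slices and
    a SHA-256 digest used to check integrity on reconstruction."""
    random_slices = [os.urandom(len(data)) for _ in range(n - 1)]
    final_slice = reduce(xor, random_slices, data)
    return random_slices + [final_slice], hashlib.sha256(data).hexdigest()

def reconstruct(slices, digest: str) -> bytes:
    data = reduce(xor, slices)
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("slice missing or corrupted")
    return data

if __name__ == "__main__":
    slices, digest = disperse(b"secret payload", n=4)
    # In a dispersed-storage grid each slice would live on a different node;
    # availability despite offline nodes needs an erasure-coded scheme, not this toy.
    assert reconstruct(slices, digest) == b"secret payload"
```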
I agree with Clark on the clever strategy adopted by Cleversafe to open source the code and look for revenue from service and support. I just can't visualize Cleversafe as a stand-alone product in an enterprise; I see it more as a component of a larger grid-based storage service or solution:
  • Leverage the inherent data protection available with distributed storage. Why should data first be pushed to a central location in the name of physical consolidation and then pushed back out to be duplicated in the name of business continuity?

  • Leverage the performance scalability with simultaneous transfer from multiple nodes. Why should data be stored on one node and restricted by the bandwidth and performance available at one node when it can be striped across multiple nodes to enable simultaneous transfer?

  • Leverage the trend toward keeping data close to the user. Why should data be anchored in one place when the user is becoming more mobile?

Tuesday, April 24, 2007

Storage Vendors to Watch: Gear6

Last October, Gear6 came to my attention, but it didn't really hit home until a well-informed end-user talked to me about them at SNW. He was very excited about the Gear6 product for addressing NFS performance issues with transaction databases. Based on what I understand, the product is a caching appliance, conceptually very easy to understand.

Excerpts from Gear6 website:
… keeping frequently accessed data in a very large central memory pool … This enables high performance data access by avoiding time-consuming disk operations and accelerates applications due to dramatically decreased response times and increased data throughput.

This innovative approach complements existing NAS/NFS deployments and installs transparently in the data center without requiring changes to current applications or infrastructure.
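Conceptually, that is a read-through cache sitting in front of the filers. Here is a minimal, generic sketch of the idea (my own illustration with hypothetical names, nothing Gear6-specific):

```python
from collections import OrderedDict

class ReadThroughCache:
    """Serve hot blocks from a memory pool, fall back to the slow backing store
    (the NFS filer) on a miss, and evict the least recently used entries.
    Purely illustrative; not Gear6's implementation."""

    def __init__(self, backing_store, capacity_blocks: int):
        self.backing_store = backing_store   # callable: block_id -> bytes
        self.capacity = capacity_blocks
        self.cache = OrderedDict()

    def read(self, block_id) -> bytes:
        if block_id in self.cache:           # hit: served from memory
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = self.backing_store(block_id)  # miss: time-consuming disk/NFS access
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used block
        return data

if __name__ == "__main__":
    filer = lambda block_id: b"x" * 4096     # stand-in for an NFS read
    cache = ReadThroughCache(filer, capacity_blocks=1024)
    cache.read(42)   # first read goes to the filer
    cache.read(42)   # repeat reads are served from memory
```

The appeal is that the cache is transparent: the application still speaks NFS, and only repeat reads of hot blocks change their service time.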
Actually, it was quite amusing at the conference. I most probably steered a few end-users to Gear6 by suggesting they check out the G6 caching appliance. These end-users told me that they were using Oracle databases over NFS, with performance being one of their pain points. I found three simple questions that can quickly tell whether someone may want to investigate the G6 product.
  1. Are you using transaction databases?

  2. Do you use NFS mounts?

  3. Do you have performance issues?
The G6 product seems to be one of those products that require 10 minutes for the presentation, 20 minutes for answering follow-up questions, 30 minutes for a demo, and then the question: When do you want a unit for evaluation? Following is a clean version of a picture with G6 that we drew at SNW. Is it feasible?
I would categorize the Gear6 caching appliance as a product that does only one thing but does it very well. With its singular focus on NFS performance, Gear6 also made a good choice in attending Collaborate 07, an Oracle community event.

BTW, you may want to hop over to the Thoughtput blog maintained by Gary Orenstein at Gear6 for more caching-related information. Also check out these presentations from Gear6.

Share your thoughts on Gear6 and its caching appliance approach.

Monday, April 23, 2007

Storage Vendors to Watch: View from SNW

Background

I had the pleasure of conversing with numerous end-users, small to large, during the three days of the SNW conference. Typically, I found end-users to be more willing to discuss different topics and express opinions. I am not sure why vendors can rarely have an open and fun conversation; they tend to clam up.

Most end-users request anonymity, and rightly so, considering the prior troubles of one end-user I heard about at SNW. An unscrupulous storage journalist published an overheard conversation with the end-user's name, without permission, resulting in legal and HR issues for this person. Anyway, no names of end-users on my blog. I met some great people at SNW with whom I hope to stay in touch for a long time to come.

I received vendor briefings from IBRIX, Axeda, Storewiz, Njini, Asempra, STORServer, VMware and Falconstor. Thank you to the executives from these companies for coming to talk to me. I hope to provide digital ink in some form, favorable or unfavorable, to them in the near future.

I also had interesting conversations for a couple of hours with guys involved in M&A (mergers & acquisitions) who were scouting storage companies at SNW. My experience raising private equity for my last startup made discussing the prospects of various storage companies a fun exercise.

Most information in Storage Vendors to Watch series comes from my conversations at SNW.

Companies that were not present at SNW

Interestingly, to start this series, I want to talk about two companies that were not present at SNW but were part of conversations there. The reason I am excited about these companies is that all the initial information came from end-users currently investigating or evaluating their products. Hopefully, the end-users were not plugged into SNW by these companies to disseminate positive information.

To be continued …

Sunday, April 22, 2007

Themes of SNW Spring 2007

It was a great experience to attend Storage Networking World in San Diego. Thank you, Bill Wrinn and ComputerWorld, for allowing me to attend the conference with media credentials. Hopefully, the blogger experiment was as interesting to the organizers as it was to me.

Overall, I felt there were three main themes to the show:

Global data reduction: It was apparent at SNW that users are starting to look at products that can reduce the data footprint on all types of storage across the enterprise. There is strong interest in de-duplication, compression and any other single-instancing method. The end-user Town Hall meeting is an exclusive affair; data reduction may well have been one of the topics brought up in the meeting!

On the topic of the end-user Town Hall meeting: do you guys really think that taking your grievances from such closed meetings to vendors through SNIA helps? Security weaknesses in products didn't get addressed by vendors until they were made public.

Special-purpose appliances: End-users seem to be more interested in products that do one thing very well than in products that try to solve all their problems in a half-assed way. Whether they are unification, performance, security or vertical-specific solutions, end-users didn't seem too concerned about having special-purpose appliances in their data centers. Introducing delay in the data path is still a concern, but it is no longer a hurdle as long as a major pain point is resolved for the customer.

Another related mindset change was end-users' willingness to buy from startups. In my discussions with end-users, except for a few very conservative shops, it wasn't uncommon for their data centers to already have, or be evaluating, products from startups. This is a welcome change from a couple of years ago, when end-users were only willing to buy from large established vendors. Such conservative attitudes do nothing more than slow innovation and create barriers for real solutions that alleviate the pain points.

Clustered and grid storage: This is a shift from the earlier trend of scale-up. End-users are no longer demanding a larger-capacity, higher-performing version of a product. Instead, they are looking for products that can scale out as their requirements change, without the need to replace the existing solution. Performance scalability through clustered storage is no longer confined to the high-performance computing (HPC) market. It started making significant inroads with vertical-specific applications and is now moving into the mainstream for resource-intensive applications like data mining.

The success of Google with a scale-out architecture built on commodity hardware is also making end-users wonder why they can't leverage the same in their environments, leading to increased interest in grid storage that can adapt to changing workloads and dynamic environments while providing a single view of storage.

Data classification could have been another theme at SNW, but I didn't see anything that stood out: the same old policy-driven stuff, nothing intelligent, and limited end-user interest.

Of course, you may disagree with the above themes, so let's hear your thoughts. An IBM person with whom I shared a cab to the airport thought the main theme was Storage Virtualization.

Go figure!

Thursday, April 19, 2007

Readers Sentiment on Storage Blogging

Mark wrote a retort to ex-bloggers who quit because of a lack of readers, as I mentioned in my previous blog post after talking to a few ex-bloggers at SNW.

In my opinion, most storage ex-bloggers tried to write for somebody else rather than for themselves. What they failed to do was treat their readers, however few, as peers and build relationships. They just wanted to sell or show off something to their readers.

I firmly believe page hits, clicks, links and the number of readers don't count for much. What matters most is the level of interaction with readers, not only online through blog comments but also offline through emails, phone calls and face-to-face encounters. Differences of opinion don't matter; take solace in the fact that at least readers felt comfortable enough to share their opinions and express differences.

Even though it is directed at companies, The Cluetrain Manifesto gives advice that every storage blogger should heed:
Can you put yourself out there: say what you think in your voice, present who you really are, show what you really care about? Do you have any genuine passion to share? Can you deal with such honesty? Such exposure?
Reader sentiment about storage bloggers, in my small sampling at SNW, was very consistent. Readers expressed liking storage bloggers with personality, those with one or more of the qualities of being opinionated, straight-shooting, analytical, passionate and even somewhat of a nut-case. Blog readers in the storage industry and in the end-user storage community subscribe to most of the known storage blogs, but they relate to very few of them.

Actually, no corporate storage blog made the grade on building relationships with readers, especially readers from the end-user community. The reasons for skepticism varied, from blogs only covering topics aligned with the interests of their companies to wondering when these supposedly busy executives find the time to write blog posts. Most readers wouldn't be able to differentiate most blog posts from company marketing collateral if both were presented in the same format. Some even assumed that the topics and the majority of the content for corporate storage blogs may be generated by ghostwriters and marketing groups.

Data storage professionals who work in end-user environments also felt uncomfortable blogging about storage. The reluctance was particularly strong among those involved in the evaluation and design of storage solutions. They felt that the topics of interest to them, like architecture, design, performance and problems, are typically covered by vendor confidentiality agreements, and they also feared potential reprisal from storage vendors and their own management. My suggestion to end-users was to blog anonymously, write as a guest on other storage blogs, or just feed the sources they trust; whether those are analysts, journalists or bloggers doesn't matter.

Having heard end-users at SNW offer unsolicited praise for IBRIX, Isilon, 3Par, Gear6 and Acopia while trashing products like SVC from established vendors, my advice to storage startups is to start blogging and also to help the end-users who are testing and using your product to blog.

Tuesday, April 17, 2007

SNW Day Two 10:00am - Bloggers Evening Recap

SNW Day One was really busy from early morning to late at night. For me, the highlight of SNW Day One was the Bloggers Evening. The turnout for the evening was better than expected.

I sent out a reminder in the early afternoon to meet outside the hotel lobby bar at 6:00pm. Bruce Moxon, Blake Golliher, Clark Hodge, Claude Lorenson and SW Worth (the latter two starting a Microsoft storage blog) and I initially met outside the lobby bar. We stood outside the bar for half an hour talking about blogging and storage.

The blogging concerns expressed during the discussion were about making sure the information posted on a blog doesn't run afoul of employers, as well as respecting the NDAs and confidential information of employers and suppliers. As we had a mix of storage bloggers representing vendors, services and end-users, the concerns were quite varied. Over the past couple of days, I have met quite a few ex-bloggers who were quickly discouraged, not by their employers or any external entities, but by the absence of readers and the resulting loss of blogging enthusiasm.

Unfortunately, we missed Marc Farley, as I didn't get a chance to check email before Bloggers Evening and he had replied to my reminder mentioning that he would be delayed by a few minutes.

In the end, Bruce, Blake, Clark and I went out for dinner at a nearby Italian restaurant. We had a spirited discussion, mostly about storage and blogging. We talked about how posts from opinionated and analytical bloggers make for a better read than those from people who just want to write nice things, marketing spiel or very polished posts. We also got into a discussion about our favorite upcoming storage companies, favorite conferences and events, favorite technologies, and how the changing storage landscape may impact a few existing storage players.

Afterward, Blake and I also got into a discussion at the hotel bar about social networking and its migration into the enterprise, our careers, interests, the storage landscape and what's next for us.

Overall, Bloggers Evening turned out to be a great get-together. I echo the statement expressed by Bruce and others.
When is the next bloggers get-together?
I hope the next Bloggers Evening comes sooner rather than later.

Monday, April 16, 2007

SNW Day One 9:00am

SNIA introduces new Executive Director

Last night, I attended SNIA event at SNW to introduce SNIA's new Executive Director Leo Leger. I got to meet some new faces and some old.

It was great to run into an old acquaintance, Laurence Whittaker, from my days at the Toronto Storage Networking User Group (TSNUG). He is, as usual, very active with the SNIA End User Council. We didn't get to talk about the EUC strategy planning over the weekend as he got dragged off by Wendy. Hopefully, I will run into him again during the show and get some information on EUC plans for next year.

I talked to Leo Leger, the new SNIA Executive Director. The talk was quick and short, reminding me of being at a speed networking event! I also sat down with Arun and others from Patni Computers. They seem to be active in providing software services to storage companies.

I also met Scott Kipp from the office of the CTO at Brocade/McDATA. He has written several books, including Fibre Channel Advances and Broadband Entertainment: Digital Audio, Video and Gaming in Your Home. He shared the challenges and discouraging results from his past attempts at blogging, and he is planning to start a blog again. I invited him to Bloggers Evening to talk to other bloggers about their experiences and get feedback on his challenges. Unfortunately, he is not able to join us due to a conflict with the SNW Speakers Dinner.

The highlight of the evening was talking to Vincent Franceschini. Even though he doesn't post regularly, I have always enjoyed reading Vincent's blog, as he focuses on emerging technologies at HDS. Since I am personally interested in technologies ready to cross the chasm from research to industry, we had a lively discussion on the challenges and opportunities of Grid Storage and Service-oriented Architecture. It was surprising that he had thoughts on the role of memory prediction in grid storage.

Talking to Vincent felt like sitting in a room brainstorming ideas. He is definitely passionate and opinionated about emerging technologies, traits I personally admire. He mentioned enjoying blogging but also explained the time constraints from his current responsibilities at both HDS and SNIA, as well as the challenge of talking about future trends and emerging technologies that may be construed as pie-in-the-sky by some. Hopefully, he will take me up on my offer to work with him on blogging about emerging technologies as well as on Grid Storage initiatives.

SNW Solutions Lab

Last night, I also walked around the SNW Solutions Lab, where people were working hard to make everything operate properly. It reminded me of my involvement with the SNIA SNW Interoperability Lab six years ago. I never got to see the fruits of the labor that time, but compared to the planning chaos of 2001, this Solutions Lab seems a lot more organized.

Sunday, April 15, 2007

SNW Day Zero 6:50pm

Storage can't compete with Aircraft Carrier

I arrived in sunny San Diego at 3:00pm to attend Storage Networking World Spring 2007. On my way from the airport to the Manchester Grand Hyatt hotel, the site of SNW, I passed lots of boats and a huge aircraft carrier. I couldn't resist the allure of the aircraft carrier, which overshadowed anything storage had to offer. So the first thing I did after checking in to the hotel was take a walk to the fascinating USS Midway and the USS San Diego monument.

SNW before the Opening

After enjoying the sunny aircraft carrier, I checked out the different locations where SNW events will be held and took a peek into the expo hall before opening day. It looks so quiet, but I am sure we won't be saying the same tomorrow.

Microsoft Water

While navigating around the huge equipment crates of various vendors, I ran into none other than Darrell Kleckley and the Microsoft Storage team. I first met him when he was on the SNIA Education Committee; now he is a Technical Evangelist at Microsoft.

I hope Microsoft sent cases of Windows Vista water to SNW. If Donald Trump can sell water, then why not Bill Gates too! And if you listen to Paul Graham, water may be the next market Microsoft needs to make itself feared again.


Bachelorette Party

It is just my luck that I get to stay in the room right across from a bachelorette party room.

Saturday, April 14, 2007

Can we meet at SNW?

Sunday 4/15 noon, Leaving Seattle for San Diego.
Monday 4/16 evening, At Bloggers Evening.
Tuesday 4/17 from 10:00am to 7:00pm, In one-on-one meetings.
Wednesday 4/18 evening, Leave San Diego and return to Seattle.

The rest of my schedule is wide open, and I would like to meet as many of my blog readers as possible.

If you are going to be at SNW and your schedule permits, let's meet. Just send me an email or call me with the day, time and location where you would like to meet.

Tuesday, April 10, 2007

Bloggers in Demand at SNW

Clarification: Bloggers Evening is Monday, April 16th at 6:00PM in the hotel lobby. The JPR cocktail event is on Tuesday and is totally separate from Bloggers Evening. I just wanted to clarify this as several people inquired.

I am amazed at the amount of attention being bestowed on storage bloggers at SNW by vendors, analysts and PR firms. I have received numerous emails and phone calls since my first post in which I mentioned plans to attend SNW as a storage blogger.

Do you maintain a storage blog and would you like to cover SNW events as a blogger? Please send me an email or get in touch with Bill Wrinn at Topaz Partners, who is managing media credentials for storage bloggers.

Several storage industry executives who presently don't blog requested to attend the Bloggers Evening. Clark Hodge and I discussed the issue of non-bloggers attending Bloggers Evening. We have decided to open up the Bloggers Evening to everyone.

Scott Kline from JPR Communications wrote a comment on my previous post and also sent me an email with an official invitation for all bloggers to the JPR Cocktail Party.
I also want to have all of the Bloggers and associates of the Bloggers join us for cocktails and appetizers on Tuesday night from 7-9pm at the main lobby bar. I have attached the official invitation for you and whomever else wants to have some free drinks and food with all of the editors, analysts and companies.

Sunday, April 08, 2007

Bloggers Evening at SNW

Response exceeding Expectations

Response to my call for organizing an evening with Bloggers at SNW has exceeded my expectations. Mario Apicella aptly summed up my feeling about this initiative.
Only two years ago you could have counted storage bloggers without taking your socks off and now we can put together a small crowd.
Initially, I thought this evening might turn out to be a party of one or two. But after a week of responses, I wish we had tried organizing a bloggers evening earlier.

In addition to the last blog post, Say Hello at SNW, I reached out to storage bloggers through comments on their blogs and email messages. I will continue to reach out to more bloggers next week. If you don't hear from me, it may just mean that I am not aware of your blog. Please don't assume that you are not invited; just send me an email, leave a comment or pick up the phone and call me. All bloggers are welcome at Bloggers Evening.

Thank you, Storagezilla, Tony Pearson, Josh Maher and Mario Apicella, for spreading the word through posts on your blogs.

A Party of Six

As of Saturday night, Bloggers Evening is no longer a party of one but of SIX. The following bloggers have expressed interest in attending:

Clark Hodge, Storageswitched! Blog
Claude Lorenson, starting a storage blog at Microsoft
Marc Farley, Equallogic Blog
Tony Pearson, IBM Inside System Storage Blog
Jon Toigo, Drunken Data Blog

Even a few bloggers who are not attending Bloggers Evening mentioned that the reason is they will be missing SNW altogether. They also had very encouraging words for Bloggers Evening.

"I'm afraid I can't make it to this SNW. … I love the idea of a bloggers get-together." Dave Hitz, Netapp

"it's a great idea but I am not going to SNW this time and this is one more reason to regret it :>)" Mario Apicella, Infoworld

"It is a great idea and thanks for the invitation!" Mike Linnett, Zerowait

When and Where

I am not familiar with the SNW host hotel, the surrounding area, or attendees' interest in evening events at SNW. I am tentatively proposing that we meet Monday, April 16th at 6:00PM in the lobby of the Manchester Grand Hyatt and head out to a bar/restaurant at the hotel or a location close by. Any alternate suggestions are very welcome.

Agenda

There is no specific agenda for Bloggers Evening. Most likely, our discussion will revolve around storage blogging and the data storage industry. If you would like to discuss any specific topic, please leave a comment, send me a message or just raise your topic at Bloggers Evening.

Neither Fee Nor Free

There is no fee for attending Bloggers Evening; just bring your passion for blogging and data storage. But do bring your credit card, cash, food stamps, guns or any other method you use to pay for your own drinks and food. Unfortunately, there is no financial sponsor to cover the costs at Bloggers Evening.

Even though Jeremiah will not be at Bloggers Evening, he has offered to buy us some drinks. Thank you for the offer, Jeremiah! Anyone from PodTech is welcome to join us at Bloggers Evening. Anybody else who wants to get us drunk and stuffed is welcome to join us and pick up the tab.