Thursday, April 29, 2010

Why does CORE fail? Part 1 - Response

Steve Kenniston of Storwize left a detailed comment in response to my last post, Why does CORE fail? Part 1. I thought my response to his comment deserved a separate blog post. Frankly, I haven't kept up with developments at Storwize since May 2007, when I last wrote a series of blog posts on Storewiz, so I don't claim any knowledge of the current Storwize solution.
First, I am not so sure that time to 'uncompress' ... is a valid parameter IF all solutions are being compared identically,....
The time to decompress/reconstitute is as important as, if not more important than, the time to compress/dedupe. Compression/deduplication can be managed 'internally' to keep up with the write expectations of applications and users, whether by delaying writes just enough to allow in-band data reduction, by reducing data after writes complete, or by some hybrid approach. But read expectations must be met in-band, so any decompression/reconstitution needs to take place correctly and completely within the expected time. A solution that requires less time to decompress should be rewarded in the same fashion as CORE rewards a solution with a lower time to compress.
... First I think we can all agree that decompression or rehydration is faster than optimization (compression, deduplication). ... the performance of time to 'compress' (I prefer optimize) and then cut the time in half and call this time to rehydrate. Now apply the formula. I would assume that the new CORE value would come out very close as they are now.
I am not so sure that the time to decompress/reconstitute is shorter than the time to compress/dedupe, or that it is 50% of the time to compress/dedupe, as I haven't heard of a solution or seen data that supports such a claim. Actually, the relationship may be the reverse, especially for solutions with a large amount of compressed/deduped data and a high data reduction ratio. The only related published data I am aware of shows read speed to be a direct function of the smallest unit used for decompression/reconstitution: the larger the unit size, the higher the read speed.

As I questioned in my last post, are the times to decompress and compress proxies for the times to read from and write to a data reduction solution? If that is the case, CORE could be improved by including the actual time to read and write (instead of the time to decompress or compress), or by including the time to decompress/compress as a penalty over normal read/write with a solution that has no data reduction technology. In essence, that penalty is the additional cost, in the form of lower read/write performance, paid in exchange for higher storage efficiency.
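The penalty idea above can be made concrete with a little arithmetic. What follows is only a rough sketch, not the CORE formula: the function names and every number in it are hypothetical, chosen for illustration and not measured on any product. It assumes the same workload is timed once on storage without data reduction (the baseline) and once on the data reduction solution.

```python
def io_penalty(baseline_seconds: float, reduced_seconds: float) -> float:
    """Fractional slowdown of the data reduction solution versus the baseline.

    0.0 means no penalty; 0.25 means the operation took 25% longer.
    A negative value means the solution was faster than the baseline.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (reduced_seconds - baseline_seconds) / baseline_seconds


def capacity_saved(reduction_ratio: float) -> float:
    """Fraction of raw capacity saved for an N:1 data reduction ratio."""
    if reduction_ratio < 1:
        raise ValueError("reduction_ratio must be at least 1")
    return 1.0 - 1.0 / reduction_ratio


# Hypothetical figures, for illustration only.
read_penalty = io_penalty(baseline_seconds=10.0, reduced_seconds=12.5)
write_penalty = io_penalty(baseline_seconds=10.0, reduced_seconds=11.0)
saved = capacity_saved(reduction_ratio=4.0)

print(f"read penalty:   {read_penalty:.0%}")
print(f"write penalty:  {write_penalty:.0%}")
print(f"capacity saved: {saved:.0%}")
```

All three inputs can be measured from outside the solution, so the comparison does not depend on knowing how a vendor compresses or deduplicates internally.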
Also, without understanding how the solution works it is very difficult to debate the merits of the value of performance on that solution. ...
If CORE stays with parameters that can be judged externally for a solution, it will be more relevant and valuable than if it tries to incorporate parameters internal to a solution, like time to compress (tc). A CORE based on externally measured parameters like reduction ratio, read and write performance, and cost of the solution over a range of storage capacities and time may produce a better value indicator. Any attempt to include internal mechanisms weakens CORE, because nobody has complete information about and understanding of every solution, and because the technology and techniques incorporated in such solutions change rapidly.
How can you possibly say that a post process solution that has users: 1) Buy full storage capacity (vs. less capacity with an inline solution) ...... is a good solution? ...
Please read my post again. I never claimed that any one solution is better than another. CORE includes the cost of the solution as a parameter, which should penalize a solution that requires more storage than other solutions do.
Step out of the vendor shoes for a moment and put yourself in the shoes of the customer. Which would you want?
As a customer, I want a solution that provides additional storage efficiency at a reasonable cost while meeting my expectations for read and write performance, safeguarding my data, and not requiring additional management overhead. Anything beyond that is a vendor coloring the customer's expectations to fit its solution.
