> Quoth Steven Sim on Thu, May 17, 2007 at 09:55:37AM +0800:
> > Gurus;
> > I am exceedingly impressed by the ZFS although it is my humble opinion
> > that Sun is not doing enough evangelizing for it.
>
> What else do you think we should be doing?
>
> David
I'll jump in here. I am a huge fan of ZFS. At the same time, I know about some of its warts. ZFS hints at adding agility to data management and is a wonderful system, yet it operates on some assumptions which are antithetical to data agility, including:

* inability to restripe online: add/remove data/parity disks
* inability to make effective use of varying-sized disks

In one breath ZFS says, "Look how well you can dynamically alter filesystem storage." In another breath ZFS says, "Make sure that your pools have identical spindles and that you have accurately predicted future bandwidth, access time, vdev size, and parity disks, because you can't change any of that later."

I know, down the road you can tack new vdevs onto the pool, but that really misses the point. Worse, if I accidentally add a vdev to a pool and then realize my mistake, I am sunk: once a vdev is added to a pool, it is attached to that pool forever. Ideally I could provision a vdev, later decide that I need a disk/LUN from that vdev, and simply remove the disk/LUN, decreasing the vdev capacity. I should have the ability to decide that current redundancy is insufficient and allocate [b]any[/b] number of new parity disks. I should be able to build a pool from a rack of 15x250GB disks and later add a rack of 11x750GB disks [b]to the vdev[/b], not by making another vdev. I should have the luxury of deciding to put hot Oracle indexes on their own vdev, deallocating spindles from an existing vdev and putting those indexes on the new vdev. And I should be able to change my mind later and put it all back.

Most important is the access-time issue. Since there are no partial-stripe reads in ZFS, access time for a RAIDZ vdev is the same as single-disk access time, no matter how wide the stripe is.

How to evangelize better? Get rid of the glaring "you can't change it later" problems.
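To put the access-time point in rough numbers, here is a minimal sketch. The ~100 random IOPS per 7200 RPM spindle figure and the two example layouts are my own illustrative assumptions, not something from ZFS itself:

```python
# Back-of-envelope random-read IOPS for two 12-disk layouts.
# Assumes ~100 random IOPS per 7200 RPM disk (a typical rule of thumb,
# not a measured ZFS number).

def raidz_read_iops(width, per_disk_iops=100):
    """RAIDZ has no partial-stripe reads: a small random read engages
    every data disk in the stripe, so the vdev delivers roughly the
    IOPS of a single disk regardless of its width."""
    return per_disk_iops  # width does not help random reads

def mirror_pool_read_iops(n_vdevs, per_disk_iops=100, way=2):
    """A pool of N mirror vdevs can service N independent random reads,
    and each read can be satisfied by either side of the mirror."""
    return n_vdevs * way * per_disk_iops

# 12 disks as one wide RAIDZ vdev vs. 6 two-way mirrors:
print(raidz_read_iops(12))       # -> 100
print(mirror_pool_read_iops(6))  # -> 1200
```

The same 12 spindles differ by an order of magnitude in small random reads purely because of vdev geometry chosen at creation time, which is exactly the "can't change it later" trap.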
Another thought is that flash storage has all of the indicators of being a disruptive technology as described in [i]The Innovator's Dilemma[/i]. What this means is that flash storage [b]will[/b] take over from hard disks. It is inevitable.

ZFS has a weakness with access times but handles single-block corruption very nicely. ZFS also has the ability to do very wide RAIDZ stripes, up to 256(?) devices, providing mind-numbing throughput. Flash has near-zero access times and relatively low throughput. Flash is also prone to single-block failures once the erase limit has been reached for a block. ZFS + flash = near-zero access time, very high throughput, and high data integrity.

To answer the question: get rid of the limitations and build a Thumper-like device using flash. Market it for Oracle redo logs, temp space, swap space (flash is now cheaper than RAM), anything that needs massive throughput and ridiculous IOPS numbers but not necessarily huge storage. Each month the cost of flash falls 4% anyway, so get ahead of the curve now.

My 2 cents, at least.

Marty

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
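A footnote on the 4%-per-month figure quoted above: it compounds quickly. A quick back-of-envelope check, assuming the decline really is a steady 4% every month:

```python
# How many months until flash costs half as much,
# assuming a steady 4% price decline per month?
price = 1.0
months = 0
while price > 0.5:
    price *= 0.96  # 4% monthly decline
    months += 1
print(months)  # -> 17, i.e. prices halve roughly every year and a half
```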