>>>>> "sl" == Scott Lawson <scott.law...@manukau.ac.nz> writes:
sl> Electricity *is* the lifeblood of available storage.

I never meant to suggest computing machinery could run without electricity. My suggestion is, if your focus is _reliability_ rather than availability, meaning you don't want to lose the contents of a pool, you should think about what happens when power goes out, not just how to make sure power Never goes out Ever Absolutely because we Paid and our power is PERFECT.

* pools should not go corrupt when power goes out.

* a UPS does not replace the need for NVRAM to have its own batteries, because there are things between the UPS and the NVRAM like cords and power supplies, a UPS is not reliable enough if you have only one, and the controller containing the NVRAM may need to be hard-booted because of bugs.

* supplying superexpensive futuristic infallible fancypower to all disk shelves does not mean the SYNC CACHE command can be thrown out. Maybe the power is still not infallible, or maybe there will be SAN outages or blown controllers or shelves with junky software in them that hang the whole array when one drive goes bad.

If you really care about availability:

* reliability crosses into availability if you are planning to have fragile pools backed by a single SAN LUN, which may become corrupt if they lose power. Maybe you're planning to destroy the pool and restore from backup in that case, and you have some carefully-planned offsite backup hierarchy that's always recent enough to capture all the data you care about. But a restore could take days, which turns two minutes of unavailable power into a day or more of unavailable data. If there were no reliability problem causing pool loss during power loss, two minutes of unavailable power maybe means 10 minutes of unavailable data.

* there are reported problems with systems that take hours to boot up, e.g. with thousands of filesystems, snapshots, or NFS exports, which isn't exactly a reliability problem, but is a problem.
  That open issue falls into the above outage-magnification category, too.

I just don't like the idea that people are building fancy space-age data centers and then thinking they can safely run crappy storage software that won't handle power outages, because they're above having to worry about all that little-guy nonsense. A big selling point of the last step forward in filesystems (metadata logging) was that they'd handle power failures with better consistency guarantees and faster reboots. At the time, did metadata logging appeal only to people with unreliable power? I hope not. Never mind those of us who find these filesystem features important because we'd like cheaper or smaller systems, with cords that we sometimes trip over, that are still useful.

I think having such protections in the storage software, and having them actually fully working rather than imaginary or fragile, is always useful. It isn't something you can put yourself above by ``careful power design'' or ``paying for it'', because without them, in a disaster you've got this brittle house-of-cards system that cracks once you deviate from the specific procedures you've planned.

I'm glad your disaster planning has stood the test of practice so well. But we're supposed to have an industry baseline right now that databases and MTAs and NFS servers and their underlying filesystems can lose power without losing any data, and I think we should stick to that rather than letting it slip.
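
For readers who want to see what that baseline asks of application code, here is a minimal sketch, assuming a POSIX system and Python 3. It shows the usual write-to-temp, fsync, rename, fsync-the-directory pattern for replacing a file so that a power cut leaves either the old contents or the new ones, never a torn mix. The function name `durable_write` and the file names are hypothetical, chosen only for illustration. Note that this only works if every layer below honors the flush, which is the SYNC CACHE point made earlier in this message: if a disk shelf or controller acknowledges the flush without making the data stable, no amount of care in the application helps.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` (bytes) atomically and flush it to stable storage.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push it to the device
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a power cut.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would use it as `durable_write("queue/msg-0001", b"payload")` and only acknowledge the message to its peer after the call returns.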
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss