>>>>> "r" == Ross <[EMAIL PROTECTED]> writes:
r> This is a big step for us, we're a 100% windows company and
r> I'm really going out on a limb by pushing Solaris.

I'm using it in anger. I'm angry at it, and can't afford anything that's better. Whatever I replaced ZFS with, I would make sure it had:

 * snapshots
 * weekly scrubbing
 * dual parity, so the rebuild succeeds after a disk fails in case the
   frequent scrubbing is not adequate, and to deal with the
   infant-mortality problem and the relatively high 6% annual failure rate
 * checksums (block- or filesystem-level, either one is fine)
 * a fix for the RAID5 write hole (either FreeBSD-style RAID3, which is
   analogous to the ZFS full-stripe-write approach, or battery-backed NVRAM)
 * construction from only drives that have been burned in for 1 month

ZFS can have all those things except the weekly scrubbing. I'm sure the scrubbing works really well for some people like Vincent, but for me it takes much longer than scrubbing took with pre-ZFS RAID, and it increases filesystem latency a lot more, too. This is probably partly my broken iSCSI setup, but I'm not sure. I'm having problems where the combined load of 'zpool scrub' and some filesystem activity bogs down the Linux iSCSI targets so much that ZFS marks the whole pool faulted, so I have to use the pool ``gently'' during a scrub. :(

RAID-on-a-card doesn't usually hit these bullet points, so I would use ZFS over RAID-on-a-card. There are too many horror stories about those damn cards, even the ``good'' ones. Even if they worked well (which in my opinion they do not), they make getting access to your pool dependent on finding replacement cards of the same vintage, and on getting the right drivers for a proprietary, obscure card for the (possibly just-reinstalled, different version of the) OS, possibly cards with silently-different ``steppings'' or ``firmware revisions'' or some other such garbage.
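For what it's worth, the ZFS side of that checklist takes only a few commands. This is a hedged sketch: the pool name 'tank', the disk names, and the cron schedule are all made-up examples, not from any real setup.

```shell
# Hypothetical pool and disk names; substitute your own.
# raidz2 covers the dual-parity bullet; block checksums are on by default.
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

# Snapshots, e.g. a quick manual one:
zfs snapshot tank@before-upgrade

# Weekly scrubbing via a root crontab entry (here, Sunday 3am):
#   0 3 * * 0 /usr/sbin/zpool scrub tank
```

The burn-in bullet you still have to handle yourself before building the pool; ZFS won't help you there.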
Also, with RAID-on-a-card there is no clear way to get a support contract that stands behind the whole system in terms of the data's availability. With Sun ZFS gear there sort of is, and there definitely is with a traditional storage hardware vendor. So, optimistically, even if you are not covered by a contract yourself because you downloaded Solaris or bought a Filer on eBay, some other customer is, so the product (optimistically) won't make some of the colossally stupid mistakes that some RAID-on-a-card companies make. I would stay well away from that card crap.

Many ZFS problems discussed here sound like the fixes are going into s10u6, so they are not available on Solaris 10 yet, and are drastic enough to introduce some regressions. I don't think ZFS in stable Solaris will be up to my stability expectations until the end of the year. For now, ``that's fixed in weeks-old b94'' probably doesn't fit your application: maybe it fits a scrappy, super-competitive, high-roller shared hosting shop, but not a plodding Windows shop. And having fully-working drivers for the X4500 only right after its replacement is announced makes me think maybe you should buy an X4500, not the replacement. :(

ZFS has been included in stable Solaris for two full years already, and you're still asking questions about it. The Solaris CIFS server I've never tried, but it is even newer, so I think you would be crazy to make yourself the black sheep pushing that within a conservative, hostile environment. If you have some experience with Samba in your environment, maybe that's okay to use in place of CIFS.

If you want something more out-of-the-box than Samba, you could get a NetApp StoreVault. I've never had one myself, though, so maybe I'll regret having this suggestion archived on the Interweb forever. I think that, unlike Samba, the StoreVault can accommodate the Windows security model without kludginess. To my view that's not necessarily a good thing, but it IS probably what a Windows shop wants.
The StoreVault has all those reliability bullet points above, AIUI. It's advertised as a crippled version of their real Filer's software. It may annoy you by missing certain basic, dignified features (it is web-managed only?!, and maybe you have to pay more to ``unlock'' the snapshot feature with some stupid registration code), but it should have most of the silent reliability/availability tricks that are in the higher-end NetApp filers.

Something cheaper than NetApp, like the Adaptec SNAP filer, has snapshots, scrubbing, and (I assume) a fix for the RAID5 hole, plus something like the support-contract-covering-your-data, though obviously nothing to set beside NetApp. Also, its Windows-security-model support is kludgy. I'm not sure SNAP has dual parity or checksums, and I've found it slightly sketchy: it was locking up every week until I forced an XFS fsck, and there is no supported way to force an XFS fsck. Their integration work does seem to hide some of the Linux crappiness, but not all. LVM2 seems to be relatively high-quality on the inside compared to current ZFS.

r> The problems with zpool status hanging concern me,

Yes. You might distinguish bugs that affect availability from bugs that can cause data loss. The 'zpool status' not always working is halfway in between, because it interferes with responding to failures. The disk-pulled problems, the slow-mirror-component-makes-the-whole-mirror-slow problems, and the problem of proper error handling being put off for over two years with the excuse ``we're integrating FMA'' (and then FMA, once integrated, not behaving reasonably) are all in the availability category, so maybe they aren't show-stoppers. People using ZFS on top of an expensive storage solution may not care at all: if some weird chain of events leads to an availability problem, there's always the excuse ``you should have paid more and set up multipath''. The availability demands on ZFS are lower with big FC arrays.
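Since a hung 'zpool status' can wedge whatever cron job is watching the pool, one workaround is to run it under a watchdog. This is a hedged sketch only, assuming a pool named 'tank', a 30-second limit, and alerts to root; none of those details come from the original report.

```shell
#!/bin/sh
# Run 'zpool status -x' in the background with a watchdog, so a hung
# command cannot block the monitoring job indefinitely.
POOL=tank
OUT=/tmp/zpool-status.$$

zpool status -x "$POOL" > "$OUT" 2>&1 &
pid=$!
( sleep 30; kill "$pid" 2>/dev/null ) &    # 30-second watchdog
watchdog=$!

if wait "$pid"; then
    kill "$watchdog" 2>/dev/null
    # 'zpool status -x' reports "...is healthy" when nothing is wrong.
    grep healthy "$OUT" > /dev/null || \
        mailx -s "zpool $POOL degraded" root < "$OUT"
else
    echo "zpool status hung or failed for $POOL" | \
        mailx -s "zpool status problem on $POOL" root
fi
rm -f "$OUT"
```

It doesn't fix the underlying bug, of course; it just keeps you informed when the command wedges.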
However, the reports of ``my pool is corrupt, help'' followed by silence, and of ``the kernel {panics,runs out of memory and freezes} every time I do XXX'': these scare the shit out of me, because they mean you lose your data in a frustrating way, as if it were encrypted by a data-for-ransom Internet worm. Some day, maybe a year from now, the bug will be fixed and maybe you can get your data back. In the meantime, you're SOL with thousands of dollars of (possibly leased) disk, while the data is just barely out of reach, perhaps sucking your time away with desperate, futile, maybe-this-will-work attempts. I have fairly high confidence I can recover most of the data off an abused UFS-over-SVM mirror with dd and fsck, but I don't have that confidence at all with supposedly ``always-consistent'' ZFS.

Besides several tiers of storage-layer and ZFS-layer redundancy, experience here suggests you also need rsync-level redundancy: either to another ZFS pool, or to some other cheap backup filesystem. That backup filesystem might be acceptable even with some of the problems from the bulleted list, like not being dual-parity, not having snapshots, or having a RAID5 write hole (but it still needs to be scrubbed). If you get an integrated NAS like the StoreVault, the ZFS machine will probably be cheaper, so you could use it as the cheap backup filesystem: rsync the StoreVault onto the ZFS filesystem every night. Do this for a couple of years, and you will have a chance to notice whether ZFS stability is improving, and maybe conduct more experiments in provoking it.
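The nightly rsync could look something like this sketch. The mount point, dataset names, and schedule are assumptions for illustration; it presumes the StoreVault share is already NFS-mounted on the ZFS box.

```shell
#!/bin/sh
# Nightly copy of the StoreVault onto the cheap ZFS backup machine.
# Paths and dataset names are hypothetical; adjust for your layout.
SRC=/mnt/storevault        # StoreVault NFS export, mounted locally
DST=/backup/storevault     # mountpoint of the ZFS backup dataset

rsync -a --delete "$SRC"/ "$DST"/ || exit 1

# Snapshot the backup dataset after each successful run, so every
# night's state is kept; ZFS snapshots are cheap enough for a long history.
/usr/sbin/zfs snapshot "backup/storevault@$(date +%Y-%m-%d)"
```

Run it from root's crontab, e.g. `30 1 * * * /root/backup-storevault.sh`, and prune old snapshots once you settle on a retention period.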
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss