Re: [zfs-discuss] FS Reliability WAS: about btrfs and zfs
Recently someone posted to this list about that _exact_ situation: they loaded an OS onto a pair of drives while a different pair of drives containing an OS was still attached. The zpool on the first pair ended up not being importable and was corrupted. I can post more info when I am back in the office on Monday.

On Friday, October 21, 2011, Fred Liu wrote:
>
>> 3. Do NOT let a system see drives with more than one OS zpool at the
>> same time (I know you _can_ do this safely, but I have seen too many
>> horror stories on this list that I just avoid it).
>>
>
> Can you elaborate #3? In what situation will it happen?
>
> Thanks.
>
> Fred

--
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
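For illustration only, the more cautious way to deal with a machine that can see a second OS pool is to look before importing, and to import by numeric id under a different name so two pools with the same name (e.g. two "rpool"s) never collide. The pool id and the new name below are made up, not taken from the incident above:

    # list pools that are visible for import, without importing anything
    zpool import

    # import one specific pool by the numeric id shown above, renaming it
    # on import ("altrpool" is just an illustrative name)
    zpool import 1234567890123456789 altrpool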
Re: [zfs-discuss] FS Reliability WAS: about btrfs and zfs
On Fri, Oct 21, 2011 at 8:02 PM, Fred Liu wrote:
>
>> 3. Do NOT let a system see drives with more than one OS zpool at the
>> same time (I know you _can_ do this safely, but I have seen too many
>> horror stories on this list that I just avoid it).
>>
>
> Can you elaborate #3? In what situation will it happen?

Some people have trained their fingers to use the -f option on every command that supports it, to force the operation. For instance, how often do you run rm -rf vs. rm -r and answer questions about every file? If various zpool commands (import, create, replace, etc.) are used against the wrong disk with a force option, you can clobber a zpool that is in active use by another system.

In a previous job, my lab environment had a bunch of LUNs presented to multiple boxes. This was done for convenience in an environment where there would be little impact if an errant command were issued. I'd never do that in production without some form of I/O fencing in place.

--
Mike Gerdts
http://mgerdts.blogspot.com/
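To make the force-option point concrete, a rough sketch (the pool name "tank" is illustrative, and the exact wording of the warning varies by release):

    # without -f, zpool import refuses if the pool looks like it is in use
    # by (or was last accessed from) another system
    zpool import tank

    # -f overrides that safety check; run against a pool another host is
    # actively using, this is how pools get clobbered
    zpool import -f tank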
Re: [zfs-discuss] FS Reliability WAS: about btrfs and zfs
> 3. Do NOT let a system see drives with more than one OS zpool at the
> same time (I know you _can_ do this safely, but I have seen too many
> horror stories on this list that I just avoid it).

Can you elaborate #3? In what situation will it happen?

Thanks.

Fred
Re: [zfs-discuss] Growing CKSUM errors with no READ/WRITE errors
On Thu, Oct 20, 2011 at 7:55 AM, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Jim Klimov
>>
>> new CKSUM errors are being found. There are zero READ or WRITE error
>> counts, though.
>>
>> Should we be worried about replacing the ex-hotspare drive ASAP as well?
>
> You should not be increasing CKSUM errors. There is something wrong. I
> cannot say it's necessarily the fault of the drive, but probably it is. When
> some threshold is reached, ZFS should mark the drive as faulted due to too
> many cksum errors. I don't recommend waiting for it.

It probably indicates something else faulty in the I/O path, which could include RAM, the HBA or integrated controller chip, loose or defective cabling, etc. If the RAM is ECC-capable, it seems unlikely to be the issue. I'd make sure all cables are fully seated and not kinked or otherwise damaged.

Eric
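For anyone following along, the usual sequence for chasing this down is roughly the following (the pool name "tank" is illustrative):

    # see which device is accumulating CKSUM errors and which files,
    # if any, are affected
    zpool status -v tank

    # after reseating/replacing cables (or other suspect hardware), reset
    # the error counters and re-read everything to see if errors come back
    zpool clear tank
    zpool scrub tank
    zpool status -v tank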
Re: [zfs-discuss] commercial zfs-based storage replication software?
2011/10/20 Jim Klimov:
> 2011-10-19 17:54, Fajar A. Nugraha wrote:
>>
>> On Wed, Oct 19, 2011 at 7:52 PM, Jim Klimov wrote:
>>>
>>> Well, just for the sake of completeness: most of our systems are using the
>>> zfs-auto-snap service, including Solaris 10 systems dating from Sol10u6.
>>> Installation of the relevant packages from SXCE (ranging snv_117-snv_130) was
>>> trivial, but some script-patching was in order. I think, replacement of the
>>> ksh interpreter with ksh93.
>>
>> Yes, I remembered reading about that.
>
> Actually, I revised the systems: the scripts are kept in original form, but
> those sol10 servers where ksh93 was absent got a symlink:
>
> /usr/bin/ksh93 -> ../dt/bin/dtksh

For the sake of completeness, I tried using zfs-auto-snapshot 0.12 (the last version before it was obsoleted by time-slider). I used opencsw's ksh93 (http://www.opencsw.org/package/ksh/) and symlinked /opt/csw/bin/ksh to /usr/bin/ksh93.

It kind of works when only used for automated snapshots. But when I use the zfs/backup-save-cmd property (to make the remote system do a "zfs receive -d -F"), problems start to happen: it complains that the snapshot already exists. Probably the result of interaction between the com.sun:auto-snapshot user property, child datasets, and "zfs send -R". If only "zfs receive" had an option to "force delete the snapshot if it already exists, roll back to the previous one, and retry the incremental receive" :)

Since zfs-auto-snapshot has been obsoleted (and is obviously unsupported), it wouldn't make sense for me to try to fix this. I might revisit this option later using time-slider when Solaris 11 is released, though.

Thanks for the info so far.

--
Fajar
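For context, the failing combination is roughly the pattern below; the pool, dataset, and host names are illustrative, not taken from the setup above. When the incremental stream carries a snapshot name that already exists on the receiving side, the receive aborts, and the only workaround I know of is to destroy the conflicting snapshot on the destination before retrying:

    # incremental replicated send of the auto-snapshots to a backup host
    zfs send -R -i tank/data@snap-old tank/data@snap-new | \
        ssh backuphost zfs receive -d -F backup

    # if the receive fails because "@snap-new" already exists on the
    # destination, remove it there (recursively) and retry the send
    ssh backuphost zfs destroy -r backup/data@snap-new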