Re: [zfs-discuss] ZFS still crashing after patch

2008-05-01 Thread Phillip Wagstrom -- Area SSE MidAmerica
Rustam wrote:
 Today my production server crashed 4 times. THIS IS A NIGHTMARE!
 Self-healing file system?! For me ZFS is a SELF-KILLING filesystem.
 
 I cannot fsck it; there's no such tool. I cannot scrub it; it crashes
 30-40 minutes after the scrub starts. I cannot use it; it crashes a
 number of times every day! And with every crash the number of checksum
 failures grows:
 
 NAME        STATE     READ WRITE CKSUM
 box5        ONLINE       0     0     0
 ...after a few hours...
 box5        ONLINE       0     0     4
 ...after a few hours...
 box5        ONLINE       0     0    62
 ...after another few hours...
 box5        ONLINE       0     0   120
 ...crash! and we start again...
 box5        ONLINE       0     0     0
 ...etc...
 
 Actually, 120 is the record; sometimes it crashes as soon as it boots.
 
 And there's always a permanent error:
 
 errors: Permanent errors have been detected in the following files:
         box5:0x0
 
 And the very wise self-healing advice from
 http://www.sun.com/msg/ZFS-8000-8A:
   Restore the file in question if possible.  Otherwise restore the
   entire pool from backup.
 
 Thanks, but if I restore it from backup it won't be ZFS anymore,
 that's for sure.

That's a bit harsh.  ZFS is telling you that you have corrupted data 
based on the checksums.  Other types of filesystems would likely simply 
pass the corrupted data on silently.

 It's not an I/O problem. AFAIK, the default ZFS I/O error behavior is
 to wait for a repair (I have 10U4, where it is non-configurable). So
 why does it panic?

Do you have the panic messages?  ZFS won't panic because of bad 
checksums.  By default it will panic if it can't write data out to 
any device, if it completely loses access to non-redundant devices, or 
if it loses both redundant devices at the same time.

 Recently there were discussions about the failure of the OpenSolaris
 community. Now it's been more than half a month since I reported this
 error. Nobody has even posted something like RTFM. Come on guys, I know
 you are there and busy with enterprise customers... but at least give me
 some troubleshooting ideas. I'm totally lost.
 
 Just to remind you, it's a heavily loaded fs with 3-4 million files and
 folders.
 
 Link to original post: 
 http://www.opensolaris.org/jive/thread.jspa?threadID=57425

This seems to show the same number of checksum errors across 2 
different channels and 4 different drives.  Given that, I'd assume that 
this is likely a dual-channel HBA of some sort.  It would appear that 
you either have bad hardware or some sort of driver issue.
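
To make that reasoning concrete, here is a rough Python sketch that groups 
per-device CKSUM counts from `zpool status` output by controller.  The 
sample output, device names and counts below are made up for illustration 
(they are not from the original post), and the helper names are my own.  
It assumes Solaris-style cXtYdZ device names and plain integer counters.

```python
import re
from collections import defaultdict

# Hypothetical `zpool status` excerpt: layout, device names and counts
# are invented for illustration, not taken from the original post.
SAMPLE = """\
        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0   120
          mirror    ONLINE       0     0    60
            c1t0d0  ONLINE       0     0    60
            c2t0d0  ONLINE       0     0    60
          mirror    ONLINE       0     0    60
            c1t1d0  ONLINE       0     0    60
            c2t1d0  ONLINE       0     0    60
"""

# Matches leaf devices named cXtYdZ (optionally with a slice suffix sN),
# followed by STATE and the READ, WRITE and CKSUM counters.
DEVICE = re.compile(
    r"^\s*(c(\d+)t\d+d\d+(?:s\d+)?)\s+\S+\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def cksum_by_controller(status_text):
    """Return {controller: {device: cksum_count}} for leaf devices."""
    totals = defaultdict(dict)
    for line in status_text.splitlines():
        m = DEVICE.match(line)
        if m:
            device, controller = m.group(1), "c" + m.group(2)
            totals[controller][device] = int(m.group(5))
    return dict(totals)


def looks_like_shared_fault(by_controller):
    """True if every device on more than one controller shows the same
    non-zero CKSUM count, which hints at a shared component (HBA, driver,
    memory) rather than independent disk failures."""
    counts = {n for devs in by_controller.values() for n in devs.values()}
    return len(by_controller) > 1 and len(counts) == 1 and counts != {0}


if __name__ == "__main__":
    grouped = cksum_by_controller(SAMPLE)
    for controller, devices in sorted(grouped.items()):
        print(controller, devices)
    print("shared fault suspected:", looks_like_shared_fault(grouped))
```

Identical counts on every drive behind both channels are unlikely to come 
from four disks failing independently, which is why I'd look at the HBA 
or its driver first.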

Regards,
Phil

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 10 6/06 now available for download

2006-06-27 Thread Phillip Wagstrom -- Area SSE MidAmerica

Shannon Roddy wrote:

Solaris 10u2 was released today.  You can now download it from here:

http://www.sun.com/software/solaris/get.jsp
  


Does anyone know if ZFS is included in this release?  One of my local
Sun reps said it did not make it into the u2 release, though I have
heard for ages that 6/06 would include it.


Yes.

[EMAIL PROTECTED]:/home/pwags% more /etc/release
                       Solaris 10 6/06 s10s_u2wos_09a SPARC
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 09 June 2006
[EMAIL PROTECTED]:/home/pwags% zpool list
NAME    SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
sse    1.06T    455G    633G    41%  ONLINE  -


Regards,
Phil
