Re: [zfs-discuss] What about this status report

2010-03-29 Thread Harry Putnam
Just to apologize

This not only sounds lame but IS pretty lame.

Somehow in reading the output of `zpool status POOL', I just blew right
by the URL included there:
  http://www.sun.com/msg/ZFS-8000-9P

Which has quite a decent discussion of what it means.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What about this status report

2010-03-29 Thread Tonmaus
Both are driver modules for storage adapters.
Their properties can be reviewed in the documentation:
ahci: http://docs.sun.com/app/docs/doc/816-5177/ahci-7d?a=view
mpt: http://docs.sun.com/app/docs/doc/816-5177/mpt-7d?a=view
ahci also has a man page on b133.

cheers,

Tonmaus
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] What about this status report

2010-03-29 Thread Harry Putnam
Harry Putnam  writes:

> Ethan  writes:
>
>> Assuming your drives support SMART, I'd install smartmontools and see if
>> there are any SMART errors on the drive. While the absence of SMART errors
>
> [...]
>
>> I've had trouble getting smartmontools to work with some of my
>> controllers/drives in OpenSolaris, and have sometimes had better luck just
>> booting into a Linux live CD, so that may be something to keep in mind.
>
> Did you ever get it working on OpenSolaris?

Tonmaus  writes:

> Yes. Basically working here. All fine under ahci, some problems
> under mpt (smartctl says that the WD1002FBYS drives don't allow SMART
> events to be stored, which I think is probably nonsense.)

Thanks... What are ahci and mpt?



Re: [zfs-discuss] What about this status report

2010-03-28 Thread Tonmaus
Yes. Basically working here. All fine under ahci, some problems under mpt
(smartctl says that the WD1002FBYS drives don't allow SMART events to be
stored, which I think is probably nonsense.)

Regards,

Tonmaus


Re: [zfs-discuss] What about this status report

2010-03-28 Thread Harry Putnam
Ethan  writes:

> Assuming your drives support SMART, I'd install smartmontools and see if
> there are any SMART errors on the drive. While the absence of SMART errors

[...]

> I've had trouble getting smartmontools to work with some of my
> controllers/drives in OpenSolaris, and have sometimes had better luck just
> booting into a Linux live CD, so that may be something to keep in mind.

Did you ever get it working on OpenSolaris?



Re: [zfs-discuss] What about this status report

2010-03-27 Thread Ethan
On Sat, Mar 27, 2010 at 18:50, Bob Friesenhahn  wrote:

> On Sat, 27 Mar 2010, Harry Putnam wrote:
>
>>
>> So it's not a serious matter?  Or maybe more of a potentially serious
>> matter?
>>
>
> It is difficult to say if this is a serious matter or not.  It should not
> have happened.  The severity depends on the cause of the problem (which may
> be difficult to figure out).   Perhaps you will find out what the problem is
> some day.
>
> Bob
> --
> Bob Friesenhahn
>

Assuming your drives support SMART, I'd install smartmontools and see if
there are any SMART errors on the drive. While the absence of SMART errors
doesn't mean the drive isn't about to fail, the presence of them can be a
good indicator that the drive is failing.

So, if there are significant SMART errors, replace the drive. If there
aren't any, then I'd keep going and see if you get more checksum errors. If
you do, replace the drive. If you don't, chalk it up to freak random
bit-flipping and forget about it.

I've had trouble getting smartmontools to work with some of my
controllers/drives in OpenSolaris, and have sometimes had better luck just
booting into a Linux live CD, so that may be something to keep in mind.
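
The decision procedure described in this message (check SMART first, then
watch for recurring checksum errors) can be written down as a small sketch.
The function name and the two boolean inputs are hypothetical; deciding what
counts as a "significant" SMART error is left to whoever reads the smartctl
output.

```python
def next_step(significant_smart_errors, more_checksum_errors):
    """Suggest an action for a disk that has shown ZFS checksum errors.

    significant_smart_errors: True if SMART reports significant errors.
    more_checksum_errors: True if later scrubs report further checksum
        errors on the same device.
    """
    if significant_smart_errors:
        # SMART errors are a good indicator that the drive is failing.
        return "replace the drive"
    if more_checksum_errors:
        # No SMART errors, but the checksum errors keep coming back.
        return "replace the drive"
    # A single occurrence with a clean SMART report.
    return "treat it as a one-off bit flip"


print(next_step(False, False))  # treat it as a one-off bit flip
```

Note that a clean SMART report only rules out the first branch; it does not
show that the drive is healthy.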

-Ethan


Re: [zfs-discuss] What about this status report

2010-03-27 Thread Bob Friesenhahn

On Sat, 27 Mar 2010, Harry Putnam wrote:

> So it's not a serious matter?  Or maybe more of a potentially serious
> matter?

It is difficult to say if this is a serious matter or not.  It should
not have happened.  The severity depends on the cause of the problem
(which may be difficult to figure out).  Perhaps you will find out
what the problem is some day.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] What about this status report

2010-03-27 Thread Ian Collins

On 03/28/10 10:02 AM, Harry Putnam wrote:
> Bob Friesenhahn  writes:
>
>> On Sat, 27 Mar 2010, Harry Putnam wrote:
>>
>>> What to do with a status report like the one included below?
>>>
>>> What does it mean to have an unrecoverable error but no data errors?
>>
>> I think that this summary means that the zfs scrub did not encounter
>> any reported read/write errors from the disks, but on one of the
>> disks, 7 of the returned blocks had a computed checksum error.  This
>> could be a problem with the data that the disk previously
>> wrote. Perhaps there was an undetected data transfer error, the drive
>> firmware glitched, the drive experienced a cache memory glitch, or the
>> drive wrote/read data from the wrong track.
>>
>> If you clear the error information, make sure you keep a record of it
>> in case it happens again.
>
> Thanks.
>
> So it's not a serious matter?  Or maybe more of a potentially serious
> matter?

Not really.  The error has been corrected.

> Is there specific documentation somewhere that tells how to read these
> status reports?

If you run a scrub on a pool and an error condition is fixed, the report
will give you a URL to check.


--
Ian.



Re: [zfs-discuss] What about this status report

2010-03-27 Thread Giovanni Tirloni
On Sat, Mar 27, 2010 at 6:02 PM, Harry Putnam  wrote:

> Bob Friesenhahn  writes:
>
> > On Sat, 27 Mar 2010, Harry Putnam wrote:
> >
> >> What to do with a status report like the one included below?
> >>
> >> What does it mean to have an unrecoverable error but no data errors?
> >
> > I think that this summary means that the zfs scrub did not encounter
> > any reported read/write errors from the disks, but on one of the
> > disks, 7 of the returned blocks had a computed checksum error.  This
> > could be a problem with the data that the disk previously
> > wrote. Perhaps there was an undetected data transfer error, the drive
> > firmware glitched, the drive experienced a cache memory glitch, or the
> > drive wrote/read data from the wrong track.
> >
> > If you clear the error information, make sure you keep a record of it
> > in case it happens again.
>
> Thanks.
>
> So it's not a serious matter?  Or maybe more of a potentially serious
> matter?
>

Not really. That's exactly the kind of problem ZFS is designed to catch.


>
> Is there specific documentation somewhere that tells how to read these
> status reports?
>

Your pool is not degraded, so I don't think anything will show up in plain
'fmdump' (the fault log).

But check 'fmdump -eV' (the error log) to see the actual error reports that
were generated. You could find something there.
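
If you want to capture that output from a script, a minimal sketch could look
like the following. It assumes Python 3.7 or later is installed, that fmdump
is on the PATH, and that the caller has the privileges needed to read the
error log. The function name is hypothetical, and the output is returned as
raw text without any attempt to parse it.

```python
import shutil
import subprocess


def read_fma_error_log():
    """Return the text printed by 'fmdump -eV', or None if fmdump is absent."""
    if shutil.which("fmdump") is None:
        # Not a Solaris/OpenSolaris system, or fmdump is not on the PATH.
        return None
    result = subprocess.run(
        ["fmdump", "-eV"],   # -e selects the error log, -V is verbose output
        capture_output=True,
        text=True,
        check=False,         # do not raise if fmdump exits non-zero
    )
    return result.stdout


log_text = read_fma_error_log()
if log_text is None:
    print("fmdump not found on this system")
else:
    print(log_text)
```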

-- 
Giovanni


Re: [zfs-discuss] What about this status report

2010-03-27 Thread Harry Putnam
Bob Friesenhahn  writes:

> On Sat, 27 Mar 2010, Harry Putnam wrote:
>
>> What to do with a status report like the one included below?
>>
>> What does it mean to have an unrecoverable error but no data errors?
>
> I think that this summary means that the zfs scrub did not encounter
> any reported read/write errors from the disks, but on one of the
> disks, 7 of the returned blocks had a computed checksum error.  This
> could be a problem with the data that the disk previously
> wrote. Perhaps there was an undetected data transfer error, the drive
> firmware glitched, the drive experienced a cache memory glitch, or the
> drive wrote/read data from the wrong track.
>
> If you clear the error information, make sure you keep a record of it
> in case it happens again.

Thanks.

So it's not a serious matter?  Or maybe more of a potentially serious
matter?

Is there specific documentation somewhere that tells how to read these
status reports?



Re: [zfs-discuss] What about this status report

2010-03-27 Thread Bob Friesenhahn

On Sat, 27 Mar 2010, Harry Putnam wrote:

> What to do with a status report like the one included below?
>
> What does it mean to have an unrecoverable error but no data errors?

I think that this summary means that the zfs scrub did not encounter
any reported read/write errors from the disks, but on one of the
disks, 7 of the returned blocks had a computed checksum error.  This
could be a problem with the data that the disk previously wrote.
Perhaps there was an undetected data transfer error, the drive
firmware glitched, the drive experienced a cache memory glitch, or the
drive wrote/read data from the wrong track.

If you clear the error information, make sure you keep a record of it
in case it happens again.
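
One way to keep such a record is to save the status text to a timestamped
file before running 'zpool clear'. The sketch below is only an illustration:
the function name, the file-name pattern, and the default pool name are
assumptions, and the status text is expected to have been captured already
(for example, pasted from the terminal).

```python
from datetime import datetime
from pathlib import Path


def save_status_record(status_text, directory, pool="rpool", now=None):
    """Write status_text to a timestamped file and return the file's path."""
    now = now or datetime.now()
    # Hypothetical naming scheme: zpool-status-<pool>-YYYYMMDD-HHMMSS.txt
    name = f"zpool-status-{pool}-{now:%Y%m%d-%H%M%S}.txt"
    path = Path(directory) / name
    path.write_text(status_text)
    return path


# Example use (writes into the current directory):
# save_status_record("c3d1s0  ONLINE  0  0  7\n", ".")
```

Keeping one file per incident makes it easy to see later whether the same
device keeps collecting checksum errors.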


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


[zfs-discuss] What about this status report

2010-03-27 Thread Harry Putnam
What to do with a status report like the one included below?

What does it mean to have an unrecoverable error but no data errors? 

Is it just a matter of `clearing' this device?  But what would have
prompted such a report then?

Also note the numeral 7 in the CKSUM column for device c3d1s0.  What
does it mean?

----   ---=---   -   

 zpool status -vx rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 4h44m with 0 errors on Sat Mar 27 07:48:20 2010
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3d0s0  ONLINE       0     0     0
            c3d1s0  ONLINE       0     0     7

errors: No known data errors

errors: No known data errors
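
For anyone who wants to pick out the affected device from a report like this
automatically, here is a rough sketch. It assumes the config table uses
whitespace-separated NAME STATE READ WRITE CKSUM columns as shown in this
report. The function name is made up, and rows whose counters are not plain
integers are simply skipped.

```python
def devices_with_errors(status_text):
    """Map device name -> (read, write, cksum) for rows with a nonzero count."""
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            in_config = False
            continue
        if not in_config:
            continue
        fields = stripped.split()
        # Device rows look like: NAME STATE READ WRITE CKSUM.
        # The header row is skipped because its counters are not integers.
        if len(fields) < 5 or not all(f.isdigit() for f in fields[2:5]):
            continue
        read, write, cksum = (int(f) for f in fields[2:5])
        if read or write or cksum:
            flagged[fields[0]] = (read, write, cksum)
    return flagged


SAMPLE = """\
  pool: rpool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c3d0s0  ONLINE       0     0     0
            c3d1s0  ONLINE       0     0     7

errors: No known data errors
"""

print(devices_with_errors(SAMPLE))  # {'c3d1s0': (0, 0, 7)}
```

Here only c3d1s0 is reported, with 7 in the CKSUM position and zero READ and
WRITE counts.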
