Re: [zfs-discuss] zfs data corruption

2008-04-28 Thread eric kustarz

On Apr 27, 2008, at 4:39 PM, Carson Gaspar wrote:

> Ian Collins wrote:
>> Carson Gaspar wrote:
>
>>> If this is possible, it's entirely undocumented... Actually, fmd's
>>> documentation is generally terrible. The sum total of configuration
>>> information is:
>>>
>>> FILES
>>>  /etc/fm/fmd Fault manager  configuration  direc-
>>>  tory
>>>
>>> Which is empty... It does look like I could write code to copy the
>>> output of "fmdump -f" somewhere useful if I had to.
>>>
>>>
>> Have you tried man fmadm?
>>
>> http://onesearch.sun.com/search/docs/index.jsp?col=docs_en&locale=en&qt=fmadm&cs=false&st=11
>>
>> Brings up some useful information.
>
> "man fmadm" has:
>
> - nothing to do with configuration (the topic) (OK, it "prints the
> config", whatever that means, but you can't _change_ anything)
> - no examples of usage
>
> I stand by my statement that the fault management docs need a lot of  
> help.

I found the fmadm manpage very unhelpful as well.  This CR is going to  
be fixed soon:
6679902 fmadm(1M) needs examples
http://bugs.opensolaris.org/view_bug.do?bug_id=6679902

If you have specifics, feel free to add to the CR.
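
In the meantime, the handful of invocations I find myself reaching for
most often look roughly like this (double-check the flags against the
manpage on your build):

  # fmadm config          (list the diagnosis engines and agents fmd has loaded)
  # fmadm faulty          (show resources fmd currently considers faulty)
  # fmadm repair <fmri>   (tell fmd a faulted resource has been fixed or replaced)

<fmri> there is just a placeholder for whatever resource identifier
"fmadm faulty" printed.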

eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs data corruption

2008-04-27 Thread Mark A. Carlson

http://www.sun.com/bigadmin/features/articles/selfheal.jsp

-- mark

Carson Gaspar wrote:
> Ian Collins wrote:
>> Carson Gaspar wrote:
>>> If this is possible, it's entirely undocumented... Actually, fmd's
>>> documentation is generally terrible. The sum total of configuration
>>> information is:
>>>
>>> FILES
>>>   /etc/fm/fmd Fault manager  configuration  direc-
>>>   tory
>>>
>>> Which is empty... It does look like I could write code to copy the
>>> output of "fmdump -f" somewhere useful if I had to.
>>>
>> Have you tried man fmadm?
>>
>> http://onesearch.sun.com/search/docs/index.jsp?col=docs_en&locale=en&qt=fmadm&cs=false&st=11
>>
>> Brings up some useful information.
>
> "man fmadm" has:
>
> - nothing to do with configuration (the topic) (OK, it "prints the
> config", whatever that means, but you can't _change_ anything)
>
> - no examples of usage
>
> I stand by my statement that the fault management docs need a lot of help.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs data corruption

2008-04-27 Thread Carson Gaspar
Ian Collins wrote:
> Carson Gaspar wrote:

>> If this is possible, it's entirely undocumented... Actually, fmd's 
>> documentation is generally terrible. The sum total of configuration 
>> information is:
>>
>> FILES
>>   /etc/fm/fmd Fault manager  configuration  direc-
>>   tory
>>
>> Which is empty... It does look like I could write code to copy the 
>> output of "fmdump -f" somewhere useful if I had to.
>>
>>   
> Have you tried man fmadm?
> 
> http://onesearch.sun.com/search/docs/index.jsp?col=docs_en&locale=en&qt=fmadm&cs=false&st=11
> 
> Brings up some useful information.

"man fmadm" has:

- nothing to do with configuration (the topic) (OK, it "prints the 
config", whatever that means, but you can't _change_ anything)
- no examples of usage

I stand by my statement that the fault management docs need a lot of help.

-- 
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs data corruption

2008-04-27 Thread Ian Collins
Carson Gaspar wrote:
> Nathan Kroenert - Server ESG wrote:
>
>   
>> I also *believe* (though am not certain - Perhaps someone else on the 
>> list might be?) it would be possible to have each *event* (so - the 
>> individual events that lead to a Fault Diagnosis) generate a message if 
>> it was required, though I have never taken the time to do that one...
>> 
>
> If this is possible, it's entirely undocumented... Actually, fmd's 
> documentation is generally terrible. The sum total of configuration 
> information is:
>
> FILES
>   /etc/fm/fmd Fault manager  configuration  direc-
>   tory
>
> Which is empty... It does look like I could write code to copy the 
> output of "fmdump -f" somewhere useful if I had to.
>
>   
Have you tried man fmadm?

http://onesearch.sun.com/search/docs/index.jsp?col=docs_en&locale=en&qt=fmadm&cs=false&st=11

Brings up some useful information.

Ian

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs data corruption

2008-04-27 Thread Carson Gaspar
Nathan Kroenert - Server ESG wrote:

> I also *believe* (though am not certain - Perhaps someone else on the 
> list might be?) it would be possible to have each *event* (so - the 
> individual events that lead to a Fault Diagnosis) generate a message if 
> it was required, though I have never taken the time to do that one...

If this is possible, it's entirely undocumented... Actually, fmd's 
documentation is generally terrible. The sum total of configuration 
information is:

FILES
  /etc/fm/fmd Fault manager  configuration  direc-
  tory

Which is empty... It does look like I could write code to copy the 
output of "fmdump -f" somewhere useful if I had to.

> All of this said, I understand if you feel that, with things being 'hidden'
> from you until it's *actually* busted, you are having some of your
> forward vision obscured 'in the name of a quiet logfile'. I felt much 
> the same way for a period of time. (Though, I live more in the CPU / 
> Memory camp...)
> 
> But - Once I realised what I could do with fmstat and fmdump, I was not 
> the slightest bit unhappy (Actually, that's not quite true... Even once 
> I knew what they could do, it still took me a while to work out the 
> options I cared about for fmdump / fmstat), but I now trust FMA to look 
> after my CPU / Memory issues better than I would in real life. I can 
> still get what I need when I want to, and the data is actually more 
> accessible and interesting. I just needed to know where to go looking.
> 
> All this being said, I was not actually aware that many of our disk / 
> target drivers were actually FMA'd up yet. heh - Shows what I know.
> 
> Does any of this make you feel any better (or worse)?

Hiding the raw data isn't helping. Log it at debug if you want, but log 
it off-box. The local logs won't be available when your server is dead 
and you want to figure out why.
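
(To be clear, the syslog side of that is trivial -- the usual remote
forwarding line, with a made-up loghost name here:

  # /etc/syslog.conf -- selector and action must be tab-separated
  *.debug         @loghost.example.com

The hard part is getting the errors into syslog in the first place.)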

A real world example is that sometimes the only host-side sign of FC 
storage issues is a retryable error (as everything is redundant). Now 
I'm sure the storage folks can get other errors out of their side, but 
sadly I can't. That retryable error is our canary in the coal mine 
warning us that we may have just lost redundancy. We don't want fmd to 
take any action, but we do want to know...

-- 
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs data corruption

2008-04-27 Thread Nathan Kroenert - Server ESG
Note: IANATZD (I Am Not A Team-ZFS Dude)

Speaking as a Hardware Guy, knowing that something is happening, has 
happened or is indicated to happen is a Good Thing (tm).

Begin unlikely, but possible scenario:

If, for instance, I'm getting a cluster of read errors (or, perhaps bad 
blocks), I could:
  - See it as it's happening
  - See the block number for each error
  - Already know the rate at which the errors are happening
  - Be able to determine that it's not good, and it's time to replace 
the disk.
  - You get the picture...

And based on this information, I could feel confident that I have the 
right information at hand to be able to determine that it is or is not 
time to replace this disk.

Of course, that assumes:
  - I know anything about disks
  - I know anything about the error messages
  - I have some sort of logging tool that recognises the errors (and 
does not just throw out the 'retryable ones', as most I have seen are 
configured to do)
  - I care
  - The folks watching the logs in the enterprise management tool care
  - My storage even bothers to report the errors

Certainly, for some organisations, all of the above are exactly how it 
works, and it works well for them.

Looking at the ZFS/FMA approach, it certainly is somewhat different.

The (very) rough concept is that FMA gets pretty much all errors 
reported to it. It logs them, in a persistent store, which is always 
available to view. It also makes diagnoses on the errors, based on the 
rules that exist for that particular style of error. Once enough (or the 
right type of) errors happen, it'll then make a Fault Diagnosis for that 
component, and log a message, loud and proud into the syslog. It may 
also take other actions, like, retire a page of memory, offline a CPU, 
panic the box, etc.

So - That's the rough overview.

It's worth noting up front that we can *observe* every event that has 
happened. Using fmdump and fmstat we can immediately see if anything 
interesting has been happening, or we can wait for a Fault Diagnosis, in 
which case, we can just watch /var/adm/messages.
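
For the record, the quick look I usually take is roughly this (output
details vary a bit between releases):

  # fmstat        (per-module event/error counters -- non-zero is worth a look)
  # fmdump        (the fault log: diagnoses fmd has actually made)
  # fmdump -e     (the error log: every raw ereport, whether or not it led anywhere)
  # fmdump -eV    (same, but with the full name/value payload of each ereport)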

I also *believe* (though am not certain - Perhaps someone else on the 
list might be?) it would be possible to have each *event* (so - the 
individual events that lead to a Fault Diagnosis) generate a message if 
it was required, though I have never taken the time to do that one...

There are many advantages to this approach - It does not rely on 
logfiles, offsets into logfiles, counters of previously processed 
messages and all of the other doom and gloom that comes with scraping 
logfiles. It's something you can simply ask: Any issues, chief? The 
answer is there in a flash.

You will also be less likely to have the messages rolled out of the logs 
before you get to them (another classic...).

And - You get some great details from fmdump showing you what's really 
going on, and it's something that's really easy to parse to look for 
patterns.
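
e.g. something like this (assuming the ereport class stays in the last
column of "fmdump -e" output, which it does on the builds I've looked at)
gives you a quick histogram of what has been arriving:

  # fmdump -e | awk 'NR > 1 { print $NF }' | sort | uniq -c | sort -rn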

All of this said, I understand if you feel that, with things being 'hidden'
from you until it's *actually* busted, you are having some of your
forward vision obscured 'in the name of a quiet logfile'. I felt much 
the same way for a period of time. (Though, I live more in the CPU / 
Memory camp...)

But - Once I realised what I could do with fmstat and fmdump, I was not 
the slightest bit unhappy (Actually, that's not quite true... Even once 
I knew what they could do, it still took me a while to work out the 
options I cared about for fmdump / fmstat), but I now trust FMA to look 
after my CPU / Memory issues better than I would in real life. I can 
still get what I need when I want to, and the data is actually more 
accessible and interesting. I just needed to know where to go looking.

All this being said, I was not actually aware that many of our disk / 
target drivers were actually FMA'd up yet. heh - Shows what I know.

Does any of this make you feel any better (or worse)?

Nathan.

Mark A. Carlson wrote:
> fmd(1M) can log already-diagnosed faults to syslogd. Why would you
> want the random spew as well?
> 
> -- mark
> 
> Carson Gaspar wrote:
>> [EMAIL PROTECTED] wrote:
>>
>>   
>>> It's not safe to jump to this conclusion.  Disk drivers that support FMA
>>> won't log error messages to /var/adm/messages.  As more support for I/O
>>> FMA shows up, you won't see random spew in the messages file any more.
>>> 
>>
>> 
>> That is a Very Bad Idea. Please convey this to whoever thinks that 
>> they're "helping" by not sysloging I/O errors. If this shows up in 
>> Solaris 11, we will Not Be Amused. Lack of off-box error logging will 
>> directly cause loss of revenue.
>> 
>>
>>   
> 
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Re: [zfs-discuss] zfs data corruption

2008-04-27 Thread Bob Friesenhahn
On Sat, 26 Apr 2008, Carson Gaspar wrote:
>> It's not safe to jump to this conclusion.  Disk drivers that support FMA
>> won't log error messages to /var/adm/messages.  As more support for I/O
>> FMA shows up, you won't see random spew in the messages file any more.
>
> 
> That is a Very Bad Idea. Please convey this to whoever thinks that
> they're "helping" by not sysloging I/O errors. If this shows up in
> Solaris 11, we will Not Be Amused. Lack of off-box error logging will
> directly cause loss of revenue.
> 

I am glad to hear that your large financial institution (Bear 
Stearns?) is contributing to the OpenSolaris project. :-)

Today's systems are very complex and may contain many tens of disks. 
Syslog is a bottleneck and often logs to local files, which grow very 
large and hinder system performance while many log messages are being 
reported.  If syslog is to a remote host, then the network is also 
impacted.

If a device (or several inter-related devices) is/are experiencing 
problems, it seems best to isolate and diagnose it, with one 
intelligent notification rather than spewing hundreds of thousands of 
low-level error messages to a system logger.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs data corruption

2008-04-27 Thread Mark A. Carlson

fmd(1M) can log already-diagnosed faults to syslogd. Why would you
want the random spew as well?
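
(That path is handled by the syslog-msgs agent, if I'm remembering the
module name right. You can at least confirm it's loaded with:

  # fmadm config | grep -i syslog

and, again from memory, its tunables live with the other fmd plugins
under /usr/lib/fm/fmd/plugins/.)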

-- mark

Carson Gaspar wrote:
> [EMAIL PROTECTED] wrote:
>
>> It's not safe to jump to this conclusion.  Disk drivers that support FMA
>> won't log error messages to /var/adm/messages.  As more support for I/O
>> FMA shows up, you won't see random spew in the messages file any more.
>
> That is a Very Bad Idea. Please convey this to whoever thinks that
> they're "helping" by not sysloging I/O errors. If this shows up in
> Solaris 11, we will Not Be Amused. Lack of off-box error logging will
> directly cause loss of revenue.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs data corruption

2008-04-26 Thread Carson Gaspar
[EMAIL PROTECTED] wrote:

> It's not safe to jump to this conclusion.  Disk drivers that support FMA
> won't log error messages to /var/adm/messages.  As more support for I/O
> FMA shows up, you won't see random spew in the messages file any more.


That is a Very Bad Idea. Please convey this to whoever thinks that 
they're "helping" by not sysloging I/O errors. If this shows up in 
Solaris 11, we will Not Be Amused. Lack of off-box error logging will 
directly cause loss of revenue.


-- 
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs data corruption

2008-04-24 Thread johansen
> I'm just interested in understanding how zfs determined there was data
> corruption when I have checksums disabled and there were no
> non-retryable read errors reported in the messages file.

If the metadata is corrupt, how is ZFS going to find the data blocks on
disk?

> >  I don't believe it was a real disk read error because of the
> >  absence of evidence in /var/adm/messages.

It's not safe to jump to this conclusion.  Disk drivers that support FMA
won't log error messages to /var/adm/messages.  As more support for I/O
FMA shows up, you won't see random spew in the messages file any more.
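
The telemetry doesn't disappear, it just lands in fmd's error log
instead.  Something like this (the -c class filter should take a glob,
if I recall correctly) pulls out just the I/O ereports:

  # fmdump -e -c 'ereport.io.*'
  # fmdump -eV -c 'ereport.io.*'    (verbose, with the full detector/device payload)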

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs data corruption

2008-04-24 Thread Victor Engle
Just to clarify this post. This isn't data I care about recovering.
I'm just interested in understanding how zfs determined there was data
corruption when I have checksums disabled and there were no
non-retryable read errors reported in the messages file.

On Wed, Apr 23, 2008 at 9:52 PM, Victor Engle <[EMAIL PROTECTED]> wrote:
> Thanks! That would explain things. I don't believe it was a real disk
>  read error because of the absence of evidence in /var/adm/messages.
>
>  I'll review the man page and documentation to confirm that metadata is
>  checksummed.
>
>  Regards,
>  Vic
>
>
>
>
>  On Wed, Apr 23, 2008 at 6:30 PM, Nathan Kroenert
>  <[EMAIL PROTECTED]> wrote:
>  > I'm just taking a stab here, so could be completely wrong, but IIRC, even if
>  > you disable checksum, it still checksums the metadata...
>  >
>  >  So, it could be metadata checksum errors.
>  >
>  >  Others on the list might have some funky zdb thingies you could use to see what
>  > it actually is...
>  >
>  >  Note: typed pre caffeine... :)
>  >
>  >  Nathan
>  >
>  >
>  >
>  >  Vic Engle wrote:
>  >
>  > > I'm hoping someone can help me understand a zfs data corruption symptom.
>  > We have a zpool with checksum turned off. Zpool status shows that data
>  > corruption occurred. The application using the pool at the time reported a
>  > "read" error and zpool status (see below) shows 2 read errors on a device.
>  > The thing that is confusing to me is how ZFS determines that data corruption
>  > exists when reading data from a pool with checksum turned off.
>  > >
>  > > Also, I'm wondering about the persistent errors in the output below. Since
>  > no specific file or directory is mentioned does this indicate pool metadata
>  > is corrupt?
>  > >
>  > > Thanks for any help interpreting the output...
>  > >
>  > >
>  > > # zpool status -xv
>  > >  pool: zpool1
>  > >  state: ONLINE
>  > > status: One or more devices has experienced an error resulting in data
>  > >corruption.  Applications may be affected.
>  > > action: Restore the file in question if possible.  Otherwise restore the
>  > >entire pool from backup.
>  > >   see: http://www.sun.com/msg/ZFS-8000-8A
>  > >  scrub: none requested
>  > > config:
>  > >
>  > >NAME STATE READ WRITE CKSUM
>  > >zpool1   ONLINE   2 0 0
>  > >  c4t60A9800043346859444A476B2D48446Fd0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D484352d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D484236d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D482D6Cd0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D483951d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D483836d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D48366Bd0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D483551d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D483435d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D48326Bd0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D483150d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D483035d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D47796Ad0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D477850d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D477734d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D47756Ad0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D47744Fd0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D477333d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D477169d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D47704Ed0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D476F33d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D476D68d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D476C4Ed0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D476B32d0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D476968d0  ONLINE   0 0 0
>  > >  c4t60A98000433468656834476B2D453974d0  ONLINE   0 0 0
>  > >  c4t60A98000433468656834476B2D454142d0  ONLINE   0 0 0
>  > >  c4t60A98000433468656834476B2D454255d0  ONLINE   0 0 0
>  > >  c4t60A98000433468656834476B2D45436Dd0  ONLINE   0 0 0
>  > >  c4t60A9800043346859444A476B2D487346d0  ONLINE   2 0 0
>  > >  c4t60A9800043346859444A476B2D487175d0  ONLINE

Re: [zfs-discuss] zfs data corruption

2008-04-23 Thread Victor Engle
Thanks! That would explain things. I don't believe it was a real disk
read error because of the absence of evidence in /var/adm/messages.

I'll review the man page and documentation to confirm that metadata is
checksummed.

Regards,
Vic


On Wed, Apr 23, 2008 at 6:30 PM, Nathan Kroenert
<[EMAIL PROTECTED]> wrote:
> I'm just taking a stab here, so could be completely wrong, but IIRC, even if
> you disable checksum, it still checksums the metadata...
>
>  So, it could be metadata checksum errors.
>
>  Others on the list might have some funky zdb thingies you could use to see what
> it actually is...
>
>  Note: typed pre caffeine... :)
>
>  Nathan
>
>
>
>  Vic Engle wrote:
>
> > I'm hoping someone can help me understand a zfs data corruption symptom.
> We have a zpool with checksum turned off. Zpool status shows that data
> corruption occurred. The application using the pool at the time reported a
> "read" error and zpool status (see below) shows 2 read errors on a device.
> The thing that is confusing to me is how ZFS determines that data corruption
> exists when reading data from a pool with checksum turned off.
> >
> > Also, I'm wondering about the persistent errors in the output below. Since
> no specific file or directory is mentioned does this indicate pool metadata
> is corrupt?
> >
> > Thanks for any help interpreting the output...
> >
> >
> > # zpool status -xv
> >  pool: zpool1
> >  state: ONLINE
> > status: One or more devices has experienced an error resulting in data
> >corruption.  Applications may be affected.
> > action: Restore the file in question if possible.  Otherwise restore the
> >entire pool from backup.
> >   see: http://www.sun.com/msg/ZFS-8000-8A
> >  scrub: none requested
> > config:
> >
> >NAME STATE READ WRITE CKSUM
> >zpool1   ONLINE   2 0 0
> >  c4t60A9800043346859444A476B2D48446Fd0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D484352d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D484236d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D482D6Cd0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D483951d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D483836d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D48366Bd0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D483551d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D483435d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D48326Bd0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D483150d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D483035d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D47796Ad0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D477850d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D477734d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D47756Ad0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D47744Fd0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D477333d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D477169d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D47704Ed0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D476F33d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D476D68d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D476C4Ed0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D476B32d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D476968d0  ONLINE   0 0 0
> >  c4t60A98000433468656834476B2D453974d0  ONLINE   0 0 0
> >  c4t60A98000433468656834476B2D454142d0  ONLINE   0 0 0
> >  c4t60A98000433468656834476B2D454255d0  ONLINE   0 0 0
> >  c4t60A98000433468656834476B2D45436Dd0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D487346d0  ONLINE   2 0 0
> >  c4t60A9800043346859444A476B2D487175d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D48705Ad0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D486F45d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D486D74d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D486C5Ad0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D486B44d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D486974d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D486859d0  ONLINE   0 0 0
> >  c4t60A9800043346859444A476B2D486744d0  ONLINE   0 0 0
> >

Re: [zfs-discuss] zfs data corruption

2008-04-23 Thread Rob
 > Since no specific file or directory is mentioned
Install newer bits and you'll get better info automatically,
but for now type:

zdb -vvv zpool1 17
zdb -vvv zpool1 18
zdb -vvv zpool1 19
echo remove those objects
zpool clear zpool1
zpool scrub zpool1
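
Afterwards, the obvious sanity checks (nothing clever):

zpool status -v zpool1     (error counters and the damaged-object list should be gone)
fmdump -e | tail           (no fresh ereports should be arriving for those devices)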

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs data corruption

2008-04-23 Thread Nathan Kroenert
I'm just taking a stab here, so could be completely wrong, but IIRC, 
even if you disable checksum, it still checksums the metadata...

So, it could be metadata checksum errors.

Others on the list might have some funky zdb thingies you could use to see 
what it actually is...

Note: typed pre caffeine... :)

Nathan

Vic Engle wrote:
> I'm hoping someone can help me understand a zfs data corruption symptom. We 
> have a zpool with checksum turned off. Zpool status shows that data 
> corruption occurred. The application using the pool at the time reported a 
> "read" error and zpool status (see below) shows 2 read errors on a device. 
> The thing that is confusing to me is how ZFS determines that data corruption 
> exists when reading data from a pool with checksum turned off.
> 
> Also, I'm wondering about the persistent errors in the output below. Since no 
> specific file or directory is mentioned does this indicate pool metadata is 
> corrupt?
> 
> Thanks for any help interpreting the output...
> 
> 
> # zpool status -xv
>   pool: zpool1
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
> entire pool from backup.
>see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
> 
> NAME STATE READ WRITE CKSUM
> zpool1   ONLINE   2 0 0
>   c4t60A9800043346859444A476B2D48446Fd0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D484352d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D484236d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D482D6Cd0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D483951d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D483836d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D48366Bd0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D483551d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D483435d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D48326Bd0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D483150d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D483035d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D47796Ad0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D477850d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D477734d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D47756Ad0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D47744Fd0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D477333d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D477169d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D47704Ed0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D476F33d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D476D68d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D476C4Ed0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D476B32d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D476968d0  ONLINE   0 0 0
>   c4t60A98000433468656834476B2D453974d0  ONLINE   0 0 0
>   c4t60A98000433468656834476B2D454142d0  ONLINE   0 0 0
>   c4t60A98000433468656834476B2D454255d0  ONLINE   0 0 0
>   c4t60A98000433468656834476B2D45436Dd0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D487346d0  ONLINE   2 0 0
>   c4t60A9800043346859444A476B2D487175d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D48705Ad0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D486F45d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D486D74d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D486C5Ad0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D486B44d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D486974d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D486859d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D486744d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D486573d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D486459d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D486343d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D486173d0  ONLINE   0 0 0
>   c4t60A9800043346859444A476B2D482F58d0  ONLINE   0 0 0
>   c4t60A98000

[zfs-discuss] zfs data corruption

2008-04-23 Thread Vic Engle
I'm hoping someone can help me understand a zfs data corruption symptom. We 
have a zpool with checksum turned off. Zpool status shows that data corruption 
occurred. The application using the pool at the time reported a "read" error and 
zpool status (see below) shows 2 read errors on a device. The thing that is 
confusing to me is how ZFS determines that data corruption exists when reading 
data from a pool with checksum turned off.

Also, I'm wondering about the persistent errors in the output below. Since no 
specific file or directory is mentioned does this indicate pool metadata is 
corrupt?

Thanks for any help interpreting the output...


# zpool status -xv
  pool: zpool1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
zpool1   ONLINE   2 0 0
  c4t60A9800043346859444A476B2D48446Fd0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D484352d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D484236d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D482D6Cd0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D483951d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D483836d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D48366Bd0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D483551d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D483435d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D48326Bd0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D483150d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D483035d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D47796Ad0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D477850d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D477734d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D47756Ad0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D47744Fd0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D477333d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D477169d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D47704Ed0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D476F33d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D476D68d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D476C4Ed0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D476B32d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D476968d0  ONLINE   0 0 0
  c4t60A98000433468656834476B2D453974d0  ONLINE   0 0 0
  c4t60A98000433468656834476B2D454142d0  ONLINE   0 0 0
  c4t60A98000433468656834476B2D454255d0  ONLINE   0 0 0
  c4t60A98000433468656834476B2D45436Dd0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D487346d0  ONLINE   2 0 0
  c4t60A9800043346859444A476B2D487175d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D48705Ad0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D486F45d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D486D74d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D486C5Ad0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D486B44d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D486974d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D486859d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D486744d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D486573d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D486459d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D486343d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D486173d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D482F58d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D485A43d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D485872d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D485758d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D485642d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D485471d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D485357d0  ONLINE   0 0 0
  c4t60A9800043346859444A476B2D48