Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-25 Thread Manuel Ryan
Hey again, I'm back with some news from my situation.

I tried taking out the faulty disk 5 and replacing it with a new disk, but
the pool showed up as FAULTED. So I plugged the faulty disk back in, keeping
the new disk in the machine, then ran a zpool replace.

After the new disk resilvered completely (which took around 9 hours), zpool
status still shows the disk as "replacing" but nothing is happening
(iostat shows no disk activity). If I try to remove the faulty
drive, the pool now shows up as DEGRADED and is still "replacing" the old
broken disk.

The overall state of the pool seems to have been getting worse: the other
failing disk is giving write errors again, and the pool had 28k corrupted
files (60k checksum errors on the raidz1 and 28k checksum errors on the
pool itself).

After seeing that, I tried to do a zpool clear to try and help the replace
process finish. After this, disk 1 was UNAVAIL due to too many IO errors
and the pool was DEGRADED.

I rebooted the machine; the pool is now back ONLINE, with disk 5 still
saying "replacing" and 0 errors except permanent ones.

I don't really know what to try next :-/ Any ideas?



On Mon, Apr 23, 2012 at 7:35 AM, Daniel Carosone wrote:

> On Mon, Apr 23, 2012 at 05:48:16AM +0200, Manuel Ryan wrote:
> > After a reboot of the machine, I have no more write errors on disk 2 (only
> > 4 checksum, not growing), I was able to access data which I previously
> > couldn't and now only the checksum errors on disk 5 are growing.
>
> Well, that's good, but what changed?   If it was just a reboot and
> perhaps power-cycle of the disks, I don't think you've solved much in
> the long term..
>
> > Fortunately, I was able to recover all important data in those conditions
> > (yeah !),
>
> .. though that's clearly the most important thing!
>
> If you're down to just checksum errors now, then run a scrub and see
> if they can all be repaired, before replacing the disk.  If you
> haven't been able to get a scrub complete, then either:
>  * delete unimportant / rescued data, until none of the problem
>   sectors are referenced any longer, or
>  * "replace" the disk like I suggested last time, with a copy under
>   zfs' nose and switch
>
> > And since I can live with losing the pool now, I'll gamble away and
> > replace drive 5 tomorrow and if that fails I'll just destroy the pool,
> > replace the 2 physical disks and build a new one (maybe raidz2 this time :))
>
> You know what?  If you're prepared to do that in the worst of
> circumstances, it would be a very good idea to do that under the best
> of circumstances.  If you can, just rebuild it raidz2 and be happier
> next time something flaky happens with this hardware.
>
> > I'll try to leave all 6 original disks in the machine while replacing,
> > maybe zfs will be smart enough to use the 6 drives to build the replacement
> > disk ?
>
> I don't think it will.. others who know the code, feel free to comment
> otherwise.
>
> If you've got the physical space for the extra disk, why not keep it
> there and build the pool raidz2 with the same capacity?
>
> > It's a miracle that zpool still shows disk 5 as "ONLINE", here's a SMART
> > dump of disk 5 (1265 Current_Pending_Sector, ouch)
>
> That's all indicative of read errors. Note that your reallocated
> sector count on that disk is still low, so most of those will probably
> clear when overwritten and given a chance to re-map.
>
> If these all appeared suddenly, clearly the disk has developed a
> problem. Normally, they appear gradually as head sensitivity
> diminishes.
>
> How often do you normally run a scrub, before this happened?  It's
> possible they were accumulating for a while but went undetected for
> lack of read attempts to the disk.  Scrub more often!
>
> --
> Dan.
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-24 Thread Richard Elling
On Apr 24, 2012, at 8:35 AM, Jim Klimov wrote:

> On 2012-04-24 19:14, Tim Cook wrote:
>> Personally unless the dataset is huge and you're using z3, I'd be
>> scrubbing once a week.  Even if it's z3, just do a window on Sundays or
>> something so that you at least make it through the whole dataset at
>> least once a month.

It depends. There are cascading failure modes in your system that are not
media related and can bring your system to its knees. Scrubs and resilvers
can trigger or exacerbate these.

> +1 I guess
> Among other considerations, if the scrub does find irreparable errors,
> you might have some recent-enough backups or other sources of the data,
> so the situation won't be as fatal as when you look for errors once a
> year ;)

There is considerable evidence that scrubs propagate errors for some systems
(no such evidence for ZFS systems). So it is not a good idea to have a blanket
scrub policy with high frequency.

> 
>> There's no reason NOT to scrub that I can think of other than the
>> overhead - which shouldn't matter if you're doing it during off hours.
> 
> "I heard a rumor" that HDDs can detect reading flaky sectors
> (i.e. detect a bit-rot error and recover thanks to ECC), and
> in this case they would automatically remap the recovered
> sector. So reading the disks in (logical) locations where
> your data is known to be may be a good thing to prolong its
> available life.

It is a SMART feature and the disks do it automatically for you.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422









Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-24 Thread Jim Klimov

On 2012-04-24 19:14, Tim Cook wrote:
> Personally unless the dataset is huge and you're using z3, I'd be
> scrubbing once a week.  Even if it's z3, just do a window on Sundays or
> something so that you at least make it through the whole dataset at
> least once a month.

+1 I guess
Among other considerations, if the scrub does find irreparable errors,
you might have some recent-enough backups or other sources of the data,
so the situation won't be as fatal as when you look for errors once a
year ;)

> There's no reason NOT to scrub that I can think of other than the
> overhead - which shouldn't matter if you're doing it during off hours.

"I heard a rumor" that HDDs can detect reading flaky sectors
(i.e. detect a bit-rot error and recover thanks to ECC), and
in this case they would automatically remap the recovered
sector. So reading the disks in (logical) locations where
your data is known to be may be a good thing to prolong its
available life.

This of course relies kinda on disk reliability - i.e. it
should be rated 24/7 and within warranted age (mechanics
should be within acceptable wear). No guarantees with other
drives, although I don't think weekly scrubs would be fatal.

If only ZFS could queue scrubbing reads more linearly... ;)

//Jim




Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-24 Thread Tim Cook
On Tue, Apr 24, 2012 at 12:16 AM, Matt Breitbach wrote:

> So this is a point of debate that probably deserves being brought to the
> floor (probably for the umpteenth time, but indulge me).  I've heard from
> several people that I'd consider "experts" that once per year scrubbing is
> sufficient, once per quarter is _possibly_ excessive, and once a week is
> downright overkill.  Since scrub thrashes your disk, I'd like to avoid it
> if at all possible.
>
> My opinion is that it depends on the data.  If it's all data at rest, ZFS
> can't correct bit-rot if it's not read out on a regular interval.
>
> My biggest question on this?  How often does bit-rot occur on media that
> isn't read or written to excessively, but just spinning most of the day and
> only has 10-20GB physically read from the spindles daily?  We all know as
> data ages, it gets accessed less and less frequently.  At what point should
> you be scrubbing that "old" data every few weeks to make sure a bit or two
> hasn't flipped?
>
> FYI - I personally scrub once per month.  Probably overkill for my data,
> but I'm paranoid like that.
>
> -Matt
>
>
>
> -Original Message-
>
>
> How often do you normally run a scrub, before this happened?  It's
> possible they were accumulating for a while but went undetected for
> lack of read attempts to the disk.  Scrub more often!
>
> --
> Dan.
>
>
>
>

Personally unless the dataset is huge and you're using z3, I'd be scrubbing
once a week.  Even if it's z3, just do a window on Sundays or something so
that you at least make it through the whole dataset at least once a month.

There's no reason NOT to scrub that I can think of other than the overhead
- which shouldn't matter if you're doing it during off hours.
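
To put rough numbers on the "window on Sundays" idea, here is a
back-of-the-envelope sketch in Python. It assumes the scrub can pick up where
it left off in the next window (whether a stopped scrub resumes or restarts
depends on the ZFS release, so check yours). The pool size, scrub throughput
and window length are made-up figures, not measurements from this thread, and
`windows_needed` is just an illustrative helper name.

```python
import math

TB = 10**12
MB = 10**6


def windows_needed(used_bytes: int, scrub_bytes_per_sec: float, window_hours: float) -> int:
    """Number of scrub windows needed to read all allocated data once."""
    bytes_per_window = scrub_bytes_per_sec * window_hours * 3600
    return math.ceil(used_bytes / bytes_per_window)


# Hypothetical pool: 8 TB allocated, 150 MB/s sustained scrub rate,
# one 6-hour window per week.
n = windows_needed(8 * TB, 150 * MB, 6)
print(f"{n} weekly windows to cover the pool once")
```

With these assumed figures one full pass fits in three weekly windows, which
stays inside the once-a-month target. A slower pool or a shorter window changes
the answer quickly, so plug in your own observed scrub rate.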

--Tim


Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-23 Thread Matt Breitbach
So this is a point of debate that probably deserves being brought to the
floor (probably for the umpteenth time, but indulge me).  I've heard from
several people that I'd consider "experts" that once per year scrubbing is
sufficient, once per quarter is _possibly_ excessive, and once a week is
downright overkill.  Since scrub thrashes your disk, I'd like to avoid it if
at all possible.

My opinion is that it depends on the data.  If it's all data at rest, ZFS
can't correct bit-rot if it's not read out on a regular interval.  

My biggest question on this?  How often does bit-rot occur on media that
isn't read or written to excessively, but just spinning most of the day and
only has 10-20GB physically read from the spindles daily?  We all know as
data ages, it gets accessed less and less frequently.  At what point should
you be scrubbing that "old" data every few weeks to make sure a bit or two
hasn't flipped?

FYI - I personally scrub once per month.  Probably overkill for my data, but
I'm paranoid like that.  

-Matt



-Original Message-


How often do you normally run a scrub, before this happened?  It's
possible they were accumulating for a while but went undetected for
lack of read attempts to the disk.  Scrub more often!

--
Dan.





Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-23 Thread Jim Klimov

2012-04-23 9:35, Daniel Carosone wrote:
>> I'll try to leave all 6 original disks in the machine while replacing,
>> maybe zfs will be smart enough to use the 6 drives to build the replacement
>> disk ?
>
> I don't think it will.. others who know the code, feel free to comment
> otherwise.


Well, I've heard (and made) such an assumption a few times -
like that a resilver to a hotspare disk would try to use whatever
source sectors are available, in essence making raidzN+1 during
the process. It is quite possible that several disks of the array
have errors simultaneously, but there is also a chance that the
errors reside in sectors belonging to different zfs blocks, thus
much or all of the data is recoverable - but the process needs all of
the original disks.
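
To illustrate why errors in different blocks on different disks can still be
recoverable, here is a toy single-parity model in Python. It is a deliberately
simplified sketch: fixed-width stripes with one XOR parity block, and bad
blocks already known (marked as None). Real raidz uses variable-width stripes
and relies on block checksums to find the bad data, and nothing here says
whether a resilver actually behaves this way. All function names are invented
for the example.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def make_stripe(data: List[bytes]) -> List[bytes]:
    """Return the data blocks followed by one XOR parity block."""
    return data + [xor_blocks(data)]


def recover_stripe(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing (None) block of a stripe from the others."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot rebuild two blocks in one stripe")
    repaired = list(stripe)
    if missing:
        present = [block for block in stripe if block is not None]
        repaired[missing[0]] = xor_blocks(present)
    return repaired


# Two stripes across 6 "disks" (5 data blocks + 1 parity block each).
s0 = make_stripe([bytes([i] * 4) for i in range(5)])
s1 = make_stripe([bytes([i + 10] * 4) for i in range(5)])

# Disk 2 (index 1) is unreadable in stripe 0, disk 5 (index 4) in stripe 1.
damaged0 = [block if disk != 1 else None for disk, block in enumerate(s0)]
damaged1 = [block if disk != 4 else None for disk, block in enumerate(s1)]

assert recover_stripe(damaged0) == s0
assert recover_stripe(damaged1) == s1
print("both stripes recovered")
```

Both stripes come back intact because each one lost only a single block. Pull
one of the two flaky disks entirely, and every stripe where the other disk is
bad has two blocks missing, which single parity cannot rebuild.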

Thinking that the functionality is in the robust ZFS is at least
reasonable.

1) Does it exist in reality?
2) If the code is not there, is it a worthy RFE, maybe for GSoC?

//Jim



Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-22 Thread Daniel Carosone
On Mon, Apr 23, 2012 at 05:48:16AM +0200, Manuel Ryan wrote:
> After a reboot of the machine, I have no more write errors on disk 2 (only
> 4 checksum, not growing), I was able to access data which I previously
> couldn't and now only the checksum errors on disk 5 are growing.

Well, that's good, but what changed?   If it was just a reboot and
perhaps power-cycle of the disks, I don't think you've solved much in
the long term.. 

> Fortunately, I was able to recover all important data in those conditions
> (yeah !),

.. though that's clearly the most important thing!

If you're down to just checksum errors now, then run a scrub and see
if they can all be repaired, before replacing the disk.  If you
haven't been able to get a scrub complete, then either:
 * delete unimportant / rescued data, until none of the problem
   sectors are referenced any longer, or
 * "replace" the disk like I suggested last time, with a copy under
   zfs' nose and switch

> And since I can live with losing the pool now, I'll gamble away and
> replace drive 5 tomorrow and if that fails I'll just destroy the pool,
> replace the 2 physical disks and build a new one (maybe raidz2 this time :))

You know what?  If you're prepared to do that in the worst of
circumstances, it would be a very good idea to do that under the best
of circumstances.  If you can, just rebuild it raidz2 and be happier
next time something flaky happens with this hardware.
 
> I'll try to leave all 6 original disks in the machine while replacing,
> maybe zfs will be smart enough to use the 6 drives to build the replacement
> disk ?

I don't think it will.. others who know the code, feel free to comment
otherwise.

If you've got the physical space for the extra disk, why not keep it
there and build the pool raidz2 with the same capacity? 

> It's a miracle that zpool still shows disk 5 as "ONLINE", here's a SMART
> dump of disk 5 (1265 Current_Pending_Sector, ouch) 

That's all indicative of read errors. Note that your reallocated
sector count on that disk is still low, so most of those will probably
clear when overwritten and given a chance to re-map.

If these all appeared suddenly, clearly the disk has developed a
problem. Normally, they appear gradually as head sensitivity
diminishes. 

How often do you normally run a scrub, before this happened?  It's
possible they were accumulating for a while but went undetected for
lack of read attempts to the disk.  Scrub more often!

--
Dan.





Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-22 Thread Manuel Ryan
Thank you for this very detailed answer !

After a reboot of the machine, I have no more write errors on disk 2 (only
4 checksum, not growing), I was able to access data which I previously
couldn't and now only the checksum errors on disk 5 are growing.

Fortunately, I was able to recover all important data in those conditions
(yeah !),

Unfortunately, I don't have the spare disks to back up everything or try
your idea of copying every disk (very good strategy btw, I hadn't thought
about it !).

And since I can live with losing the pool now, I'll gamble away and
replace drive 5 tomorrow and if that fails I'll just destroy the pool,
replace the 2 physical disks and build a new one (maybe raidz2 this time :))

I'll try to leave all 6 original disks in the machine while replacing,
maybe zfs will be smart enough to use the 6 drives to build the replacement
disk ?

It's a miracle that zpool still shows disk 5 as "ONLINE", here's a SMART
dump of disk 5 (1265 Current_Pending_Sector, ouch) :

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   178   173   051    Pre-fail  Always       -       3804
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       1050
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       86
  5 Reallocated_Sector_Ct   0x0033   198   198   140    Pre-fail  Always       -       55
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4606
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       84
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       30
193 Load_Cycle_Count        0x0032   179   179   000    Old_age   Always       -       65652
194 Temperature_Celsius     0x0022   119   109   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   145   145   000    Old_age   Always       -       55
197 Current_Pending_Sector  0x0032   195   195   000    Old_age   Always       -       1265
198 Offline_Uncorrectable   0x0030   200   189   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   001   000    Old_age   Offline      -       1
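
For anyone who wants to watch these counters over time rather than eyeball the
dump, here is a small Python sketch that pulls the raw values out of attribute
lines shaped like `smartctl -A` output. It assumes the usual ten whitespace
separated columns with the raw value in the last one. The sample lines repeat
three rows from the dump above; the helper names are invented for the example.

```python
from typing import Dict, List

# Three attribute rows copied from the SMART dump of disk 5.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   198   198   140    Pre-fail  Always       -       55
197 Current_Pending_Sector  0x0032   195   195   000    Old_age   Always       -       1265
198 Offline_Uncorrectable   0x0030   200   189   000    Old_age   Offline      -       0
"""


def parse_smart_attributes(text: str) -> Dict[str, int]:
    """Map attribute name to raw value for smartctl-style attribute lines."""
    attrs: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw value is not a plain integer, skip it
    return attrs


def sector_warnings(attrs: Dict[str, int]) -> List[str]:
    """Describe the two sector counters discussed in this thread."""
    notes = []
    pending = attrs.get("Current_Pending_Sector", 0)
    reallocated = attrs.get("Reallocated_Sector_Ct", 0)
    if pending > 0:
        notes.append(f"{pending} sectors pending: reads failed, not yet remapped")
    if reallocated > 0:
        notes.append(f"{reallocated} sectors already reallocated")
    return notes


attrs = parse_smart_attributes(SAMPLE)
for note in sector_warnings(attrs):
    print(note)
```

Logging these two numbers daily makes it easy to tell a sudden jump from a slow
climb.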


Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-22 Thread Daniel Carosone
On Mon, Apr 23, 2012 at 02:16:40PM +1200, Ian Collins wrote:
> If it were my data, I'd set the pool read only, backup, rebuild and  
> restore.  You do risk further data loss (maybe even pool loss) while the  
> new drive is resilvering.

You're definitely in a pickle.  The first priority is to try and
ensure that no further damage is done. Check and make sure you have
ample power supply. 

Setting the pool readonly would be a good start.  Powering down and
checking all the connectors and cables would be another. 

Write errors are an interesting result. Check the smart data on that
disk - either it is totally out of sectors to reallocate, or it has
some kind of interface problem.

If you can, image all the disks elsewhere, with something like
ddrescue.  Doing so sequentially rather than random IO through the
filesystem can sometimes have better results for marginal
disks/sectors.  That gives you scratch copies to work on or fall back
to, as you try other recovery methods. 
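
As a rough picture of what "sequentially rather than random IO" means, here is
a Python sketch of the basic idea behind a tool like ddrescue: read the source
front to back in fixed-size blocks, and when a block cannot be read, zero-fill
it in the copy and carry on. This is not ddrescue's algorithm (no map file, no
retries, no splitting of bad areas), and the name `rescue_copy` and the 64 KiB
block size are made up for illustration. For a real rescue, use ddrescue.

```python
import os
from typing import List, Tuple


def rescue_copy(src_path: str, dst_path: str,
                block_size: int = 64 * 1024) -> List[Tuple[int, int]]:
    """Copy src to dst in order, zero-filling blocks that cannot be read.

    Returns the (offset, length) ranges that were not read successfully.
    """
    bad: List[Tuple[int, int]] = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size of the source
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                src.seek(offset)
                chunk = src.read(length)
            except OSError:
                chunk = b""  # treat an I/O error as "nothing readable here"
            if len(chunk) < length:
                # Record the unread part and pad with zeros so that every
                # later block still lands at the same offset in the copy.
                bad.append((offset + len(chunk), length - len(chunk)))
                chunk += bytes(length - len(chunk))
            dst.write(chunk)
            offset += length
    return bad
```

The returned list of unreadable ranges is the useful part: it tells you which
offsets in the image are holes, so later recovery attempts know what not to
trust.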

zfs15 is fairly old..  Consider presenting a copy of the pool to a
newer solaris that may have more robust recovery, as one experiment.

I wouldn't "zpool replace" anything at this point - the moment you do,
you throw away any of the good data on that disk, which might help you
recover sectors that are bad on other disks.  If you have to swap
disks, I would try and get as many of the readable sectors copied 
across to the new disk as possible (ddrescue again) with the pool
offline, and then just physically swap disks, so at least the good
data remains usable.

Try and get some clarity on what's happening with the hardware on an
individual disk level - what reads successfully (at least at the
physical layer, below zfs chksum).  Try and get at the root cause of
the write errors first; they're impeding zfs's recovery of what looks
like other 

> I would only use raidz for unimportant data, or for a copy of data from  
> a more robust pool.

Well, yeah, but a systemic problem (like bad ram or power or
controller) can manifest as a multi-disk failure no matter how many
redundant disks.

--
Dan.




Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-22 Thread Ian Collins

On 04/23/12 01:47 PM, Manuel Ryan wrote:
> Hello, I have looked around this mailing list and other virtual spaces
> and I wasn't able to find a situation similar to this weird one.
>
> I have a 6-disk raidz zfs15 pool. After a scrub, the status of the
> pool and all disks still show up as "ONLINE" but two of the disks are
> starting to give me errors and I do have fatal data corruption. The
> disks seem to be failing differently:
>
> disk 2 has 78 (not growing) read errors, 43k (growing) write errors
> and 3 (not growing) checksum errors.
>
> disk 5 has 0 read errors, 0 write errors but 7.4k checksum errors
> (growing).
>
> Data corruption is around 22k files.
>
> I plan to replace both disks. Which disk do you think should be
> replaced first to lose as little data as possible?
>
> I was thinking of replacing disk 5 first as it seems to have a lot of
> "silent" data corruption so maybe it's a bad idea to use its output
> to replace disk 2. Also checksum and read errors on disk 2 do not seem
> to be growing as I used the pool to back up data (corrupted files could
> not be accessed, but a lot of files were fine) but write errors are
> growing extremely fast. So reading uncorrupted data from disk 2 seems
> to be working but writing on it seems to be problematic.
>
> Do you guys also think I should change disk 5 first or am I missing
> something?


If it were my data, I'd set the pool read only, backup, rebuild and 
restore.  You do risk further data loss (maybe even pool loss) while the 
new drive is resilvering.


I would only use raidz for unimportant data, or for a copy of data from 
a more robust pool.


--
Ian.



Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-22 Thread Bob Friesenhahn

On Mon, 23 Apr 2012, Manuel Ryan wrote:

> Do you guys also think I should change disk 5 first or am I missing something ?

From your description, this sounds like the best course of action, but
you should look at your system log files to see what sort of issues
are being logged.  Also consult the output of 'iostat -xe' to see what
low-level errors are being logged.

> I'm not an expert with zfs so any insight to help me replace those disks
> without losing too much data would be much appreciated :)

If this is really raidz1 then more data is definitely at risk if
several disks seem to be failing at once.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


[zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-22 Thread Manuel Ryan
Hello, I have looked around this mailing list and other virtual spaces and
I wasn't able to find a situation similar to this weird one.

I have a 6-disk raidz zfs15 pool. After a scrub, the status of the pool
and all disks still show up as "ONLINE" but two of the disks are starting
to give me errors and I do have fatal data corruption. The disks seem to
be failing differently:

disk 2 has 78 (not growing) read errors, 43k (growing) write errors and 3
(not growing) checksum errors.

disk 5 has 0 read errors, 0 write errors but 7.4k checksum errors (growing).
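
A small Python sketch of how one could tell "growing" from "not growing"
counters automatically: take two snapshots of the per-device lines from
`zpool status` and compare them. The pool name `tank` and the device names
`disk2` and `disk5` are placeholders, the second snapshot is invented, and
treating the `K` suffix as a multiple of 1000 is an assumption that only
matters for the comparison, not for exact counts.

```python
from typing import Dict, List, Tuple

Counters = Tuple[int, ...]  # (read, write, cksum)

_SUFFIX = {"K": 10**3, "M": 10**6, "G": 10**9}


def parse_count(text: str) -> int:
    """Convert a counter such as '78' or '7.4K' to an (approximate) integer."""
    if text[-1] in _SUFFIX:
        return round(float(text[:-1]) * _SUFFIX[text[-1]])
    return int(text)


def parse_status(config_text: str) -> Dict[str, Counters]:
    """Extract per-device counters from 'NAME STATE READ WRITE CKSUM' lines."""
    result: Dict[str, Counters] = {}
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] != "NAME":
            try:
                result[fields[0]] = tuple(parse_count(x) for x in fields[2:5])
            except ValueError:
                continue  # not a device line
    return result


def growing(before: Dict[str, Counters], after: Dict[str, Counters]) -> Dict[str, List[str]]:
    """Return, per device, the names of the counters that increased."""
    names = ("read", "write", "cksum")
    out: Dict[str, List[str]] = {}
    for dev, new in after.items():
        old = before.get(dev, (0, 0, 0))
        increased = [n for n, o, v in zip(names, old, new) if v > o]
        if increased:
            out[dev] = increased
    return out


# Hypothetical snapshots; the first one mirrors the numbers reported above.
BEFORE = """
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            disk2   ONLINE      78   43K     3
            disk5   ONLINE       0     0  7.4K
"""
AFTER = """
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            disk2   ONLINE      78   45K     3
            disk5   ONLINE       0     0  7.9K
"""

print(growing(parse_status(BEFORE), parse_status(AFTER)))
```

Run against two real snapshots taken some minutes apart, this shows at a glance
which disk is still accumulating which kind of error.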

Data corruption is around 22k files.

I plan to replace both disks. Which disk do you think should be replaced
first to lose as little data as possible?

I was thinking of replacing disk 5 first as it seems to have a lot of
"silent" data corruption so maybe it's a bad idea to use its output to
replace disk 2. Also checksum and read errors on disk 2 do not seem to be
growing as I used the pool to back up data (corrupted files could not be
accessed, but a lot of files were fine) but write errors are growing
extremely fast. So reading uncorrupted data from disk 2 seems to be working
but writing on it seems to be problematic.

Do you guys also think I should change disk 5 first or am I missing
something ?

I'm not an expert with zfs so any insight to help me replace those disks
without losing too much data would be much appreciated :)

Regards,

Ryan