Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-19 Thread Mr Roooster
On Sun, 18 Jul 2021 at 00:30, Greg Troxel wrote: > > [snip] > > Ah, interesting point. I find this confusing, because I thought an > uncorrectable read error would, for disks I've dealt with, cause the > sector to be marked as permanently failed and pending reallocation. > It depends where the

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-17 Thread Michael van Elst
g...@lexort.com (Greg Troxel) writes: >Ah, interesting point. I find this confusing, because I thought an >uncorrectable read error would, for disks I've dealt with, cause the >sector to be marked as permanently failed and pending reallocation. It is. Doesn't mean that further read attempts

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-17 Thread Greg Troxel
Mr Roooster writes: > The wd driver is retrying, (IIRC it retries 3 times) and succeeding on > the second or 3rd attempt. (See xfer 338, retry 0, followed by a 'soft > error corrected' with the same xfer number 10 seconds later. This is > the retry succeeding). Ah, interesting point. I find

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-17 Thread Brad Spencer
Mr Roooster writes: [snip] > The wd driver is retrying, (IIRC it retries 3 times) and succeeding on > the second or 3rd attempt. (See xfer 338, retry 0, followed by a 'soft > error corrected' with the same xfer number 10 seconds later. This is > the retry succeeding). > > This sits below ZFS and

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-17 Thread Mr Roooster
On Wed, 14 Jul 2021 at 12:07, Matthias Petermann wrote: > > Hello all, > > > ``` > [ 87240.313853] wd2: (uncorrectable data error) > [ 87240.313853] wd2d: error reading fsbn 5707914328 of > 5707914328-5707914455 (wd2 bn 5707914328; cn 5662613 tn 6 sn 46) > [ 87465.637977] wd2d: error reading fsbn
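When comparing the kernel messages quoted above against what ZFS reports, it can help to reduce each `error reading fsbn` line to the device and block range it names. The following is only a minimal sketch: the function name `wd_error_blocks` is hypothetical, and the pattern assumes the message layout shown in the quoted log (timestamp in brackets, device name, then `error reading fsbn N of FIRST-LAST`).

```shell
# Hypothetical helper (not part of NetBSD): read kernel messages on stdin
# and print "<device> <first_block> <last_block>" for every
# "error reading fsbn" line. Lines that do not match are ignored.
wd_error_blocks() {
    sed -n 's/.*\] \(wd[0-9]*[a-z]*\): error reading fsbn [0-9]* of \([0-9]*\)-\([0-9]*\).*/\1 \2 \3/p'
}
```

A possible use is `dmesg | wd_error_blocks | sort -u`, which lists each affected block range once, so repeated reports of the same range stand out.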

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-17 Thread Michael van Elst
m...@petermann-it.de (Matthias Petermann) writes: >wedges at boot time unnecessarily endangers the RAID in the event of a >disk change. Therefore the question: is there a better possibility >besides using the wedges? I remember that I had also tried the variant >with the label

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-17 Thread Matthias Petermann
Hello everyone, The story is slowly coming to a conclusion and I would like to describe my observations for the sake of completeness. According to [1], SATA/ATA on NetBSD does not support hot swap. Therefore, I shut down the NAS and swapped the disk with the power off. I installed the

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-16 Thread Matthias Petermann
Hi, On 16.07.21 23:21, RVP wrote: On Fri, 16 Jul 2021, Matthias Petermann wrote: I will overwrite the disk with zeros once as a test. According to the S.M.A.R.T. values, the number of "pending" sectors has already decreased - from 18 to 15. ``` 197 200    0 no  online  positive   

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-16 Thread RVP
On Fri, 16 Jul 2021, Matthias Petermann wrote: I will overwrite the disk with zeros once as a test. According to the S.M.A.R.T. values, the number of "pending" sectors has already decreased - from 18 to 15. ``` 197 200 0 no online positive Current pending sector 15 ``` I

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-16 Thread Matthias Petermann
Hi Michael, On 16.07.21 16:46, Michael van Elst wrote: smartmontools has more features and also understands rare setups with e.g. RAID controllers, early USB enclosures or vendor-specific (usually undocumented) parameters. It also comes with smartd to monitor drives continuously. For plain

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-16 Thread Michael van Elst
m...@petermann-it.de (Matthias Petermann) writes: >On 14.07.21 14:10, Greg Troxel wrote: >> I think you may have uncovered a bug in zfs statistics. >>> NAME STATE READ WRITE CKSUM >>> tank ONLINE 0 0 0 >>> raidz2-0 ONLINE 0

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-16 Thread Michael van Elst
m...@petermann-it.de (Matthias Petermann) writes: >Thank you very much for your valuable advice! I will add the >smartmontools to my custom repository today so that I can install it on >the NAS. In the meantime, I had another look at atactl - it seems to >offer the possibility of reading

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-16 Thread Matthias Petermann
Hello Greg, On 14.07.21 14:10, Greg Troxel wrote: I think you may have uncovered a bug in zfs statistics. NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 dk0 ONLINE 0

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-16 Thread Matthias Petermann
Hello all, Thank you very much for your valuable advice! I will add the smartmontools to my custom repository today so that I can install it on the NAS. In the meantime, I had another look at atactl - it seems to offer the possibility of reading out the error memory or starting a self-test.

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-14 Thread RVP
On Wed, 14 Jul 2021, Greg Troxel wrote: Good point. I like to read all the way to the OS, and my way works on USB, Ah, yes, USB drives... for them I have to use something like: root# smartctl -d sat,12 -t offline /dev/XXX The `12' is needed for my USB 1TB Maxtor. The standard 16 byte

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-14 Thread RVP
On Wed, 14 Jul 2021, Greg Troxel wrote: What I do is for each of my (physical) disks, spinning and ssd, is (x86 centric; c for others), once every few months dd if=/dev/rwd0d of=/dev/null bs=1m and see if that throws any errors. If there is one, I try to read that block a few times, and
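The periodic whole-disk read quoted above can be wrapped in a small function so that a failure is reported explicitly. This is a sketch only: `scan_disk` is a hypothetical name, and the block size is written in bytes (1 MiB) on the assumption that the `1m` suffix from the quoted command is not accepted by every `dd` implementation.

```shell
# Hypothetical helper: read a whole device (or file) and discard the data,
# so that an unreadable sector surfaces as a dd failure.
# Without conv=noerror, dd stops at the first read error.
scan_disk() {
    if dd if="$1" of=/dev/null bs=1048576 2>/dev/null; then
        echo "$1: read completed without errors"
    else
        echo "$1: read failed; check the kernel messages for the block" >&2
        return 1
    fi
}
```

It would be called as `scan_disk /dev/rwd0d` for the raw device named in the quoted command; the device name differs per system.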

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-14 Thread Greg Troxel
RVP writes: > You can make the drive itself do that whole disk scan and collect > the `offline' statistics while it is doing so. This is using the > smartmontools package: > > root# smartctl -t long /dev/XXX > > The command will show how long it'll take for that test to complete > (a few hours

Re: ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-14 Thread Greg Troxel
Matthias Petermann writes: > I run a NetBSD-based NAS at home. It is currently running on NetBSD 9.1. Probably you should bring it forward along netbsd-9, but that's likely unrelated. > The system is booted from a USB stick on which the root file system is > also located. The storage is on

ZFS RAIDZ2 and wd uncorrectable data error - why does ZFS not notice the hardware error?

2021-07-14 Thread Matthias Petermann
Hello all, I run a NetBSD-based NAS at home. It is currently running on NetBSD 9.1. The system is booted from a USB stick on which the root file system is also located. The storage is on 4 x 4 TB magnetic hard disks, configured as ZFS RAIDZ2. Earlier I noticed that the I/O performance of