[zfs-discuss] scsi messages and mpt warning in log - harmless, or indicating a problem?

2010-05-18 Thread Willard Korfhage
This afternoon, messages like the following started appearing in 
/var/adm/messages:

May 18 13:46:37 fs8 scsi: [ID 365881 kern.info] 
/p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 18 13:46:37 fs8 Log info 0x3108 received for target 5.
May 18 13:46:37 fs8 scsi_status=0x0, ioc_status=0x804b, scsi_state=0x1
May 18 13:46:38 fs8 scsi: [ID 365881 kern.info] 
/p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 18 13:46:38 fs8 Log info 0x3108 received for target 5.
May 18 13:46:38 fs8 scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
May 18 13:46:40 fs8 scsi: [ID 365881 kern.info] 
/p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 18 13:46:40 fs8 Log info 0x3108 received for target 5.
May 18 13:46:40 fs8 scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0

The pool has no errors, so I don't know if these represent a potential problem 
or not.

During this time I was copying files from one fileset to another in the same 
pool, so it was fairly I/O intensive. Typically the messages appear every 1-5 
seconds for 10 to 20 seconds, sometimes longer, and then things are quiet for 
many minutes before they start again. Do these indicate a problem, or are they 
harmless?

I just kicked off a scrub on the pool as I was writing this, and I am seeing a 
lot of these messages. zpool status shows that c4t5d0 has had 12.5K repaired 
already. The scrub has been running for just 6 minutes, it says there are 
170629h54m to go, and the estimate gets longer every time I check the status. I 
ran a scrub on this pool a few weeks ago and had no such problem.
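
For anyone who wants to dig in, here is roughly how I have been cross-checking 
the mpt messages against the drive itself. This is just a sketch with the stock 
build-134 tools, and it assumes target 5 really does map to c4t5d0:

    iostat -En c4t5d0             # soft/hard/transport error counters and drive identity
    zpool status -v               # pool-level read/write/checksum counters and scrub progress
    fmdump -eV | grep -i c4t5d0   # raw FMA ereports, if any mention this device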

I also see two warnings earlier today:

May 18 19:14:09 fs8 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 18 19:14:09 fs8 mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x31110900
May 18 19:14:09 fs8 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci8086,2...@1/pci15d9,a...@0 (mpt0):
May 18 19:14:09 fs8 mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x31110900

and two more of these 1 minute and 10 seconds later. 

So, is my system in trouble or not?

Particulars of my system:

% uname -a
SunOS fs8 5.11 snv_134 i86pc i386 i86pc

The hardware is an Asus server motherboard with 4GB of ECC memory and a 
current Xeon CPU, plus a SuperMicro AOC-USASLP-L8I card (based on the LSI 
1068E) with 8 Samsung Spinpoint F3EG HD203WI 2TB disks attached.


Re: [zfs-discuss] Setting up ZFS on AHCI disks

2010-04-17 Thread Willard Korfhage
I solved the mystery - an astounding 7 out of the 10 brand new disks I was 
using were bad. I was using 4 at a time, and it wasn't until a good one got in 
the mix that I realized what was wrong. FYI, these were Western Digital 
WD15EADS and Samsung HD154UI. Each brand was mostly bad, with one or two good 
disks. The bad ones are functional enough that the BIOS can tell what type they 
are, but I got a lot of errors when I plugged them into a Linux box to check 
them.

The whole thing is bizarre enough that I wonder if they got damaged in shipping 
or if my machine somehow damaged them.


Re: [zfs-discuss] Setting up ZFS on AHCI disks

2010-04-16 Thread Willard Korfhage
isainfo -k returns amd64, so I don't think that is the answer.


Re: [zfs-discuss] Setting up ZFS on AHCI disks

2010-04-16 Thread Willard Korfhage
> There should be no need to create partitions. Something simple like this
> should work:
> zpool create junkfooblah c13t0d0
>
> And if it doesn't work, try "zpool status" just to verify for certain that
> the device is not already part of any pool.

It is not part of any pool. I get the same "cannot label" message, and dmesg 
still shows the task file error messages that I mentioned before.

The drives are new, and I don't think they are bad. Likewise, the motherboard 
is new, although I see the last BIOS release was September, 2008, so the design 
has been out for a while.


Re: [zfs-discuss] Setting up ZFS on AHCI disks

2010-04-16 Thread Willard Korfhage
No Areca controller on this machine. It is a different box, and the drives are 
just plugged into the SATA ports on the motherboard.

I'm running build snv_133, too.

The drives are recent - 1.5TB drives, 3 Western Digital and 1 Seagate, if I 
recall correctly. They ought to support SATA-2. They are brand new, and haven't 
been used before.

I have the feeling I'm missing some simple, obvious step because I'm still 
pretty new to OpenSolaris.


Re: [zfs-discuss] Setting up ZFS on AHCI disks

2010-04-16 Thread Willard Korfhage
devfsadm -Cv gave a lot of "removing file" messages, apparently for items that 
were not relevant.

cfgadm -al says, about the disks,

sata0/0::dsk/c13t0d0           disk         connected    configured   ok
sata0/1::dsk/c13t1d0           disk         connected    configured   ok
sata0/2::dsk/c13t2d0           disk         connected    configured   ok
sata0/3::dsk/c13t3d0           disk         connected    configured   ok

I still get the same error message, but I'm guessing now that this means I have 
to create a partition on the device. However, I am still stymied for the time 
being: fdisk can't open any of the /dev/rdsk/c13t*d0p0 devices. I tried running 
format, and got this:


AVAILABLE DISK SELECTIONS:
   0. c12d1 
  /p...@0,0/pci-...@1f,1/i...@0/c...@1,0
   1. c13t0d0 
  /p...@0,0/pci1043,8...@1f,2/d...@0,0
   2. c13t1d0 
  /p...@0,0/pci1043,8...@1f,2/d...@1,0
   3. c13t2d0 
  /p...@0,0/pci1043,8...@1f,2/d...@2,0
   4. c13t3d0 
  /p...@0,0/pci1043,8...@1f,2/d...@3,0
Specify disk (enter its number): 1

Error: can't open disk '/dev/rdsk/c13t0d0p0'.


AVAILABLE DRIVE TYPES:
0. Auto configure
1. other
Specify disk type (enter its number): 0
Auto configure failed
No Solaris fdisk partition found.

At this point, I am not sure whether to run fdisk, format, or something else. I 
tried fdisk, partition, and label, but got the message "Current Disk Type is not 
set." I expect this is a problem because of the "drive type unknown" appearing 
on the drives. I gather from another thread that I need to run fdisk, but I 
haven't been able to do it.
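
For reference, the sequence I understand is normally supposed to lay a label 
down on a bare disk before handing it to ZFS is roughly the following. This is 
only a sketch, and it assumes the drive opens cleanly in the first place, which 
mine apparently do not:

    fdisk -B /dev/rdsk/c13t0d0p0       # write a default table: one Solaris partition spanning the disk
    fdisk -W - /dev/rdsk/c13t0d0p0     # dump the new partition table to verify it took
    zpool create junkfooblah c13t0d0   # ZFS then puts its own EFI label on the whole disk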


[zfs-discuss] Setting up ZFS on AHCI disks

2010-04-15 Thread Willard Korfhage
I'm trying to set up a raidz pool on 4 disks attached to an Asus P5BV-M 
motherboard with an Intel ICH7R. The BIOS lets me pick IDE, RAID, or AHCI for 
the disks. I'm not interested in the motherboard's RAID, and from reading 
previous posts it sounded like there were performance advantages to picking 
AHCI. However, I am getting errors and am unable to create the pool. Running 
format tells me:

AVAILABLE DISK SELECTIONS:
   0. c12d1 
  /p...@0,0/pci-...@1f,1/i...@0/c...@1,0
   1. c13t0d0 
  /p...@0,0/pci1043,8...@1f,2/d...@0,0
   2. c13t1d0 
  /p...@0,0/pci1043,8...@1f,2/d...@1,0
   3. c13t2d0 
  /p...@0,0/pci1043,8...@1f,2/d...@2,0
   4. c13t3d0 
  /p...@0,0/pci1043,8...@1f,2/d...@3,0

The first disk is an IDE disk containing the OS, and the other four are for the 
pool. Then:

# zpool create mypool raidz c13t0d0 c13t1d0 c13t2d0 c13t3d0
cannot label 'c13t0d0': try using fdisk(1M) and then provide a specific slice

When doing this, dmesg says:

Apr 15 17:14:15 fs8 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 0 
has task file error
Apr 15 17:14:15 fs8 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 0 
is trying to do error recovery
Apr 15 17:14:15 fs8 ahci: [ID 551337 kern.warning] WARNING: ahci0:
Apr 15 17:14:15 fs8 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 0 
task_file_status = 0x451
Apr 15 17:14:15 fs8 genunix: [ID 353554 kern.warning] WARNING: Device 
/p...@0,0/pci1043,8...@1f,2/d...@0,0 failed to power up.

I find reports from 2006 that the ICH7R is well supported, so I'm not sure what 
the problem is. Any suggestions?
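
For completeness, the sanity checks I can run on the AHCI side with the standard 
tools are roughly these (a sketch only; the sata0/0 port name is just how my box 
enumerates and may differ elsewhere):

    cfgadm -al                      # confirm the SATA ports show up and whether the disks are configured
    cfgadm -c unconfigure sata0/0   # bounce a single port to see if the task file error clears
    cfgadm -c configure sata0/0
    dmesg | tail -20                # re-check the kernel messages afterwards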


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-15 Thread Willard Korfhage
I've got a Supermicro AOC-USAS-L8I on the way because I gather from these 
forums that it works well. I'll just wait for that, then try 8 disks on that and 
4 on the motherboard SATA ports.


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-14 Thread Willard Korfhage
As I mentioned earlier, I removed the hardware-based RAID-6 array, changed all 
the disks to passthrough disks, and made a raidz2 pool using all of them. I used 
my backup program to copy 55GB of data to the pool, and now I have errors all 
over the place.

# zpool status -v
  pool: bigraid
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h4m with 0 errors on Wed Apr 14 22:56:36 2010
config:

NAME        STATE     READ WRITE CKSUM
bigraid     DEGRADED     0     0     0
  raidz2-0  DEGRADED     0     0    24
    c4t0d0  ONLINE       0     0     3
    c4t0d1  ONLINE       0     0     2
    c4t0d2  ONLINE       0     0     2
    c4t0d3  DEGRADED     0     0     2  too many errors
    c4t0d4  ONLINE       0     0     2
    c4t0d5  ONLINE       0     0     2
    c4t0d6  ONLINE       0     0     1
    c4t0d7  ONLINE       0     0     0
    c4t1d0  ONLINE       0     0     0
    c4t1d1  ONLINE       0     0     2
    c4t1d2  ONLINE       0     0     2
    c4t1d3  ONLINE       0     0     4

errors: No known data errors


So, ZFS on hardware-backed RAID was fine, but ZFS on passthrough disks is not. 
I'm at a loss to explain it. Any ideas?
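
Following the action text in the ZFS-8000-9P message above, my next step is to 
clear the counters and scrub again to see whether the errors come straight back:

    zpool clear bigraid       # reset the per-device error counters
    zpool scrub bigraid       # re-read everything in the pool
    zpool status -v bigraid   # check whether new errors accumulate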


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-13 Thread Willard Korfhage
These are all good reasons to switch back to letting ZFS handle it. I did put 
about 600GB of data on the pool as configured with RAID-6 on the card, verified 
the data, and scrubbed it a couple of times in the process, and there were no 
problems, so it appears that the firmware upgrade fixed my issues. However, I'm 
going to switch it back to passthrough disks, remake the pool, and try it again.


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-12 Thread Willard Korfhage
I upgraded to the latest firmware. When I rebooted the machine, the pool was 
back, with no errors. I was surprised.

I will work with it more, and see if it stays good. I've done a scrub, so now 
I'll put more data on it and stress it some more.

If the firmware upgrade fixed everything, then I've got a question about what I 
am better off doing: keep it as-is, with the RAID card providing redundancy, or 
turn it all back into pass-through drives and let ZFS handle it, making the 
Areca card just a really expensive way of getting a bunch of SATA interfaces?


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-12 Thread Willard Korfhage
I was wondering if the controller itself has problems. My card's firmware is 
version 1.42, and the firmware on the website is up to 1.48.

I see that the firmware released last September says:

Fix Opensolaris+ZFS to add device to mirror set in JBOD or passthrough mode

and

Fix SATA raid controller seagate HDD error handling

I'm not using mirroring, but I am using Seagate drives. It looks like I should 
do a firmware upgrade.


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-12 Thread Willard Korfhage
Just a message 7 hours earlier warning that an IRQ shared by drivers with 
different interrupt levels might result in reduced performance.


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-11 Thread Willard Korfhage
It is a Corsair 650W modular power supply, with 2 or 3 disks per cable. 
However, the Areca card is not reporting any errors, so I think power to the 
disks is unlikely to be a problem.

Here's what is in /var/adm/messages

Apr 11 22:37:41 fs9 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, 
TYPE: Fault, VER: 1, SEVERITY: Major
Apr 11 22:37:41 fs9 EVENT-TIME: Sun Apr 11 22:37:41 CDT 2010
Apr 11 22:37:41 fs9 PLATFORM: System-Product-Name, CSN: System-Serial-Number, 
HOSTNAME: fs9
Apr 11 22:37:41 fs9 SOURCE: zfs-diagnosis, REV: 1.0
Apr 11 22:37:41 fs9 EVENT-ID: f6d2aef7-d5fc-e302-a68e-a50a91e81d2d
Apr 11 22:37:41 fs9 DESC: The number of checksum errors associated with a ZFS 
device
Apr 11 22:37:41 fs9 exceeded acceptable levels.  Refer to 
http://sun.com/msg/ZFS-8000-GH for more information.
Apr 11 22:37:41 fs9 AUTO-RESPONSE: The device has been marked as degraded.  An 
attempt
Apr 11 22:37:41 fs9 will be made to activate a hot spare if available.
Apr 11 22:37:41 fs9 IMPACT: Fault tolerance of the pool may be compromised.
Apr 11 22:37:41 fs9 REC-ACTION: Run 'zpool status -x' and replace the bad 
device.
Apr 11 22:37:42 fs9 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-HC, 
TYPE: Error, VER: 1, SEVERITY: Major
Apr 11 22:37:42 fs9 EVENT-TIME: Sun Apr 11 22:37:42 CDT 2010
Apr 11 22:37:42 fs9 PLATFORM: System-Product-Name, CSN: System-Serial-Number, 
HOSTNAME: fs9
Apr 11 22:37:42 fs9 SOURCE: zfs-diagnosis, REV: 1.0
Apr 11 22:37:42 fs9 EVENT-ID: 89b2ef1c-c689-66a0-a7f7-d015a1b7f260
Apr 11 22:37:42 fs9 DESC: The ZFS pool has experienced currently unrecoverable 
I/O
Apr 11 22:37:42 fs9 failures.  Refer to http://sun.com/msg/ZFS-8000-HC 
for more information.
Apr 11 22:37:42 fs9 AUTO-RESPONSE: No automated response will be taken.
Apr 11 22:37:42 fs9 IMPACT: Read and write I/Os cannot be serviced.
Apr 11 22:37:42 fs9 REC-ACTION: Make sure the affected devices are connected, 
then run
Apr 11 22:37:42 fs9 'zpool clear'.


[zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-11 Thread Willard Korfhage
I'm struggling to get a reliable OpenSolaris system on a file server. I'm 
running an Asus P5BV-C/4L server motherboard, 4GB of ECC RAM, an E3110 
processor, and an Areca 1230 with 12 1-TB disks attached. In a previous posting, 
it looked like the RAM or the power supply might be the problem, so I ended up 
upgrading everything except the RAID card and the disks. I'm running OpenSolaris 
preview build 134.

I started off by setting up all the disks as pass-through disks and tried to 
make a raidz2 array using all of them. It would work for a while, then suddenly 
every disk in the array would have too many errors and the system would fail. I 
don't know why the sudden failure, but eventually I gave up.

Instead, I used the Areca card to create a RAID-6 array with a hot spare, and 
created a pool directly on the 8TB volume the RAID card exposed. I'll let the 
card handle the redundancy and let ZFS handle just the file system. Disk 
performance is noticeably faster, by the way, compared to software RAID.
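
So the current layout is just a single-device pool sitting on the volume the 
card exports, i.e. roughly:

    zpool create bigraid c4t0d0   # one 8TB RAID-6 volume from the Areca card, no ZFS-level redundancy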

I have been testing the system, and it suddenly failed again:

 # zpool status -v
  pool: bigraid
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
bigraid     DEGRADED     0     0     7
  c4t0d0    DEGRADED     0     0    34  too many errors

errors: Permanent errors have been detected in the following files:

:<0x1>
:<0x18>
bigraid:<0x3>

The RAID card says the array is fine - no errors - so something is going on 
with ZFS. I'm out of ideas at this point, except that build 134 might be 
unstable and I should install an earlier, more stable version. Is there anything 
I'm missing that I should check?
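
One more thing I can still dig into is the fault manager's view of the pool and 
device, since ZFS hands its diagnoses off to FMA; a quick sketch with the stock 
commands:

    fmadm faulty        # list any faults FMA has diagnosed against the pool or devices
    fmdump -eV | more   # dump the underlying error reports (can be very long)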


Re: [zfs-discuss] Diagnosing Permanent Errors

2010-04-06 Thread Willard Korfhage
Yes, I was hoping to find the serial numbers. Unfortunately, it doesn't show 
any serial numbers for the disks attached to the Areca RAID card.


Re: [zfs-discuss] Diagnosing Permanent Errors

2010-04-05 Thread Willard Korfhage
Memtest didn't show any errors, but between Frank saying, early in the thread, 
that he had found memory errors that memtest didn't catch, and the removal of 
DIMMs apparently fixing the problem, I jumped too quickly to the conclusion that 
it was the memory. Certainly there are other explanations.

I see that I have a spare Corsair 620W power supply that I could try. The one 
in the machine now is also a Corsair supply of some wattage. If I recall 
properly, the steady-state power draw is between 150 and 200 watts.

By the way, I see that one of the disks is now listed as degraded - too many 
errors. Is there a good way to identify exactly which physical disk it is?
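
The usual way, as I understand it, is to read the drive's inquiry data and match 
the serial number against the label on the physical disk; a sketch, assuming the 
controller passes that information through:

    iostat -En           # vendor, product, and serial number per cXtYdZ device, plus error counters
    format < /dev/null   # list the disks with their controller/target paths, then exit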


Re: [zfs-discuss] Diagnosing Permanent Errors

2010-04-05 Thread Willard Korfhage
It certainly has symptoms that match a marginal power supply, but I measured 
the power consumption some time ago and found it comfortably within the power 
supply's capacity. I've also wondered whether the RAM is actually fine and there 
was just some kind of flaky interaction between the RAM configuration I had and 
the motherboard.


Re: [zfs-discuss] Diagnosing Permanent Errors

2010-04-04 Thread Willard Korfhage
Looks like it was RAM. I ran memtest+ 4.00, and it found no problems. I removed 
2 of the 3 sticks of RAM, ran a backup, and had no errors. I'm running more 
extensive tests, but it looks like that was it. A new motherboard, CPU and ECC 
RAM are on the way to me now.


Re: [zfs-discuss] Diagnosing Permanent Errors

2010-04-04 Thread Willard Korfhage
Yeah, this morning I concluded that I really should be running ECC RAM. I 
sometimes wonder why people don't run ECC RAM more often. I remember a decade 
ago, when RAM was much, much less dense, people fretted about alpha particles 
randomly flipping bits, but that concern seems to have died down.

I know, of course, that there is some added expense, but browsing on Newegg, 
the additional RAM cost is pretty minimal: I see 2GB ECC sticks going for about 
$12 more than similar non-ECC sticks. It's the motherboards that can handle ECC 
that are the expensive part. Now I've got to see what would be a good 
motherboard for a file server.


[zfs-discuss] Diagnosing Permanent Errors

2010-04-04 Thread Willard Korfhage
I would like to get some help diagnosing permanent errors on my files. The 
machine in question has 12 1TB disks connected to an Areca RAID card. I 
installed OpenSolaris build 134 and, according to zpool history, created a pool 
with:

zpool create bigraid raidz2 c4t0d0 c4t0d1 c4t0d2 c4t0d3 c4t0d4 c4t0d5 c4t0d6 
c4t0d7 c4t1d0 c4t1d1 c4t1d2 c4t1d3

I then backed up 806G of files to the machine and had the backup program verify 
the files. It failed. The check is continuing to run, but so far it has found 4 
files where the checksum of the backup copy doesn't match the checksum of the 
original file. zpool status shows problems:

 $ sudo zpool status -v
  pool: bigraid
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
bigraid     DEGRADED     0     0   536
  raidz2-0  DEGRADED     0     0 3.14K
    c4t0d0  ONLINE       0     0     0
    c4t0d1  ONLINE       0     0     0
    c4t0d2  ONLINE       0     0     0
    c4t0d3  ONLINE       0     0     0
    c4t0d4  ONLINE       0     0     0
    c4t0d5  ONLINE       0     0     0
    c4t0d6  ONLINE       0     0     0
    c4t0d7  ONLINE       0     0     0
    c4t1d0  ONLINE       0     0     0
    c4t1d1  ONLINE       0     0     0
    c4t1d2  ONLINE       0     0     0
    c4t1d3  DEGRADED     0     0     0  too many errors

errors: Permanent errors have been detected in the following files:

:<0x18>
:<0x3a>

So, it appears that one of the disks is bad, but if one disk failed, how would 
a raidz2 pool develop permanent errors? The numbers in the CKSUM column are 
continuing to grow, but is that because the backup verification is tickling the 
errors as it runs?

Previous postings on permanent errors said to look at fmdump -eV, but its 
output here has 437543 lines, and I don't really know how to interpret what I 
see. I did check the vdev_path with "fmdump -eV | grep vdev_path | sort | 
uniq -c" to see if only certain disks were implicated, but every disk in the 
array is listed, albeit with different frequencies:

2189 vdev_path = /dev/dsk/c4t0d0s0
1077 vdev_path = /dev/dsk/c4t0d1s0
1077 vdev_path = /dev/dsk/c4t0d2s0
1097 vdev_path = /dev/dsk/c4t0d3s0
  25 vdev_path = /dev/dsk/c4t0d4s0
  25 vdev_path = /dev/dsk/c4t0d5s0
  20 vdev_path = /dev/dsk/c4t0d6s0
1072 vdev_path = /dev/dsk/c4t0d7s0
1092 vdev_path = /dev/dsk/c4t1d0s0
     vdev_path = /dev/dsk/c4t1d1s0
2221 vdev_path = /dev/dsk/c4t1d2s0
1149 vdev_path = /dev/dsk/c4t1d3s0
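
Along the same lines, grouping the same ereports by class instead of by 
vdev_path should show whether these are checksum errors, I/O errors, or 
something at the transport level (same idea as the pipeline above, just a 
sketch):

    fmdump -eV | grep class | sort | uniq -c   # e.g. ereport.fs.zfs.checksum vs ereport.fs.zfs.io
    fmdump -e | tail -50                       # one-line summaries with time and class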

What should I make of this? All the disks are bad? That seems unlikely. I found 
another thread

http://opensolaris.org/jive/thread.jspa?messageID=399988

where it finally came down to bad memory, so I'll test that. Any other 
suggestions?