I'm having a serious problem with a customer running a T2000 with ZFS 
configured as raidz1 with 4 disks, no spare.
The machine is mainly a Cyrus IMAP server and a web application server running 
an AJAX email app.
Yesterday we had a heavy slowdown.
Tomcat runs smoothly, but IMAP access is very slow, even through a direct 
IMAP client running on LAN PCs.
We figured out that the 4th disk was reporting hardware errors in 
/var/adm/messages, but zpool showed no errors.
A technician went there to replace the disk.
My idea was to add the new disk to the zpool and issue a replace command, so as 
to remove the failing disk.
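
For reference, the replace I had in mind would have been something like the 
sketch below. The device names are taken from the status output further down, 
assuming the new disk showed up as c3t14d0s0. The run wrapper and the DRY_RUN 
variable are only illustrative helpers, so that the commands are printed rather 
than executed by default.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
POOL=dskmail      # pool name, from the zpool status output
OLD=c3t13d0s0     # failing disk
NEW=c3t14d0s0     # new disk (assumed device name)

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Replace the failing disk with the new one inside the raidz1 vdev;
# this starts a resilver onto the new disk.
run zpool replace "$POOL" "$OLD" "$NEW"

# Check resilver progress and the resulting layout.
run zpool status "$POOL"
```
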
By mistake, the technician did something different: he created a spare device 
containing both the failing disk and the new one.
So at the moment I have the 3 original disks, and one spare containing the new 
disk and the failing one.
Today I took the failing disk offline, so the spare device is using the new 
disk.
Then I powered off the T2000, physically removed the failing disk, and powered 
everything back on.
Now I have this output:

-bash-3.00# zpool status
  pool: dskmail
 state: DEGRADED
status: One or more devices has been taken offline by the adminstrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver completed with 0 errors on Wed Apr 16 08:38:54 2008
config:

        NAME             STATE     READ WRITE CKSUM
        dskmail          DEGRADED     0     0     0
          raidz1         DEGRADED     0     0     0
            c3t9d0s0     ONLINE       0     0     0
            c3t10d0s0    ONLINE       0     0     0
            c3t12d0s0    ONLINE       0     0     0
            spare        DEGRADED     0     0     0
              c3t13d0s0  OFFLINE      0     0     0
              c3t14d0s0  ONLINE       0     0     0
        spares
          c3t14d0s0      INUSE     currently in use

errors: No known data errors


As you can see, the t13 disk is offline and physically removed.
The machine is still very slow.
I want to remove the t13 disk from the zpool, but I can't.
My questions are:

- How do I make the t14 disk what it should be (a regular member of the raidz1, 
not a spare)?
- Can I simply remove the spare device while the machine is running without any 
risk?
- What will happen if I then add the t14 device alongside the 3 disks? Will it 
start a new sync?

My understanding is that t14 should already contain the raid data, since the 
sync already completed while it was inside the spare.
So adding it as a non-spare disk should not start the whole sync process again, 
except for the small amount of data written in between.
Am I wrong?

Thanx for any help, really.
Gabriele Bulfon
 
 
This message posted from opensolaris.org