On Mon, Mar 8, 2010 at 2:00 PM, Chris Dunbar <cdun...@earthside.net> wrote:

> Hello,
>
> I just found this list and am very excited that you all are here! I have a
> homemade ZFS server that serves as our poor man's Thumper (we named it
> thumpthis) and provides primarily NFS shares for our VMware environment. As
> is often the case, the server has developed a hardware problem mere days
> before I am ready to go live with a new replacement server (thumpthat). At
> first the problem appeared to be a bad drive, but now I am not so sure. I
> would like to sanity check my thought process with this list and see if
> anybody has some different ideas. Here is a quick timeline of the trouble:
>
> 1. I noticed the following when running a routine zpool status:
>
> <snip>
>          mirror    DEGRADED     0     0     0
>            c3t2d0  ONLINE       0     0     0
>            c3t3d0  REMOVED      0  368K     0
> </snip>
>
> 2. I determined which drive appeared to be offline by watching drive lights
> and then rebooted the server.
>
> 3. Initially the drive appeared to be fine and ZFS picked it back up and
> resilvered the mirror. About 30 minutes later I noticed that the same drive
> was again marked REMOVED.
>
> 4. I shut the server down and replaced the drive with a new, larger disk.
>
> 5. I ran zpool replace tank c3t3d0 and it happily went to work on the
> replacement drive. A few hours later the resilver was complete and all
> seemed well.
>
> 6. The next day, about 12 hours after installing the new drive I found the
> same error message (here's the whole pool):
>
> config:
>
>        NAME        STATE     READ WRITE CKSUM
>        tank        DEGRADED     0     0     0
>          mirror    ONLINE       0     0     0
>            c3t0d0  ONLINE       0     0     0
>            c3t1d0  ONLINE       0     0     0
>          mirror    DEGRADED     0     0     0
>            c3t2d0  ONLINE       0     0     0
>            c3t3d0  REMOVED      0  370K     0
>          mirror    ONLINE       0     0     0
>            c4t0d0  ONLINE       0     0     0
>            c4t1d0  ONLINE       0     0     0
>          mirror    ONLINE       0     0     0
>            c4t2d0  ONLINE       0     0     0
>            c4t3d0  ONLINE       0     0     0
>
> errors: No known data errors
>
> This is where I am now. Either my new hard drive is bad (not impossible) or
> I am looking at some other hardware failure, possibly the AOC-SAT2-MV8
> controller card. I have a spare controller card (same make and model
> purchased at the same time we built the server) and plan to replace that
> tonight. Does that seem like the correct course of action? Are there any
> steps I can take beforehand to zero in on the problem? Any words of
> encouragement or wisdom?
>

What does `iostat -En` say?
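
If the output is long, a small script can pull out the per-device error
counters. The Python below is only a sketch: it assumes the usual Solaris
summary line ("Soft Errors: N Hard Errors: N Transport Errors: N"), the
function names are made up for illustration, and the sample text is invented,
not output from your server. As a rough rule of thumb, transport errors with
few or no media errors tend to point at the path to the disk (cable, backplane,
controller port) rather than at the disk itself.

```python
import re

# Matches the per-device summary line of `iostat -En`, for example:
# "c3t3d0   Soft Errors: 0 Hard Errors: 12 Transport Errors: 37"
SUMMARY = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+"
    r"Transport Errors:\s*(?P<transport>\d+)"
)


def parse_iostat_en(text):
    """Return {device: {"soft": n, "hard": n, "transport": n}}."""
    counters = {}
    for line in text.splitlines():
        m = SUMMARY.match(line.strip())
        if m:
            counters[m.group("dev")] = {
                key: int(m.group(key)) for key in ("soft", "hard", "transport")
            }
    return counters


def suspects(counters):
    """Return devices with a nonzero hard or transport error count."""
    return sorted(
        dev for dev, c in counters.items() if c["hard"] or c["transport"]
    )


# Invented sample for illustration only; NOT taken from the server in
# this thread. Lines that are not summary lines are simply skipped.
SAMPLE = """\
c3t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: EXAMPLE-DISK     Revision: 0000 Serial No:
c3t3d0           Soft Errors: 0 Hard Errors: 12 Transport Errors: 37
Vendor: ATA      Product: EXAMPLE-DISK     Revision: 0000 Serial No:
"""

if __name__ == "__main__":
    print(suspects(parse_iostat_en(SAMPLE)))
```

On a live system you would feed it the real command output instead of SAMPLE.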

My suggestion is to replace the cable that's connecting the c3t3d0 disk.

IMHO, the cable is much more likely to be faulty than a single port on the
disk controller.
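
After swapping the cable, it may be worth checking the pool periodically to
see whether the device drops out again. Here is a minimal Python sketch that
scans `zpool status` text for rows that are not ONLINE or that show nonzero
error counters. The function name is hypothetical, and the sample is the config
section you posted, with the quote markers removed.

```python
def unhealthy_rows(status_text):
    """Return (name, state, read, write, cksum) tuples for every config
    row that is not ONLINE or has a nonzero error counter."""
    rows = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "errors:":
            # The config table ends before the "errors:" summary line.
            break
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True
            continue
        if in_config and len(fields) == 5:
            name, state, read, write, cksum = fields
            if state != "ONLINE" or (read, write, cksum) != ("0", "0", "0"):
                rows.append((name, state, read, write, cksum))
    return rows


# Config section as posted earlier in this thread.
SAMPLE_STATUS = """\
config:

       NAME        STATE     READ WRITE CKSUM
       tank        DEGRADED     0     0     0
         mirror    ONLINE       0     0     0
           c3t0d0  ONLINE       0     0     0
           c3t1d0  ONLINE       0     0     0
         mirror    DEGRADED     0     0     0
           c3t2d0  ONLINE       0     0     0
           c3t3d0  REMOVED      0  370K     0
         mirror    ONLINE       0     0     0
           c4t0d0  ONLINE       0     0     0
           c4t1d0  ONLINE       0     0     0
         mirror    ONLINE       0     0     0
           c4t2d0  ONLINE       0     0     0
           c4t3d0  ONLINE       0     0     0

errors: No known data errors
"""

if __name__ == "__main__":
    for row in unhealthy_rows(SAMPLE_STATUS):
        print(*row)
```

For the posted config this reports the pool, the degraded mirror, and c3t3d0;
the counters are kept as strings because ZFS abbreviates large values (370K).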

-- 
Giovanni Tirloni
sysdroid.com
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
