Starting a new thread...

On Mon, Nov 30, 2009 at 07:59:14PM +1000, James C. McPherson wrote:
> Tru Huynh wrote:
...
> >On a supermicro board, with 3 hw raid6 vdev joined in a single pool,
> >random hangs (<weekly) which required hardware reset, nothing on the logs.
> >
> >symptoms: rpool fine, zfs status hangs on the other volume
> >all nfs shares stalled on all linux clients (local "share" -> nothing).
> >
> >Attached the requested files on 2 machines having the same issue.
> 
> 
> Two things here:
> 
> (1) your hba is a MegaRAID SAS ELP, which uses the mega_sas driver
>     not mpt, and
> (2) If you've got nothing in the logs, then you need to do more
>     investigation to work out where the problem lies.
> 
> Since your system is not using mpt, you have a different problem.

Here is some more information:

1) OS
SunOS xargos.bis.pasteur.fr 5.10 Generic_141445-09 i86pc i386 i86pc

It is only sharing through NFS v3 to Linux clients running:
20x CentOS-5 x86_64 2.6.18-164.6.1.el5 x86_64/i386
78x CentOS-3 x86_64/ia32e/i386

2) usual logs:
 /var/adm/messages
-> nothing

3) fmdump -ev
/var/fm/fmd/errlog is empty

4) reboot -qn hangs

5) acpi power button not trapped

6) hardware reset needed

7) not tried yet
reboot -d to force a dump

8) frequency: random < week 

It does not happen at a fixed time of day or week, so it does not look crontab related.
Both servers are affected: one is sharing NFS $HOME, the other a scratch space.

I zfs send/receive the daily snapshots from the $HOME server to the scratch one.

9) as suggested on the IRC channel, I will keep a screen session running with:
prstat
iostat
intrstat
vmstat
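
Since a hang leaves nothing in the logs and ends with a hardware reset, the output in the screen session would be lost as well. A minimal sketch of writing the same four commands to timestamped log files instead is below. Everything in it beyond the command names is an assumption: that a recent Python 3 is installed on the server, the 5-second interval, the option choices (prstat -c, iostat -xn), and the log directory in the usage note.

```python
import datetime
import os
import shutil
import subprocess
import threading

# Commands from the list above. The 5-second interval and the
# option choices are assumptions, not taken from the original setup.
COMMANDS = {
    "prstat": ["prstat", "-c", "5"],
    "iostat": ["iostat", "-xn", "5"],
    "intrstat": ["intrstat", "5"],
    "vmstat": ["vmstat", "5"],
}


def stamp_line(line, now):
    """Prefix one line of output with an ISO-style timestamp."""
    return now.strftime("%Y-%m-%dT%H:%M:%S") + " " + line


def log_command(name, argv, log_dir):
    """Run argv and append its timestamped output to log_dir/name.log."""
    path = os.path.join(log_dir, name + ".log")
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    with open(path, "a") as log:
        for line in proc.stdout:
            log.write(stamp_line(line, datetime.datetime.now()))
            log.flush()
            os.fsync(log.fileno())  # push each sample to disk right away
    proc.wait()


def run_all(log_dir):
    """Start one logging thread per command that is installed."""
    os.makedirs(log_dir, exist_ok=True)
    threads = []
    for name, argv in COMMANDS.items():
        if shutil.which(argv[0]) is None:
            continue  # skip tools missing on this host
        t = threading.Thread(target=log_command, args=(name, argv, log_dir))
        t.daemon = True
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
```

It would be started with something like run_all("/var/tmp/hanglog"), where the path is hypothetical and should sit on rpool, since rpool stays fine while the other pool hangs. Some of these tools may buffer their output when writing to a pipe, so samples can arrive in batches rather than every interval.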

Thanks

Tru
-- 
Dr Tru Huynh          | http://www.pasteur.fr/recherche/unites/Binfs/
mailto:t...@pasteur.fr | tel/fax +33 1 45 68 87 37/19
Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France  
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss