Hi,

I have a 3-node (SunFire V890) VCS cluster running Solaris 10 u4
with LUNs from some Sun 6130, 6140 and IBM 8100 arrays. It had been
working well, but one of the nodes started having trouble
running ZFS commands this Tuesday, 2/19. Any ZFS command, e.g.
'zpool import', can take hours to complete. Sometimes it takes 4-5
minutes; run it again and it can take 60 minutes. The other 2
nodes, which share the same set of LUNs, are still normal so far:
the same commands take 5-10 seconds or less.
I haven't noticed any error messages from the arrays or SAN switches,
and other than the HBAs and switch ports, the nodes are virtually identical.
Other commands such as cfgadm and format seem normal, so I suspect
the culprit is related to ZFS. I have opened a case with Sun, but that route
seems to take forever for this kind of issue and I haven't got an answer yet.

The host is not down or crashed. I rebooted it once today, but I'm not sure
the reboot fixed anything: 'zpool import' can still take minutes rather than
seconds to complete. I still need to create some test LUNs and pools
for more tests. Everything except ZFS still seems normal.
Most zfs commands also drive the CPU load up until they complete,
as seen in vmstat, mpstat, or top. This has been causing us trouble
because our home-grown VCS ZFS agent considers the zpool dead
after some consecutive failures in probing the pool (zpool status
takes forever to complete).
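
To illustrate the failure mode described above, here is a minimal sketch of a probe that treats a slow command as a failure and declares the pool dead after several consecutive failures. It is not the actual agent code: the names `probe` and `PoolMonitor`, the 30-second timeout, the threshold of 3, and the pool name `tank` are all assumptions made for illustration.

```python
import subprocess
import sys
import time


def probe(cmd, timeout_s):
    """Run cmd once and return (ok, elapsed_seconds).

    A timeout, a non-zero exit status, or a command that cannot be
    started all count as a failed probe.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        ok = False  # the command was too slow; the child has been killed
    except OSError:
        ok = False  # the command could not be started at all
    return ok, time.monotonic() - start


class PoolMonitor:
    """Counts consecutive failed probes (hypothetical helper, not a VCS API)."""

    def __init__(self, cmd, timeout_s=30.0, max_failures=3):
        self.cmd = cmd
        self.timeout_s = timeout_s
        self.max_failures = max_failures
        self.failures = 0

    def check(self):
        """Probe once; return True when the pool should be declared dead."""
        ok, _elapsed = probe(self.cmd, self.timeout_s)
        if ok:
            self.failures = 0  # any success resets the streak
        else:
            self.failures += 1
        return self.failures >= self.max_failures


# Example use (assumed pool name "tank"):
#   monitor = PoolMonitor(["zpool", "status", "tank"])
#   if monitor.check():
#       print("pool considered dead")
```

With logic like this, a 'zpool status' that takes minutes instead of seconds fails every probe, so a healthy pool is reported dead after only a few monitoring cycles.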

Has anyone had the same problem, or does anyone know what might be the cause/fix?
Thanks.

Max Holm
 
 
This message posted from opensolaris.org