Hi world,
I have a 10-disk RAID-Z2 system with 4 GB of DDR2 RAM and a 3 GHz Core 2 Duo.

It's exporting ~280 filesystems over NFS to about half a dozen machines.

Under some loads (in particular, any attempt to rsync between another
machine and this one over SSH), the machine's load average sometimes
goes insane (27+), and the time all appears to be spent in kernel-land:
nothing in userland reports more than 5% CPU usage, yet top reports
50%+ CPU usage.

I say 27+ because when the load spikes this high, the machine stops
responding to any meaningful commands. Console login will take a
username and password then hang forever without printing anything. SSH
login will block forever without prompting for user or password.
Machine responds to ping. NFS drops.

The machine is running snv_113; this has occurred since the RAID-Z2
was created (b102).

I have no idea how to instrument this, as it doesn't appear to be
panicking or running out of RAM (as far as I can see from the last
responses of top and prstat), and I don't know how to ask dtrace
where the machine is spending most of its time. I read one or two
guides, but I don't follow how to interpret dtrace's output.

I'm sending this to zfs-discuss as I can't replicate this problem
unless I'm doing heavy I/O on ZFS.

(Final note - this 10-disk pool is serviced by an ARC 1280ML, and
while the kernel is under heavy load, zpool iostat -v reports no more
than 1 MB/s per disk, and almost always around 128 KB/s.)

- Rich

-- 

The generation of random numbers is too important to be left to chance.
