Re: [zfs-discuss] ZFS pegging the system

2009-07-17 Thread Scott Laird
Have each node record results locally, and then merge pair-wise until
a single node is left with the final results?  If you can do merges
that way while reducing the size of the result set, then that's
probably going to be the most scalable way to generate overall
results.
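
To make the pair-wise idea concrete, here is a minimal sketch in Python. It assumes each task's result can be reduced to per-key counts, which the original post does not say, so treat `merge_pair` and `tree_merge` as hypothetical names and swap in whatever merge actually suits your data. It runs the rounds in one process; on the cluster, each merge within a round is independent and could run on a different node.

```python
from collections import Counter


def merge_pair(a, b):
    """Combine two partial results into one.

    Assumption: results are per-key counts, so merging means summing them.
    """
    merged = Counter(a)
    merged.update(b)
    return merged


def tree_merge(partials):
    """Merge partial results pair-wise, round by round, until one is left."""
    level = list(partials)
    if not level:
        return Counter()
    while len(level) > 1:
        next_level = []
        # Pair up neighbours; every pair in a round is independent of the
        # others, so on a cluster these merges could run in parallel.
        for i in range(0, len(level) - 1, 2):
            next_level.append(merge_pair(level[i], level[i + 1]))
        # An odd one out is carried into the next round unchanged.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    return level[0]
```

With N partial results this takes about log2(N) rounds, and only the final merged result has to be written to the shared filesystem.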

On Thu, Jul 16, 2009 at 10:51 AM, Jeff Haferman <j...@haferman.com> wrote:

> We have a SGE array task that we wish to run with elements 1-7.
> Each task generates output and takes roughly 20 seconds to 4 minutes
> of CPU time.  We're doing them on a machine with about 144 8-core nodes,
> and we've divvied the job up to do about 500 at a time.
>
> So, we have 500 jobs at a time writing to the same ZFS partition.
>
> What is the best way to collect the results of the task? Currently we
> are having each task write to STDOUT and then are combining the
> results. This nails our ZFS partition to the wall and kills
> performance for other users of the system.  We tried setting up a
> MySQL server to receive the results, but it couldn't take 1000
> simultaneous inbound connections.
>
> Jeff

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Re: [zfs-discuss] ZFS pegging the system

2009-07-17 Thread Louis-Frédéric Feuillette
On Thu, 2009-07-16 at 10:51 -0700, Jeff Haferman wrote:
> We have a SGE array task that we wish to run with elements 1-7.
> Each task generates output and takes roughly 20 seconds to 4 minutes
> of CPU time.  We're doing them on a machine with about 144 8-core nodes,
> and we've divvied the job up to do about 500 at a time.
>
> So, we have 500 jobs at a time writing to the same ZFS partition.

Sorry, no answers, just some questions that first came to mind.

Where is your bottleneck?  Is it drive I/O or the network?

Are all nodes accessing/writing via NFS?  Is this an NFS sync issue?
Might an SSD ZIL help?
-- 
Louis-Frédéric Feuillette <jeb...@gmail.com>



[zfs-discuss] ZFS pegging the system

2009-07-16 Thread Jeff Haferman

We have a SGE array task that we wish to run with elements 1-7.  
Each task generates output and takes roughly 20 seconds to 4 minutes  
of CPU time.  We're doing them on a machine with about 144 8-core nodes,
and we've divvied the job up to do about 500 at a time.

So, we have 500 jobs at a time writing to the same ZFS partition.

What is the best way to collect the results of the task? Currently we  
are having each task write to STDOUT and then are combining the  
results. This nails our ZFS partition to the wall and kills  
performance for other users of the system.  We tried setting up a  
MySQL server to receive the results, but it couldn't take 1000  
simultaneous inbound connections.

Jeff
