Paul Kapinos <kapi...@rz.rwth-aachen.de> writes:

> Jeff, I would turn the question the other way around:
>
> - are there any penalties when using KNEM?

Bull should be able to comment on that -- they turn it on by default in
their proprietary OMPI derivative -- but I doubt I can get much of a
story on it.  Mellanox ship it now too, but I don't know if their
distribution defaults to using it.

I expect to use knem on hardware that's essentially the same as Mark's.
If any issues appear in production, I'll be surprised and will report
them.

> We have a couple of Really Big Nodes (128 cores) with non-huge memory
> bandwidth (because each is coupled together from 4 standalone nodes
> with 4 sockets each).

I was hoping to have some results for just such a setup, but haven't
been able to spend any time on it this week.  If there are any
suggestions for OMPI tuning on it I'd be interested.

> So cutting the bandwidth in half on these nodes sounds like a
> Very Good Thing.
>
> But otherwise we have 1500+ nodes with only 2 sockets and 24GB memory,
> and we do not want to disturb production on these nodes... (and
> different MPI versions for different nodes are doofy).

Why would you need that?  Our horribly heterogeneous cluster just has a
node group-specific openmpi-mca-params.conf, and SGE parallel
environments keep jobs in specific host groups with basically the same
CPU speed and interconnect.
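
As a rough sketch of what such per-group files might contain, assuming an
Open MPI 1.5-series build whose sm BTL was compiled with KNEM support (the
parameter name and the meaning of its values should be checked against
ompi_info on your own install, and the host-group names in the comments
are hypothetical):

```
# openmpi-mca-params.conf as seen by the big-node host group (hypothetical)
# Assumed semantics: a positive value asks for KNEM and fails if unavailable.
btl_sm_use_knem = 1

# openmpi-mca-params.conf as seen by the 2-socket host group (hypothetical)
# Assumed semantics: 0 leaves KNEM off, so production behaviour is unchanged.
btl_sm_use_knem = 0
```

That way the same OMPI version runs everywhere and only the parameter file
differs between host groups.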

>
> Best
>
> Paul
