Two questions:
- Is the software configuration on these nodes identical, i.e. are the
compiler, MPI, CUDA, IB drivers, etc. the same on the C2050 and K20
nodes?
- Are you using CUDA-aware MPI? (Have you tried different MPI installs?)
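A quick way to check both, assuming Open MPI and an OFED InfiniBand
stack (adjust to whatever your cluster actually runs), is to compare
the output of these on a C2050 node and a K20 node:

    mpicc --version    # compiler wrapped by MPI
    mpirun --version   # MPI library version
    nvcc --version     # CUDA toolkit version
    ofed_info -s       # OFED/IB driver version, if OFED is installed
    # ends in ":true" if this Open MPI build was compiled CUDA-aware
    ompi_info --parsable --all | grep mpi_built_with_cuda_support:value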
If the answer is yes & yes, then I suspect this could be related
On Thu, May 22, 2014 at 5:31 PM, Thomas C. O'Connor wrote:
Hey,
Yes, everything runs fine if I work on one node with one or more GPUs. The
crash is similar to the one described in this previous mailing list post:
http://comments.gmane.org/gmane.science.biology.gromacs.user/63911
It crashes when we attempt to work across multiple GPU-enabled nodes. This
happens when our
Hi,
Sounds like an MPI or MPI+CUDA issue. Does mdrun run if you use a
single GPU? How about two? (See the example commands below.)
Btw, unless you have some rather exotic setup, you won't be able to
get much improvement from using more than three, at most four GPUs per
node - you need CPU cores to match them (and a large system, too).
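For reference, with a 4.6-era thread-MPI mdrun the single-node tests
would look something like this (topol.tpr is a placeholder input):

    mdrun -ntmpi 1 -gpu_id 0 -s topol.tpr    # one rank, GPU 0 only
    mdrun -ntmpi 2 -gpu_id 01 -s topol.tpr   # two ranks, GPUs 0 and 1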
Hey Folks,
I'm attempting to run simulations on a multi-node GPU cluster and my
simulations are crashing after flagging an Open MPI fork() warning:
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.
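For concreteness, a representative launch of the kind described would
be something like the following (mdrun_mpi, the rank counts, and the
input name are placeholders, not our exact command):

    # 2 nodes x 2 MPI ranks, mapping each node's ranks to GPUs 0 and 1
    mpirun -np 4 -npernode 2 mdrun_mpi -gpu_id 01 -s topol.tpr

Open MPI's mpi_warn_on_fork MCA parameter (mpirun --mca
mpi_warn_on_fork 0 ...) only silences the warning; it does not address
the crash itself.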