I cannot reproduce the problem. It seems that there is a buffer
overflow in the PMI2 client of MPICH.
On Sun, 2013-03-31 at 18:08 -0600, Christoph Sprenger wrote:
> sorry... here is the complete trace:
>
> *** buffer overflow detected ***:
> /vol/bob/check/csprenger/linux64/opt/bin/mpi_hello_world terminate
To follow up:
after fixing an issue in the mpich2 source file simple2pmi.c (which
overruns a snprintf buffer), the spawn interface started to work.
However, other things started to break (e.g. singleton mode, when no
srun was provided).
Pavan Balaji directed me to these steps, which work
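
For readers unfamiliar with this class of bug, here is a minimal sketch of how a snprintf call can be guarded. It is not the actual code or fix from simple2pmi.c, which is not shown in this thread. The helper name format_kv, the "key=value;" layout and the buffer sizes are all assumptions made for illustration. The idea is to pass the real capacity of the destination buffer and to treat a return value greater than or equal to that capacity as truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from MPICH): formats "key=value;" into buf.
 * Returns the number of characters written, or -1 if the output did
 * not fit or snprintf reported an error. */
int format_kv(char *buf, size_t bufsize, const char *key, const char *value)
{
    /* Always pass the true capacity of buf. Passing a larger number
     * is what allows snprintf to write past the end of the buffer. */
    int n = snprintf(buf, bufsize, "%s=%s;", key, value);
    if (n < 0)
        return -1;              /* output error */
    if ((size_t)n >= bufsize)
        return -1;              /* truncated: buf holds a cut-off string */
    return n;
}

/* Small demonstration with made-up key and value strings. */
void demo_format_kv(void)
{
    char small[8];
    char large[64];

    /* "key=0123456789;" needs 15 characters plus the terminator. */
    if (format_kv(small, sizeof small, "key", "0123456789") < 0)
        printf("small buffer: truncated to \"%s\"\n", small);

    int n = format_kv(large, sizeof large, "key", "0123456789");
    assert(n == (int)strlen(large));
    printf("large buffer: %d characters, \"%s\"\n", n, large);
}
```

When glibc fortification is enabled, a size argument larger than the known size of the destination object is one way to trigger the "*** buffer overflow detected ***" abort seen in the trace, via __fortify_fail. Whether that is exactly what happened in simple2pmi.c is not confirmed here.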
sorry... here is the complete trace:
*** buffer overflow detected ***:
/vol/bob/check/csprenger/linux64/opt/bin/mpi_hello_world terminated
=== Backtrace: =
/lib/libc.so.6(__fortify_fail+0x37)[0x7f194dc19217]
/lib/libc.so.6(+0xfe0d0)[0x7f194dc180d0]
/lib/libc.so.6(+0xfd7cb)[0x7f194d
could you please paste the complete output/error messages?
On Thu, 2013-03-28 at 14:59 -0600, Christoph Sprenger wrote:
> pich.so.10(PMI2_Init+0x7ff)[0x7f5daff7806f]
> /tech/home/csprenger/mpich-3.0.2_SLURM//lib/libmpich.so.10(MPID_Init
> +0xac)[0x7f5daff371ac]
> /tech/home/csprenger/mpich-3.0.2_SLURM//lib/l
Hi Yiannis,
thanks for your reply, but unfortunately I still seem to be having issues.
I've rebuilt mpich2-3.0.2:
./configure --with-slurm=/local1/slurm-2.5.4_INSTALL/ --with-pmi=pmi2
--enable-pmiport --prefix=/local1/mpich-3.0.2_SLURM/ --enable-shared
--enable-cxx ;
now I'm crashing right away i
Hi Christoph,
you need to use the PMI2 version of slurm to test the MPI_Comm_spawn
primitive of mpich2.
In more detail, you have to rebuild your mpich2, adding the following
flags to your configure:
--enable-pmiport --with-pmi=pmi2 --with-slurm=$YOUR_SLURM
and when you run jobs with slurm y