Hi Ralph,
Yes, Eclipse is still actively developed. To my understanding,
https://www.eclipse.org/ptp/
is also active. I posted a question on the Eclipse forum, but I got no
response: https://www.eclipse.org/forums/index.php/t/869298/
I am still looking for answers. Hopefully I will find an
If it isn’t too much trouble, it would be good to confirm that it remains
broken. I strongly suspect it is, based on Moe’s comments.
Obviously, other people are making this work. For Intel MPI, all you do is
point it at libpmi and they can run. However, they do explicitly dlopen it in
their code
$ srun --version
slurm 2.6.6-VENDOR_PROVIDED
$ srun --mpi=pmi2 -n 1 ~/hw
I am 0 / 1
$ srun -n 1 ~/hw
/csc/home1/gouaillardet/hw: symbol lookup error:
/usr/lib64/slurm/auth_munge.so: undefined symbol: slurm_verbose
srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
srun: error
Out of curiosity - how are you testing these? I have more current versions of
Slurm and would like to test the observations there.
> On Dec 1, 2014, at 7:49 PM, Gilles Gouaillardet
> wrote:
>
> I'd like to take a step back ...
>
> I previously tested with slurm 2.6.0, and it complained about
I'd like to take a step back ...
I previously tested with slurm 2.6.0, and it complained about the
slurm_verbose symbol that is defined in libslurm.so,
so with slurm 2.6.0, RTLD_GLOBAL or relinking is ok.
Now I tested with slurm 2.6.6 and it complains about the
slurm_auth_get_arg_desc symbol, and t
Another option is to simply add the -lslurm -lauth flags to the pmix/s1
component as this is the only place that requires it, and it won’t hurt
anything to do so.
> On Dec 1, 2014, at 6:03 PM, Gilles Gouaillardet
> wrote:
>
> Jeff,
>
> FWIW, you can read my analysis of what is going wrong a
Jeff,
FWIW, you can read my analysis of what is going wrong at
http://www.open-mpi.org/community/lists/pmix-devel/2014/11/0293.php
Bottom line: I agree this is a slurm issue (the slurm plugins should depend
on libslurm, but they do not, yet).
a possible workaround would be to make the pmi component a
Ok, if the problem is moot, great.
(sidenote: this is moot, so ignore this if you want: with this explanation, I'm
still not sure how RTLD_GLOBAL fixes the issue)
On Dec 1, 2014, at 5:15 PM, Ralph Castain wrote:
> Easy enough to explain. We link libpmi into the pmix/s1 component. This
> libr
Easy enough to explain. We link libpmi into the pmix/s1 component. This library
is missing the linkage to libslurm that contains the linkage to libauth where
munge resides. So when we call a PMI function, libpmi references a call to
munge for authentication and hits an “unresolved symbol” error.
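
The RTLD_GLOBAL workaround mentioned in this thread can be sketched as follows. This is only an illustration of the mechanism, not the Open MPI code (which is C and calls dlopen directly): it uses Python's ctypes, and the helper name preload_global and the library name "slurm" in the usage note are assumptions made for the example. The idea is to load the library that defines the missing symbols with global visibility first, so that objects loaded afterwards can resolve against it.

```python
import ctypes
import ctypes.util


def preload_global(libname):
    """Load a shared library with RTLD_GLOBAL visibility.

    Hypothetical helper for illustration. Symbols from a library opened
    with RTLD_GLOBAL become available for symbol resolution of libraries
    loaded later, which is what a plugin with a missing link-time
    dependency needs.

    Returns the ctypes handle, or None if the library cannot be found
    or loaded.
    """
    path = ctypes.util.find_library(libname)
    if path is None:
        return None
    try:
        # mode is passed through to dlopen(); RTLD_GLOBAL exports the symbols
        return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return None
```

On a system where libslurm is installed, calling preload_global("slurm") before the PMI library is loaded would be the analogue of the workaround. As reported elsewhere in this thread, that was enough with slurm 2.6.0 but slurm 2.6.6 then complained about a different symbol, so treat it as a sketch rather than a fix.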
On Dec 1, 2014, at 5:07 PM, Ralph Castain wrote:
> FWIW: It’s Slurm’s pmi-1 library that isn’t linked correctly against its
> dependencies (the pmi-2 one is correct). Moe is aware of the problem and
> fixing it on their side. This won’t help existing installations until they
> upgrade, but I
FWIW: It’s Slurm’s pmi-1 library that isn’t linked correctly against its
dependencies (the pmi-2 one is correct). Moe is aware of the problem and fixing
it on their side. This won’t help existing installations until they upgrade,
but I tend to agree with Jeff about not fixing other people’s prob
On Dec 1, 2014, at 4:07 PM, Howard Pritchard wrote:
> There has been some discussion of edge-case situations with the use of dlopen
> in the ompi mca framework that can lead to unresolved symbols when
> subsequent shared libraries are dlopen'd that might need symbols from
> a library that had been op
Hi ompi developers,
If you always configure ompi with --disable-dlopen you can delete this
message now.
There has been some discussion of edge-case situations with the use of dlopen
in the ompi mca framework that can lead to unresolved symbols when
subsequent shared libraries are dlopen'd that might n
Looks like this should be fixed in my PR #101 - could you please review it?
Thanks
Ralph
> On Nov 26, 2014, at 8:14 PM, Ralph Castain wrote:
>
> Aha - I see what happened. I have that param set to false in my default mca
> param file. If I set it to true on the cmd line, then I run without
>
Paul --
Sorry for the delay -- SC and the US Thanksgiving holiday last week got in the
way of responding to this properly.
I talked with Dave Goodell about this issue a bunch today.
Going back to the original email in this thread
(http://www.open-mpi.org/community/lists/devel/2014/10/16064.p
Marc,
I am not aware of any MPI implementation in which mpirun does the job
allocation.
Instead, mpirun gets job info from the batch manager (e.g. the number of nodes)
so the job can be launched seamlessly and be properly killed if the job
is aborted
(bkill or equivalent).
Cheers,
Gilles
On 2014/1
Hi,
Sorry for the late reply - I've been traveling with limited email
access. I think you can leave this issue be. I think I was hoping for a
way to just launch mpirun and have it create the allocation by itself.
It's not super important right now, more something I was wondering
about.
Thank