Re: [OMPI users] [EXTERNAL] openib BTL disabled when using MPI_Init_thread

2022-02-03 Thread Pritchard Jr., Howard via users
Hi Jose, A number of things. First, for recent versions of Open MPI, including the 4.1.x release stream, MPI_THREAD_MULTIPLE is supported by default. However, some transport options available when using MPI_Init may not be available when requesting MPI_THREAD_MULTIPLE. You may want to let Ope

Re: [OMPI users] cuda-aware OpenMPI - high number of small asynch sent messages create invalid write

2022-02-03 Thread Alexander Stadik via users
It seems I misunderstood something regarding attaching files. And sorry for the footer; I used my company email so I also get answers when I work. Here is the valgrind output: https://pastebin.com/Wwvn8Pa7 Here is the ompi_info --all output: https://pastebin.com/FW0fazZH Here is the gdb output: https://past

Re: [OMPI users] [EXTERNAL] openib BTL disabled when using MPI_Init_thread

2022-02-03 Thread Jose E. Roman via users
Thanks. The verbose output is: [kahan01.upvnet.upv.es:29732] mca: base: components_register: registering framework btl components [kahan01.upvnet.upv.es:29732] mca: base: components_register: found loaded component self [kahan01.upvnet.upv.es:29732] mca: base: components_register: component self

Re: [OMPI users] [EXTERNAL] openib BTL disabled when using MPI_Init_thread

2022-02-03 Thread Pritchard Jr., Howard via users
Hello Jose, I suspect the issue here is that the OpenIB BTL isn't finding a connection module when you are requesting MPI_THREAD_MULTIPLE. The rdmacm connection is deselected if MPI_THREAD_MULTIPLE thread support level is being requested. If you run the test in a shell with export OMPI_MCA_btl

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-03 Thread David Perozzi via users
One disturbing thing in your note was: I'm very sorry about that. That is just wrong. Somehow I overlooked it, just because it was not where I supposed it to be. I apologize. I'm still investigating what could've gone wrong and I'm also trying Bernd's suggestion: that could indeed be an even

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-03 Thread Ralph Castain via users
I'm sure nobody has looked at the rankfile docs in many a year - nor actually tested the code for some time, especially with the newer complex chips. I can try to take a look at it locally, but it may be a few days before I get around to it. One disturbing thing in your note was: Also, on the

[OMPI users] cuda-aware OpenMPI - high number of small asynch sent messages create invalid write

2022-02-03 Thread Alexander Stadik via users
Hello whoever reads this, I am running my code using CUDA-aware OpenMPI (see ompi_info --all attached). First I will explain the problem; further down I will give additional info about versions, hardware and debugging. The Problem: My application solves multiple mathematical equations on GPU via

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application (users Digest, Vol 4715, Issue 1)

2022-02-03 Thread Bernd Dammann via users
Hi David, On 03/02/2022 00:03, David Perozzi wrote: Hello, I'm trying to run a code implemented with OpenMPI and OpenMP (for threading) on a large cluster that uses LSF for the job scheduling and dispatch. The problem with LSF is that it is not very straightforward to allocate and bind the r

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-03 Thread David Perozzi via users
No problem, giving a detailed explanation is the least I can do! Thank you for taking your time. Yeah, to be honest I'm not completely sure I'm doing the right thing with the IDs, as I had some trouble understanding the manpages. Maybe you can help me and we'll end up seeing that that was i

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-03 Thread Ralph Castain via users
Hmmm...okay, I found the code path that fails without an error - not one of the ones I was citing. Thanks for that detailed explanation of what you were doing! I'll add some code to the master branch to plug that hole along with the other I identified. Just an FYI: we stopped supporting "physic

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-03 Thread David Perozzi via users
Thanks for looking into that, and sorry that I only included the version in use in the pastebin. I'll ask the cluster support if they could install OMPI master. I really am unfamiliar with openmpi's codebase, so I haven't looked into it and am very thankful that you could already identify possibl