Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application (users Digest, Vol 4715, Issue 1)

2022-02-07 Thread David Perozzi via users
Hi Bernd, Thanks for your valuable input! Your suggested approach indeed seems like the correct one and is actually what I've always wanted to do. In the past, I've also asked our cluster support if there was this possibility, but they always suggested the following approach: export

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-03 Thread David Perozzi via users
One disturbing thing in your note was: I'm very sorry about that. That is just wrong. Somehow I overlooked it, just because it was not where I expected it to be. I apologize. I'm still investigating what could've gone wrong and I'm also trying Bernd's suggestion: that could indeed be an even

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-03 Thread Ralph Castain via users
I'm sure nobody has looked at the rankfile docs in many a year - nor actually tested the code for some time, especially with the newer complex chips. I can try to take a look at it locally, but it may be a few days before I get around to it. One disturbing thing in your note was: Also, on the

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application (users Digest, Vol 4715, Issue 1)

2022-02-03 Thread Bernd Dammann via users
Hi David, On 03/02/2022 00:03, David Perozzi wrote: Hello, I'm trying to run a code implemented with OpenMPI and OpenMP (for threading) on a large cluster that uses LSF for the job scheduling and dispatch. The problem with LSF is that it is not very straightforward to allocate and bind the

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-03 Thread David Perozzi via users
No problem, giving a detailed explanation is the least I can do! Thank you for taking your time. Yeah, to be honest I'm not completely sure I'm doing the right thing with the IDs, as I had some trouble understanding the manpages. Maybe you can help me and we'll end up seeing that that was

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-03 Thread Ralph Castain via users
Hmmm...okay, I found the code path that fails without an error - not one of the ones I was citing. Thanks for that detailed explanation of what you were doing! I'll add some code to the master branch to plug that hole along with the other I identified. Just an FYI: we stopped supporting

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-03 Thread David Perozzi via users
Thanks for looking into that, and sorry that I only included the version in use in the pastebin. I'll ask the cluster support if they could install OMPI master. I really am unfamiliar with Open MPI's codebase, so I haven't looked into it and am very thankful that you could already identify

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-02 Thread Ralph Castain via users
Are you willing to try this with OMPI master? Asking because it would be hard to push changes all the way back to 4.0.x every time we want to see if we fixed something. Also, few of us have any access to LSF, though I doubt that has much impact here as it sounds like the issue is in the

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-02 Thread Christoph Niethammer via users
The linked pastebin includes the following version information: [1,0]:package:Open MPI spackapps@eu-c7-042-03 Distribution [1,0]:ompi:version:full:4.0.2 [1,0]:ompi:version:repo:v4.0.2 [1,0]:ompi:version:release_date:Oct 07, 2019 [1,0]:orte:version:full:4.0.2 [1,0]:orte:version:repo:v4.0.2

Re: [OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-02 Thread Ralph Castain via users
Errr...what version of OMPI are you using? On Feb 2, 2022, at 3:03 PM, David Perozzi via users wrote: Hello, I'm trying to run a code implemented with OpenMPI and OpenMP (for threading) on a large cluster that uses LSF for the job scheduling and dispatch. The problem with LSF is

[OMPI users] Error using rankfile to bind multiple cores on the same node for threaded OpenMPI application

2022-02-02 Thread David Perozzi via users
Hello, I'm trying to run a code implemented with OpenMPI and OpenMP (for threading) on a large cluster that uses LSF for the job scheduling and dispatch. The problem with LSF is that it is not very straightforward to allocate and bind the right number of threads to an MPI rank inside a single