Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Mike Dubman
btw, what is the rationale for running in a chroot env? Is it a Docker-like env? Does "ibv_devinfo -v" work for you from the chroot env?

On Tue, May 26, 2015 at 7:08 AM, Rahul Yadav wrote:
> Yes Ralph, MXM cards are on the node. Command runs fine if I run it out of
> the chroot
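A quick way to check Mike's ibv_devinfo question from inside the chroot; a minimal sketch, assuming the chroot lives at /root/chroot (the path is an assumption):

    # enter the chroot and query the InfiniBand devices
    chroot /root/chroot ibv_devinfo -v

If this reports no devices, the pseudo filesystems (/dev, /sys) are likely not visible inside the chroot - see Gilles's suggestion below.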

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Rahul Yadav
Yes Ralph, MXM cards are on the node. The command runs fine if I run it outside the chroot environment.

Thanks
Rahul

On Mon, May 25, 2015 at 9:03 PM, Ralph Castain wrote:
> Well, it isn’t finding any MXM cards on NAE27 - do you have any there?
>
> You can’t use yalla without MXM

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Ralph Castain
Well, it isn’t finding any MXM cards on NAE27 - do you have any there? You can’t use yalla without MXM cards on all nodes.

> On May 25, 2015, at 8:51 PM, Rahul Yadav wrote:
>
> We were able to solve ssh problem.
>
> But now MPI is not able to use component yalla. We are

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Gilles Gouaillardet
Rahul,

per the logs, it seems the /sys pseudo filesystem is not mounted in your chroot. First, can you make sure it is mounted and try again?

Cheers,

Gilles

On 5/26/2015 12:51 PM, Rahul Yadav wrote:
> We were able to solve the ssh problem. But now MPI is not able to use component yalla.
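For reference, a minimal sketch of bind-mounting the pseudo filesystems before entering the chroot, again assuming it lives at /root/chroot (the path is an assumption):

    # expose the host's /sys (and, typically, /proc and /dev) inside the chroot
    mount --bind /sys  /root/chroot/sys
    mount --bind /proc /root/chroot/proc
    mount --bind /dev  /root/chroot/dev

Per Gilles's note, the MXM/verbs stack discovers devices through /sys, so yalla cannot find the HCAs without it.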

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-26 Thread Rahul Yadav
We were able to solve the ssh problem. But now MPI is not able to use the yalla component. We are running the following command:

mpirun --allow-run-as-root --mca pml yalla -n 1 --hostfile /root/host1 /root/app2 : -n 1 --hostfile /root/host2 /root/backend

The command is run in a chroot environment on JARVICENAE27
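One additional sanity check (an assumption on my part, not suggested in the thread) is to confirm this Open MPI build actually contains the yalla PML before forcing it with --mca pml yalla:

    # list the PML components known to this build; yalla should appear
    ompi_info | grep -i pml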

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-13 Thread Ralph Castain
Okay, so we see two nodes have been allocated:

1. JARVICENAE27 - appears to be the node where mpirun is running
2. 10.3.0.176

Does that match what you expected? If you cannot ssh (without a password) between machines, then we will not be able to run.

> On May 13, 2015, at 12:21 AM, Rahul
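A minimal sketch of verifying passwordless ssh from JARVICENAE27 to the second node (root@ is an assumption, given the --allow-run-as-root runs):

    # should print the remote hostname without prompting for a password
    ssh -o BatchMode=yes root@10.3.0.176 hostname

    # if it fails, create a key and install it on the remote node
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    ssh-copy-id root@10.3.0.176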

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-13 Thread Rahul Yadav
I get the following output with verbose:

[JARVICENAE27:00654] mca: base: components_register: registering ras components
[JARVICENAE27:00654] mca: base: components_register: found loaded component loadleveler
[JARVICENAE27:00654] mca: base: components_register: component loadleveler register function

Re: [OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-07 Thread Ralph Castain
Try adding --mca ras_base_verbose 10 to your cmd line and let’s see what it thinks it is doing. Which OMPI version are you using - master?

> On May 6, 2015, at 11:24 PM, Rahul Yadav wrote:
>
> Hi,
>
> We have been trying to run MPI jobs (consisting of two different
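Applied to the command from the original post below, that would look like the following (the command itself is from the thread; only the verbosity flag is added):

    mpirun --allow-run-as-root --mca ras_base_verbose 10 --mca pml yalla \
        -n 1 --hostfile /root/host1 /root/app2 : \
        -n 1 --hostfile /root/host2 /root/backend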

[OMPI users] Error: "all nodes which are allocated for this job are already filled"

2015-05-07 Thread Rahul Yadav
Hi,

We have been trying to run MPI jobs (consisting of two different binaries, one on each node) across two nodes, using the hostfile option as follows:

mpirun --allow-run-as-root --mca pml yalla -n 1 --hostfile /root/host1 /root/app2 : -n 1 --hostfile /root/host2 /root/backend

We are doing this in chroot
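For context, a minimal sketch of what the two hostfiles might contain; the node names are taken from Ralph's reply above, and the exact contents are an assumption:

    # /root/host1
    JARVICENAE27 slots=1

    # /root/host2
    10.3.0.176 slots=1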