Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
Let me expand on this slightly (in response to Ralph Castain's posting -- I had digest mode set). As currently constructed, a shell script in Wien2k (www.wien2k.at) launches a series of tasks using ($remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]") >>.time1_$loop & where the stand

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: > Let me expand on this slightly (in response to Ralph Castain's posting > -- I had digest mode set). As currently constructed a shellscript in > Wien2k (www.wien2k.at) launches a series of tasks using > > ($remote $remotemachine "cd $PWD;$t $ttt

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Reuti
On 03.04.2011 at 16:56, Ralph Castain wrote: > On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: > >> Let me expand on this slightly (in response to Ralph Castain's posting >> -- I had digest mode set). As currently constructed a shellscript in >> Wien2k (www.wien2k.at) launches a series of task

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 9:12 AM, Reuti wrote: > On 03.04.2011 at 16:56, Ralph Castain wrote: > >> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: >> >>> Let me expand on this slightly (in response to Ralph Castain's posting >>> -- I had digest mode set). As currently constructed a shellscript in

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote: > > On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: > >> Let me expand on this slightly (in response to Ralph Castain's posting >> -- I had digest mode set). As currently constructed a shellscript in >> Wien2k (www.wien2k.at) launches a series

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 9:34 AM, Laurence Marks wrote: > On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote: >> >> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: >> >>> Let me expand on this slightly (in response to Ralph Castain's posting >>> -- I had digest mode set). As currently constructed

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
On Sun, Apr 3, 2011 at 11:41 AM, Ralph Castain wrote: > > On Apr 3, 2011, at 9:34 AM, Laurence Marks wrote: > >> On Sun, Apr 3, 2011 at 9:56 AM, Ralph Castain wrote: >>> >>> On Apr 3, 2011, at 8:14 AM, Laurence Marks wrote: >>> Let me expand on this slightly (in response to Ralph Castain's p

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote: >>> >>> I am not using that computer. A scenario that I have come across is >>> that when a msub job is killed because it has exceeded its Walltime >>> mpi tasks spawned by ssh may not be terminated because (so I am told) >>> Torque does not kno

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread David Singleton
You can prove this to yourself rather easily. Just ssh to a remote node and execute any command that lingers for a while - say something simple like "sleep". Then kill the ssh and do a "ps" on the remote node. I guarantee that the command will have died. H ... vayu1:~ > ssh v37 sleep 60

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
> > It most certainly will! That mpirun on nodeB is executing under the ssh from > nodeA, so when that ssh session is killed, it automatically kills everything > run underneath it. And when mpirun dies, so does the job it was running, as > per above. > You can prove this to yourself rather easily.

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Reuti
On 03.04.2011 at 22:57, Ralph Castain wrote: > On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote: > I am not using that computer. A scenario that I have come across is that when a msub job is killed because it has exceeded its Walltime mpi tasks spawned by ssh may not be ter

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread David Singleton
On 04/04/2011 12:56 AM, Ralph Castain wrote: What I still don't understand is why you are trying to do it this way. Why not just run time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def where machineN contains the names
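The `mpirun ... -machinefile .machineN` command quoted above needs a file of host names for each task. As a minimal sketch, not taken from the thread, such a file could be cut out of the node list Torque provides in `$PBS_NODEFILE`. The helper name `make_machinefile`, its argument order, and the idea of giving each task a contiguous slice of the node file are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): copy COUNT consecutive lines,
# starting at line FIRST, from a Torque-style node file into a machinefile.
# Torque lists one host name per allocated slot, so repeated names are kept.
# Usage: make_machinefile NODEFILE FIRST COUNT OUTFILE
make_machinefile() {
    nodefile=$1
    first=$2
    count=$3
    outfile=$4
    last=$((first + count - 1))
    # sed -n prints only the requested line range
    sed -n "${first},${last}p" "$nodefile" > "$outfile"
}
```

Inside a Torque job this might be used as `make_machinefile "$PBS_NODEFILE" 1 2 .machine1`, followed by `mpirun -np 2 -machinefile .machine1 ...`; a second task would take the next slice (`3 2`), so that concurrently launched tasks do not share slots.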

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Reuti
On 03.04.2011 at 23:59, David Singleton wrote: > On 04/04/2011 12:56 AM, Ralph Castain wrote: >> >> What I still don't understand is why you are trying to do it this way. Why >> not just run >> >> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN >> /home/lma712/src/Virgi

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
Works great for me...sleep is dead every time. On Apr 3, 2011, at 3:13 PM, David Singleton wrote: > >> You can prove this to yourself rather easily. Just ssh to a remote node and >> execute any command that lingers for a while - say something simple like >> "sleep". Then kill the ssh and do a

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 3:22 PM, Reuti wrote: > On 03.04.2011 at 22:57, Ralph Castain wrote: > >> On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote: >> > > I am not using that computer. A scenario that I have come across is > that when a msub job is killed because it has exceeded its W

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 4:08 PM, Reuti wrote: > On 03.04.2011 at 23:59, David Singleton wrote: > >> On 04/04/2011 12:56 AM, Ralph Castain wrote: >>> >>> What I still don't understand is why you are trying to do it this way. Why >>> not just run >>> >>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -n

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
On Sun, Apr 3, 2011 at 5:08 PM, Reuti wrote: > On 03.04.2011 at 23:59, David Singleton wrote: > >> On 04/04/2011 12:56 AM, Ralph Castain wrote: >>> >>> What I still don't understand is why you are trying to do it this way. Why >>> not just run >>> >>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -n

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 4:37 PM, Laurence Marks wrote: > On Sun, Apr 3, 2011 at 5:08 PM, Reuti wrote: >> On 03.04.2011 at 23:59, David Singleton wrote: >> >>> On 04/04/2011 12:56 AM, Ralph Castain wrote: What I still don't understand is why you are trying to do it this way. Why not

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
Thanks. I will test this tomorrow. Many people run Wien2k with Open MPI, as you say; I only became aware of the issue of Wien2k (and perhaps other codes) leaving orphaned processes still running a few days ago. I also know someone who wants to run Wien2k on a system where both rsh and ssh are banned

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Laurence Marks
And, before someone wonders: while Wien2k is a commercial code, it costs about 500 EUR for a lifetime licence, so this is not the same as VASP or Gaussian, which cost $. And I have no financial interest in the code, but like many others I help make it better (semi gnu). On Sun, Apr 3, 2011 at 6:25 PM,

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-03 Thread Ralph Castain
On Apr 3, 2011, at 5:25 PM, Laurence Marks wrote: > Thanks. I will test this tomorrow. > > Many people run Wien2k with openmpi as you say, I only became aware of > the issue of Wien2k (and perhaps other codes) leaving orphaned > processes still running a few days ago. I also know someone who wan

Re: [OMPI users] WRF run on multiple Nodes

2011-04-03 Thread Ahsan Ali
Dear David, I don't know where the machinefile is. I found a command under "Running with Open MPI": mpirun -np 168 -mca btl self,sm,openib -hostfile /home/demo/hostfile-ompi.14 -mca mpi_paffinity_alone 1 ~/WRFV3.2.1/run/wrf.exe for a Dell PowerEdge M610 14-node cluster with Mellanox QDR InfiniBand Sw