It is NOT true that queuing systems cannot do the "WIEN2k style".
We have two big clusters and run all three types of jobs on them: i) ssh only (k-parallel), ii) mpi-parallel only (no ssh), and iii) mixed. And of course the administrators configured the "Sun Grid Engine" so that it makes sure no processes are left running when a job finishes and, where necessary, kills all processes of a batch job on all the assigned nodes after it has finished.

It's just a matter of whether the system programmers are willing (or able ??) to reconfigure the queuing system...

PS: If you are running mpi-parallel, use
setenv MPI_REMOTE 0
in $WIENROOT/parallel_options and ssh will not be used anyway.

On 05.01.2012 13:17, Laurence Marks wrote:
> As Florent said, this is a known issue with some (not all) versions of ssh,
> and it is also a torque bug. What you have to do is use mpirun instead of ssh
> to launch jobs, which I think you can do by setting the MPI_REMOTE/USE_REMOTE
> switches. I think I posted how to do this some time ago, so please search the
> mailing list. (I am in China and can provide more information next week when I
> return if this is not enough, which it probably is not.)
>
> N.B., in case anyone wonders, with torque (PBS) you are not "supposed to" use
> ssh to communicate the way Wien2k does. They are not going to move on this, so
> this is "Wien2k's fault". I've looked into this quite a bit and there is no
> solution except to avoid ssh (or live with zombie processes). Indeed, torque
> has the weakness of leaving processes around if a code does anything more
> adventurous than just run a single mpirun -- so it goes.
>
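For reference, a minimal sketch of what the relevant part of $WIENROOT/parallel_options might look like. Only the MPI_REMOTE line comes from the PS in this message; the USE_REMOTE line is the other switch Laurence mentions, and both its value and its described effect are assumptions that should be checked against the comments in your own parallel_options file and the WIEN2k user's guide.

```csh
# Fragment of $WIENROOT/parallel_options (csh syntax).

# From the PS in this message: with this setting, ssh is not used
# when running mpi-parallel.
setenv MPI_REMOTE 0

# Assumption, not stated in this message: USE_REMOTE is the companion
# switch for k-parallel jobs. Setting it to 0 is understood to start them
# locally without ssh, which only suits a single shared-memory node.
# Left commented out here; verify before enabling.
# setenv USE_REMOTE 0
```

With k-parallel or mixed jobs across several nodes, ssh is still needed, so cleaning up leftover processes remains the queuing system's job.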
> On Thu, Jan 5, 2012 at 3:22 AM, Peter Blaha <pblaha at theochem.tuwien.ac.at> wrote:
>> I've never done this myself, but as far as I know one can define a
>> "prolog" script in all those queuing systems, and this prolog script
>> should ssh to all assigned nodes and kill all remaining jobs of this user.
>>
>> On 05.01.2012 10:17, Florent Boucher wrote:
>>> Dear Yundi,
>>> this is a known limitation of ssh and rsh, which do not pass the interrupt
>>> signal to the remote host.
>>> Under LSF I had a solution in the past: a specific rshlsf for doing this.
>>> At present I use either SGE or PBS on two different clusters, and the
>>> problem exists.
>>> You will see that you are not even able to suspend a running job.
>>> If someone has a solution, I would also appreciate it.
>>> Regards
>>> Florent
>>>
>>> On 04/01/2012 21:57, Yundi Quan wrote:
>>>> I'm working on a cluster using the torque queue system. I can ssh
>>>> directly to any node without a password. When I use qdel (or canceljob)
>>>> jobid to terminate a running job, the job is terminated in the queue
>>>> system. However, when I ssh to the nodes, the job is still running.
>>>> Does anyone know how to avoid this?
>>>>
>>>> _______________________________________________
>>>> Wien mailing list
>>>> Wien at zeus.theochem.tuwien.ac.at
>>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>>
>>> --
>>> Florent BOUCHER
>>> Institut des Matériaux Jean Rouxel
>>> 2, rue de la Houssinière, BP 32229, 44322 NANTES CEDEX 3 (FRANCE)
>>> Phone: (33) 2 40 37 39 24, Fax: (33) 2 40 37 39 95
>>> Mailto:Florent.Boucher at cnrs-imn.fr
>>> http://www.cnrs-imn.fr
>
> --
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu 1-847-491-3996
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought"
> Albert Szent-Gyorgi

--
P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------