It is NOT true that queuing systems cannot do the "WIEN2k style".

We have two big clusters and run all three types of jobs on them:
i) ssh only (k-parallel), ii) mpi-parallel only (no ssh), and
iii) mixed.

And of course the administrators configured the "sun grid engine" so that it
checks that no processes are still running when a job finishes and, if
necessary, kills all processes of the batch job on all the assigned nodes
after it has finished.
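
As an illustration of the kind of cleanup described above, here is a minimal
bash sketch. It is not our administrators' actual configuration. It assumes a
hostfile with one hostname per line (as in Torque's $PBS_NODEFILE), passwordless
ssh to the nodes, and that the user has no other job on those nodes. The
function name cleanup_nodes and the DRY_RUN switch are made up for this example.

```shell
#!/bin/bash
# Hypothetical epilogue helper: kill leftover processes of one user on every
# node listed in a hostfile (one hostname per line, duplicates allowed).
# Assumption: the user runs nothing else on these nodes, because pkill -u
# terminates ALL processes of that user there.
cleanup_nodes() {
    local hostfile=$1 user=$2 host
    # sort -u collapses the repeated per-core entries in the hostfile.
    sort -u "$hostfile" | while read -r host; do
        [ -n "$host" ] || continue          # skip empty lines
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: only print the command that would be executed.
            echo "ssh -n $host pkill -KILL -u $user"
        else
            # -n keeps ssh from reading the host list on stdin;
            # a non-zero exit (e.g. nothing left to kill) is ignored.
            ssh -n "$host" pkill -KILL -u "$user" || true
        fi
    done
}
```

With DRY_RUN=1 set, `cleanup_nodes "$PBS_NODEFILE" "$USER"` only lists the ssh
commands, which is a safe way to check the host list before letting it kill
anything.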

It's just a matter of whether the system programmers are willing (or able??)
to reconfigure the queuing system...

PS: If you are running mpi-parallel, use "setenv MPI_REMOTE 0" in
$WIENROOT/parallel_options and ssh will not be used anyway.
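
To see which value is actually in effect, the setting can be read back from
the file. The following bash helper is only a sketch: the function name
mpi_remote_setting is hypothetical, and it assumes the file consists of plain
csh "setenv NAME VALUE" lines.

```shell
#!/bin/bash
# Hypothetical helper: print the MPI_REMOTE value from a parallel_options
# file. If the variable is set more than once, the last line wins, as it
# would when csh sources the file. Returns non-zero if it is not set at all.
mpi_remote_setting() {
    awk '$1 == "setenv" && $2 == "MPI_REMOTE" { v = $3 }
         END { if (v != "") print v; else exit 1 }' "$1"
}
```

For example, `mpi_remote_setting "$WIENROOT/parallel_options"` should print 0
once the line from the PS has been added.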

On 05.01.2012 13:17, Laurence Marks wrote:
> As Florent said, this is a known issue with some (not all) versions of ssh,
> and it is also a torque bug. What you have to do is use mpirun instead of ssh
> to launch jobs, which I think you can do by setting the MPI_REMOTE/USE_REMOTE
> switches. I think I posted how to do this sometime ago, so please search the
> mailing list. (I am in China and can provide more information next week when I
> return if this is not enough, which it probably is not.)
>
> N.B., in case anyone wonders, with torque (PBS) you are not "supposed to" use
> ssh to communicate the way Wien2k does. They are not going to move on this so
> this is "WIen2k's fault". I've looked in to this quite a bit and there is no
> solution except to avoid ssh (or live with zombie processes). Indeed, torque
> has the weakness of leaving processes around if a code does anything more
> adventurous than just run a single mpirun -- so it goes.
>
> On Thu, Jan 5, 2012 at 3:22 AM, Peter Blaha <pblaha at theochem.tuwien.ac.at> wrote:
>> I've never done this myself, but as far as I know one can define a
>> "prolog" script in all those queuing systems and this prolog script
>> should ssh to all assigned nodes and kill all remaining jobs of this user.
>>
>> On 05.01.2012 10:17, Florent Boucher wrote:
>>> Dear Yundi,
>>> this is a known limitation of ssh and rsh that does not pass the interrupt
>>> signal to the remote host.
>>> Under LSF I had in the past a solution. It was a specific rshlsf for doing
>>> this.
>>> Actually I use either SGE or PBS on two different cluster and the problem
>>> exists.
>>> You will see that are not even able to suspend a running job.
>>> If some one has a solution, I will also appreciate.
>>> Regards
>>> Florent
>>>
>>> On 04/01/2012 21:57, Yundi Quan wrote:
>>>> I'm working on a cluster using torque queue system. I can directly ssh to
>>>> any nodes without using password. When I use qdel (or canceljob) jobid to
>>>> terminate a running job, the job will be terminated in the queue system.
>>>> However, when I ssh to the nodes, the job are still running.
>>>> Does anyone know how to avoid this?
>>>>
>>>> _______________________________________________
>>>> Wien mailing list
>>>> Wien at zeus.theochem.tuwien.ac.at
>>>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>>>
>>> --
>>> Florent BOUCHER
>>> Institut des Matériaux Jean Rouxel
>>> 2, rue de la Houssinière, BP 32229, 44322 NANTES CEDEX 3 (FRANCE)
>>> Mailto: Florent.Boucher at cnrs-imn.fr
>>> Phone: (33) 2 40 37 39 24
>>> Fax:   (33) 2 40 37 39 95
>>> http://www.cnrs-imn.fr
>>
>> --
>> P.Blaha
>> Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
>> Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
>> Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
>
>
> --
> Professor Laurence Marks
> Department of Materials Science and Engineering
> Northwestern University
> www.numis.northwestern.edu 1-847-491-3996
> "Research is to see what everybody else has seen, and to think what
> nobody else has thought"
> Albert Szent-Gyorgi

-- 

                                       P.Blaha
--------------------------------------------------------------------------
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300             FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
--------------------------------------------------------------------------
