On 16.01.2014 at 09:54, Reuti wrote:

> On 16.01.2014 at 05:24, [email protected] wrote:
> 
>> Sorry - you're correct, I meant  <-l nodes=1:ppn=[count]> .  :-)
>> 
>> Hmmm...we've had some requests from clients specifically to support SGE, but 
>> this is a pretty key part of our functionality.  Currently we can submit, 
>> but without the way to specify cores, the clients won't get the timing 
>> results they expect at all.  Out of curiosity, does anyone have any good 
>> references for *why* this isn't the paradigm
> 
> I would say the idea was that the cluster admin should provide all the 
> setups and scripts necessary to achieve a tight integration of a parallel 
> application into the cluster. This is not something the end user of a 
> cluster should have to deal with. AFAIK there is no equivalent of 
> "start_proc_args" or "control_slaves" in Torque to reformat the generated 
> list of machines or to catch a plain "rsh"/"ssh" call made by a user's 
> application.
> 
> 10 years ago, LAM/MPI and MPICH(1) didn't run out of the box in a cluster 
> in a tightly integrated way, and there the PE definition was a great help 
> (it still is for others like Linda or IBM's Platform MPI). Sure, this 
> changed over time, as nowadays Open MPI and MPICH2/3 support SGE's `qrsh` 
> and Torque's "TM" (Task Management) interface directly.
> 
> Another reason may have been to limit certain applications (i.e. PEs) to a 
> subset of nodes, and to limit how many slots are used for a particular 
> application. I don't think this can be controlled in Torque.
> 
> On the other hand: if you have an application which needs 4 cores on every 
> machine (i.e. "allocation_rule 4" in SGE or "-l nodes=5:ppn=4" in Torque), 
> how can you prevent a user from requesting "-l nodes=2:ppn=4+6:ppn=2" in 
> Torque for a 20 core job?

NB: I just recalled that I once faced an issue in Torque where "-l 
nodes=5:ppn=4" gave me 8 cores on one machine. There was a (cluster-wide) 
setting somewhere in Torque/Maui that allowed bunches of 4 to be allocated 
more than once on a machine, which we didn't want (SGE gives you 4 only once 
on each machine). Different queuing systems have different features and 
different constraints.
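
To make the arithmetic behind the two Torque requests quoted above concrete, here is a minimal sketch in Python. It is only an illustration, not part of SGE or Torque: the function names are hypothetical, it assumes the first field of every "+"-separated chunk is a plain node count (not a hostname), and it assumes ppn defaults to 1 when omitted. Both requests add up to 20 cores, but only one of them uses 4 cores on every machine.

```python
def parse_nodes_spec(spec):
    """Parse the value of a Torque-style '-l nodes=' request.

    Returns a list of (node_count, ppn) pairs, one per '+'-separated chunk.
    Assumption: the first field of each chunk is a numeric node count.
    """
    chunks = []
    for part in spec.split("+"):
        fields = part.split(":")
        count = int(fields[0])
        ppn = 1  # assumed default when no ppn= property is given
        for field in fields[1:]:
            if field.startswith("ppn="):
                ppn = int(field[len("ppn="):])
        chunks.append((count, ppn))
    return chunks


def total_cores(chunks):
    """Total number of cores requested across all chunks."""
    return sum(count * ppn for count, ppn in chunks)


def matches_fixed_rule(chunks, per_host):
    """True if every chunk asks for exactly per_host cores per machine,
    i.e. what a fixed "allocation_rule <per_host>" in an SGE PE enforces."""
    return all(ppn == per_host for _, ppn in chunks)


if __name__ == "__main__":
    for spec in ("5:ppn=4", "2:ppn=4+6:ppn=2"):
        chunks = parse_nodes_spec(spec)
        print(spec, total_cores(chunks), matches_fixed_rule(chunks, 4))
```

A check like this would have to live in a submit wrapper or filter on the Torque side, whereas in SGE the fixed allocation rule is part of the PE definition itself.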

-- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
