I'm not sure if I'm getting the concept or not, but see if this is what you mean.
I have several parallel environments already to handle different software application needs. One of the CFD environments uses hpmpi for its processing (over IB), so we have an hpmpi PE set up for it. If I were to change that PE to something like hpmpi-ib1 and hpmpi-ib2, and set up certain machines to use either hpmpi-ib1 or hpmpi-ib2, then when I submit my job:

    qsub -q blah -pe "hpmpi-ib*" job.sh

it'll just pick one?

Now that I re-read what I wrote, it's pretty much the same thing you and William are saying. Is there something else I forgot here?

> Subject: Re: [gridengine users] Creating an infiniband complex
> From: [email protected]
> Date: Wed, 18 Feb 2015 15:06:17 +0100
> CC: [email protected]
> To: [email protected]
>
> On 18.02.2015 at 13:13, Kevin Taylor <[email protected]> wrote:
>
> > I have several groups of machines that have infiniband on them and due to
> > history and physical locations, these groups of machines have individual
> > infiniband domains.
> >
> > What I've done right now (not in production) is create a boolean complex
> > for 'ib' and identify all of the nodes that contain infiniband. I've also
> > created a string complex called 'ibdomain' that has a name to uniquely
> > identify which systems connect to each other with IB.
> >
> > Is there a way that a user could just ask for 'ib' when submitting a
> > parallel job (I don't care where it goes as long as it has infiniband), and
> > have the grid engine tell the job the value of 'ibdomain'? Or keep the job
> > within systems on the same ibdomain?
>
> It should work to request one of the domains by specifying its name as a
> request with a specific string `qsub -l ibdomain=section2 ...`. But this may
> not be what you are looking for as you can't use a wildcard here (at least
> not with the effect to stay inside one domain).
>
> Instead of a complex which receives a certain number, it's easier to define
> one PE per domain with a suffix and then request like `qsub -pe "ib*" 16 ...`
> for any of the IB domains. Once a PE is selected, only slots belonging to
> this PE will be selected for this job.
>
> You can attach the PEs in a single queue by defining a list of PEs for
> individual nodes or per hostgroup (which might shorten the line).
>
> $ qconf -sq all.q
> ...
> pe_list make,[@ibhosts1=ib1 make],[@ibhosts2=ib2 make smp]
>
> (the default is used only for machines not listed further more in the list,
> it's not added to all automatically)
>
> -- Reuti
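
On the "have the grid engine tell the job" part of the original question: as far as I know, Grid Engine exports PE (the name of the selected parallel environment) and NSLOTS into the environment of a parallel job, so job.sh could work out which domain it landed in from the PE name. A rough sketch, assuming the PEs really are named hpmpi-ib1 and hpmpi-ib2 as above; the function name ib_domain_from_pe is made up for illustration and is not part of Grid Engine:

```shell
#!/bin/sh
# Sketch of job.sh: report which IB domain the scheduler picked.
# Assumption: PEs are named hpmpi-ib1, hpmpi-ib2, ... (hypothetical names).

# Print the domain suffix (e.g. "ib2") for a PE name such as "hpmpi-ib2",
# or "unknown" if the name does not follow that pattern.
ib_domain_from_pe() {
    case "$1" in
        hpmpi-ib*) printf '%s\n' "${1#hpmpi-}" ;;
        *)         printf 'unknown\n' ;;
    esac
}

# PE and NSLOTS are set by Grid Engine for parallel jobs; the fallbacks
# below only keep the sketch runnable outside the scheduler.
domain=$(ib_domain_from_pe "${PE:-none}")
echo "PE=${PE:-none} domain=${domain} slots=${NSLOTS:-1}"
```

Submitted with a slot count, as in Reuti's example (`qsub -pe "hpmpi-ib*" 16 job.sh`), the job would then print whichever of the two PEs the wildcard resolved to.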
