Hello Paul,

Thanks for the kind words. Indeed you may.
We predetermined energy rates and performance for a set of workloads on each of our nodes. The idea is to evaluate how such knowledge might benefit the scheduler when optimizing for certain criteria
(which sometimes includes overruling the scheduler).
Multi-node jobs have been ignored so far; our workloads are strictly allocated to a single node.
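
To illustrate the idea, here is a minimal sketch of such a lookup. Everything in it is hypothetical: the struct profile layout, the pick_node name and the choice of "lowest energy = watts * seconds" as the criterion are assumptions made for this example, not the actual data or code used on our nodes.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical profile entry: one measured (node, workload) pair. */
struct profile {
	const char *node;      /* node name */
	const char *workload;  /* workload identifier */
	double watts;          /* average power draw while running */
	double seconds;        /* measured runtime */
};

/*
 * Return the name of the node with the lowest energy (watts * seconds)
 * for the given workload, or NULL if the workload was never profiled.
 * Other criteria (e.g. runtime only) would just change the comparison.
 */
const char *pick_node(const struct profile *tab, size_t n,
		      const char *workload)
{
	const char *best = NULL;
	double best_joules = DBL_MAX;

	assert(tab != NULL || n == 0);
	assert(workload != NULL);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(tab[i].workload, workload) != 0)
			continue;
		double joules = tab[i].watts * tab[i].seconds;
		if (joules < best_joules) {
			best_joules = joules;
			best = tab[i].node;
		}
	}
	return best;
}
```

If the node returned here differs from the one the scheduler chose, that is the case in which the job gets moved.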

Regarding my previous request for access to the opt-struct:
This seems to be either impossible or more complicated than expected.
To resolve the struct reference I'd need either to gain access to srun itself or to link against srun's library, which I haven't found a proper way to do yet.
(the Makefiles are quite... big.)

If anyone has ideas regarding this problem, please let me know.

Thanks,

M. Wagner

On 2017-04-20 23:14, Van Der Mark, Paul wrote:
Hello M. Wagner,

That looks like a pretty clever solution. May I ask why you are trying
to circumvent the scheduler's choice of a node, and what you will do if
a job spans multiple nodes?

Best,
Paul

On Wed, 2017-04-19 at 15:37 -0700, maviko.wag...@fau.de wrote:
Hello everyone,

I managed to do so. Eureka.

My point of "intrusion" is the scheduler plugin, more precisely the 
newalloc-submethod.
In case anyone is interested/faces the same problems:

struct node_record *oldnode= (find that somehow)
same for the new node.

Then repeat for all old nodes:
excise_node_from_job(job_ptr, oldnode);
make_node_idle(oldnode, job_ptr);

char *new_name = (find somehow again);
xfree(job_ptr->details->req_nodes);
job_ptr->details->req_nodes = xstrdup(new_name);
FREE_NULL_BITMAP(job_ptr->details->req_node_bitmap);
bitstr_t *newmap;
if (!node_name2bitmap(new_name, false, &newmap))
        job_ptr->details->req_node_bitmap = newmap;

select_nodes(job_ptr, false, NULL, (list of oldnode names to be excluded), NULL);
I use a tweaked copy of select_nodes that doesn't call the
newalloc-submethod again, to prevent unnecessary recursion.

However, I am still looking for a way to find/edit the submitted env-vars
(srun --export=<env-vars>). I can't seem to access the opt_t opt struct
in which they should be saved. Any input on that?
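
This does not answer where srun keeps the variables, but for illustration: assuming they end up reachable as a NULL-terminated array of "NAME=value" strings (the usual C environment layout; whether opt_t stores them this way is an assumption), finding one could look like the sketch below. env_lookup is a hypothetical helper, not an existing Slurm function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up `name` in a NULL-terminated array of "NAME=value" strings.
 * Returns a pointer to the value part, or NULL if the name is not set.
 */
const char *env_lookup(char *const *env, const char *name)
{
	assert(name != NULL);
	size_t len = strlen(name);

	if (env == NULL)
		return NULL;

	for (size_t i = 0; env[i] != NULL; i++) {
		/* Match the whole name, followed directly by '='. */
		if (strncmp(env[i], name, len) == 0 && env[i][len] == '=')
			return env[i] + len + 1;
	}
	return NULL;
}
```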

Cheers, M. Wagner

On 2017-04-18 20:47, maviko.wag...@fau.de wrote:
>
> Still looking for help with this.
>
> On 2017-04-13 15:57, maviko.wag...@fau.de wrote:
> >
> > Hello everyone,
> >
> > I'm looking for a way to properly catch an incoming srun/salloc/etc.
> > command, check which node it's supposed to run on, and possibly
> > redirect it to some other node of my choosing.
> > All of this from within the code.
> > My current point of invasion is within the scheduler plugin.
> >
> > My approach so far:
> >
> > From within the slurm_sched_p_newalloc(struct job_record *job_ptr)
> > method in the scheduler wrapper I'm doing the following:
> >
> > //get current target
> > char *alloc_node = bitmap2node_name(job_ptr->node_bitmap);
> >
> > [logic to determine whether a node change is due]
> >
> > struct node_record *oldnode = find_node_record(alloc_node);
> > if (oldnode) {
> >         struct node_record *newnode = find_node_record(newnode_name);
> >         if (newnode) {
> >                 oldnode->run_job_cnt--;
> >                 oldnode->no_share_job_cnt--;
> >                 newnode->run_job_cnt++;
> >                 newnode->no_share_job_cnt++;
> >                 bitstr_t *t_node_bitmap;
> >                 if (!node_name2bitmap(newnode_name, true, &t_node_bitmap)) {
> >                         job_ptr->node_bitmap = bit_copy(t_node_bitmap);
> >                         job_ptr->nodes = strdup(newnode_name);
> >                 }
> >         }
> > }
> >
> > However this
> > a) only works rarely, depending on whether the number of CPUs requested
> > etc. actually matches,
> > b) doesn't properly set the states of the nodes (which I could do
> > manually as well, sure).
> >
> > But this is neither elegant nor properly working most of the time (to
> > no surprise).
> > Therefore I'd like to get some starting points on how to properly use
> > the internal RPC system etc.
> >
> > Any help? Thanks in advance.
> >
> > Regards, M. Wagner
