Hi Michael,

Thanks for your message. Does installing the library job_submit_lua.so require Slurm to be recompiled as well, i.e., do I have to compile Slurm with the library job_submit_lua.so to be able to add any plugin? I do not see it in the yum repo.

Thanks,
*Fritz Ratnasamy*
Data Scientist
Information Technology
The University of Chicago
Booth School of Business
5807 S. Woodlawn
Chicago, Illinois 60637
Phone: +(1) 773-834-4556

On Thu, Aug 26, 2021 at 9:18 AM Michael Robbert <mrobb...@mines.edu> wrote:

> You need to set the following option in slurm.conf:
>
> *JobSubmitPlugins*
>
> A comma-delimited list of job submission plugins to be used. The specified
> plugins will be executed in the order listed. These are intended to be
> site-specific plugins which can be used to set default job parameters
> and/or logging events. Sample plugins available in the distribution include
> "all_partitions", "defaults", "logging", "lua", and "partition". For
> examples of use, see the Slurm code in "src/plugins/job_submit" and
> "contribs/lua/job_submit*.lua", then modify the code to satisfy your needs.
> Slurm can be configured to use multiple job_submit plugins if desired;
> however, the lua plugin will only execute one lua script named
> "job_submit.lua" located in the default script directory (typically the
> subdirectory "etc" of the installation directory). No job submission
> plugins are used by default.
>
> Then, as this documentation states, put the job_submit.lua into your script
> directory. Mine is in /etc/slurm/. You may want to make sure that you have
> the job_submit_lua.so library installed with your build of Slurm. I agree
> that finding complete documentation for this feature is a little difficult.
>
> Mike
>
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of
> Ratnasamy, Fritz <fritz.ratnas...@chicagobooth.edu>
> *Date:* Wednesday, August 25, 2021 at 23:13
> *To:* Slurm User Community List <slurm-users@lists.schedmd.com>
> *Subject:* Re: [slurm-users] EXTERNAL-Re: [External] scancel gpu jobs
> when gpu is not requested
>
> Hi Michael,
>
> Thanks for your message.
> Yes, I was able to get all interactive sessions
> killed quickly when trying other partitions and deactivating the prolog. I
> read your example and I understand how it could possibly work (in the
> example, maybe instead of checking whether the GPU model is passed, we
> could look at the number of GPUs passed?), but where do I set up that
> function and where do I call it?
>
> Thanks,
>
> On Wed, Aug 25, 2021 at 9:54 AM Michael Robbert <mrobb...@mines.edu> wrote:
>
> I doubt that it is a problem with your script and suspect that there is
> some weird interaction with scancel on interactive jobs. If you wanted to
> get to the bottom of that, I'd suggest disabling the prolog and testing by
> manually cancelling some interactive jobs.
>
> Another suggestion is to try a completely different approach to solve your
> problem: why wait until the job starts to do the check? You can use a
> submit filter, and it will alert the user as soon as they try to submit.
> That will prevent them from potentially having to wait in the queue if the
> cluster is busy, and it gets around having to cancel a running job.
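The submit filter mentioned here is the lua job_submit plugin described earlier in the thread: a job_submit.lua placed in the script directory, enabled via JobSubmitPlugins=lua in slurm.conf. A minimal, untested sketch of such a filter follows; note that the field carrying the GRES request varies by Slurm version (job_desc.gres in older releases, job_desc.tres_per_node in newer ones), so check against your installed version:

```lua
-- job_submit.lua (sketch): reject jobs submitted to the "gpu"
-- partition that do not request any GPUs.
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.partition == "gpu" then
        -- Older Slurm exposes the GRES request as job_desc.gres,
        -- newer versions as job_desc.tres_per_node.
        local gres = job_desc.gres or job_desc.tres_per_node
        if gres == nil or not string.find(gres, "gpu") then
            slurm.log_user("Jobs in the gpu partition must request a GPU, e.g. --gres=gpu:1")
            return slurm.ERROR
        end
    end
    return slurm.SUCCESS
end

-- The plugin interface also calls slurm_job_modify; returning
-- SUCCESS leaves job modifications untouched.
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

With something like this in place, a bare `srun --partition=gpu --pty bash` is rejected at submit time, rather than being scheduled and then cancelled by the prolog.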
> There is a
> description and a simple example at the bottom of this page:
> https://slurm.schedmd.com/resource_limits.html
>
> Mike
>
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of
> Ratnasamy, Fritz <fritz.ratnas...@chicagobooth.edu>
> *Date:* Tuesday, August 24, 2021 at 21:00
> *To:* slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
> *Subject:* [External] [slurm-users] scancel gpu jobs when gpu is not
> requested
>
> Hello,
>
> I have written a script in my prolog.sh that cancels any Slurm job if the
> parameter gres=gpu is not present. This is the script I added to my
> prolog.sh:
>
> if [ "$SLURM_JOB_PARTITION" == "gpu" ]; then
>     if [ ! -z "${GPU_DEVICE_ORDINAL}" ]; then
>         echo "GPU ID used is ID: $GPU_DEVICE_ORDINAL"
>         list_gpu=$(echo "$GPU_DEVICE_ORDINAL" | sed -e "s/,//g")
>         Ngpu=$(expr length "$list_gpu")
>     else
>         echo "No GPU selected"
>         Ngpu=0
>     fi
>
>     # if 0 GPUs were allocated, cancel the job
>     if [ "$Ngpu" -eq "0" ]; then
>         scancel "${SLURM_JOB_ID}"
>     fi
> fi
>
> What the code does is look at the number of GPUs allocated and, if it is
> 0, cancel the job. It works fine if a user runs sbatch submit.sh (and
> submit.sh does not contain --gres=gpu:1).
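One aside on the counting trick in that prolog (strip the commas, take the string length): it only works while every GPU ordinal is a single digit. With a device list like "0,1,10" it would report 4 GPUs instead of 3. A safer sketch, assuming the same comma-separated GPU_DEVICE_ORDINAL format, counts fields instead:

```shell
#!/bin/bash
# Count GPUs by splitting GPU_DEVICE_ORDINAL on commas rather than
# measuring the length of the comma-stripped string.
count_gpus() {
    ordinals="$1"                 # e.g. "0,1,2", or "" when no GPUs
    if [ -z "$ordinals" ]; then
        echo 0
    else
        # NF is the number of comma-separated fields.
        echo "$ordinals" | awk -F',' '{print NF}'
    fi
}

count_gpus ""        # -> 0
count_gpus "0"       # -> 1
count_gpus "0,1,10"  # -> 3 (the length trick would report 4)
```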
> However, when requesting
> an interactive session without GPUs, the job does get killed, but the kill
> hangs for 5-6 minutes before completing:
>
> jlo@mfe01:~ $ srun --partition=gpu --pty bash --login
> srun: job 4631872 queued and waiting for resources
> srun: job 4631872 has been allocated resources
> srun: Force Terminated job 4631872   (the kill hangs here for 5-6 minutes)
>
> Is there anything wrong with my script? Why do I see this hang only when
> scancel is applied to an interactive session? I would like to remove the
> hang.
>
> Thanks,