Regarding all of the other pending patches: I'll try to look at them soon...
Hi Pär,

Thanks, I've applied these patches. You should probably also apply this
patch: https://github.com/SchedMD/slurm/commit/06750780b024d07da2628d4bf501077dd5e49a8b.patch

diff --git a/src/slurmd/slurmstepd/multi_prog.c b/src/slurmd/slurmstepd/multi_prog.c
index ad30329..61e538d 100644
--- a/src/slurmd/slurmstepd/multi_prog.c
+++ b/src/slurmd/slurmstepd/multi_prog.c
@@ -141,7 +141,7 @@ _sub_expression(char *args_spec, int task_rank, int task_offset)
  */
 extern int multi_prog_get_argv(char *file_contents, char **prog_env,
                                int task_rank,
-                               int *argc, char ***argv)
+                               uint32_t *argc, char ***argv)
 {
        char *line = NULL;
        int line_num = 0;
diff --git a/src/slurmd/slurmstepd/multi_prog.h b/src/slurmd/slurmstepd/multi_prog.h
index aeb9a4d..da7a046 100644
--- a/src/slurmd/slurmstepd/multi_prog.h
+++ b/src/slurmd/slurmstepd/multi_prog.h
@@ -47,5 +47,5 @@
  * "task_rank" is the task's GLOBAL rank within the job step.
  */
 extern int multi_prog_get_argv(char *config_data, char **prog_env,
-                               int task_rank, int *argc, char ***argv);
+                               int task_rank, uint32_t *argc, char ***argv);
 #endif /* !_SLURMD_MULTI_PROG_H */

Quoting Pär Andersson <pa...@nsc.liu.se>:

> Hi,
>
> I have found and fixed a few problems related to batch jobs with long
> argument lists.
>
> One of our users wanted to process a bunch of files, and submitted a
> batch script using something like:
>
>     "sbatch script /path/*"
>
> where /path contained around 14k files. Another couple of similar jobs
> with around 5k arguments each were also submitted.
>
> This slowed our slurmctld down to a crawl. Most user commands timed
> out, and a single slurmctld thread pegged the CPU at 100%.
>
>
> Problem 1. slurmctld segfault
>
> While debugging the performance issue I discovered that really long
> argument lists will cause slurmctld to segfault. To reproduce,
> submit something like this:
>
> (WARNING: THIS WILL CRASH YOUR SLURMCTLD)
> $ sbatch testjob.py $(seq 1 140000)
>
> _unpack_job_desc_msg() calls safe_unpackstr_array(&job_desc_ptr->argv,
> &job_desc_ptr->argc, buffer), which fails because argc is larger than
> MAX_PACK_ARRAY_LEN. To clean up, slurm_free_job_desc_msg() is called,
> which tries to free msg->argv[i] entries that were never allocated.
>
>
> Problem 2. truncated argument list
>
> After some more debugging I discovered a nasty bug. The command that
> gets accepted by sbatch does not match what later gets executed on a
> compute node.
>
> testjob.py is a small Python script that prints the number of
> arguments, and the first and last one:
>
> $ ./testjob.py $(seq 1 100000)
> argv len: 100001
> arg 1 and 100000: 1 100000
>
> Submitting the same command as a batch script gives another result:
>
> $ sbatch -n1 testjob.py $(seq 1 100000)
> Submitted batch job 10
> $ cat slurm-10.out
> argv len: 34465
> arg 1 and 34464: 1 34464
>
> 34465 is what you get when casting 100001 to uint16_t. The problem is
> that argc is defined as uint16_t in slurmctld.h and slurmstepd_job.h,
> while it is uint32_t in the rest of the code.
>
> I believe changing this is safe, since all (un)packing of argc+argv
> uses (un)packstr_array, and that function uses uint32_t.
>
> Problem 3. slurmctld performance
>
> Finally, our initial problem, which was the slurmctld slowdown.
>
> The slowdown is in _pack_default_job_details(), which is typically
> called by _slurm_rpc_dump_jobs() => pack_all_jobs() => pack_job() =>
> _pack_default_job_details().
>
> So the slowdown is triggered by REQUEST_JOB_INFO RPCs, and as
> _slurm_rpc_dump_jobs() holds the job read lock, this will block many
> other threads.
>
> I rewrote the code in _pack_default_job_details() so that instead of
> doing two xstrcat() calls per argument it does one single xmalloc().
> On my laptop, the same test case as before with 100000 arguments went
> from 16 seconds to 7 milliseconds, about 2300 times faster.
>
> Regards,
>
> Pär Andersson
> NSC
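
To make the failure modes above concrete without digging into the Slurm
source, here are a few small stand-alone C sketches. The names in them
are made up; this is not the actual _unpack_job_desc_msg() /
slurm_free_job_desc_msg() logic, only an illustration of the Problem 1
pattern: when unpacking bails out early, the cleanup path must not
assume argv was ever populated, and the usual guard avoids the crash.

#include <stdint.h>
#include <stdlib.h>

struct job_desc {
    uint32_t  argc;
    char    **argv;
};

/* Pretend unpack: records the count from the wire but rejects it before
 * allocating argv, leaving argv NULL on the error path. */
static int unpack_job_desc(struct job_desc *d, uint32_t wire_argc)
{
    d->argc = wire_argc;
    d->argv = NULL;
    if (wire_argc > 128 * 1024)
        return -1;                  /* count exceeds the sanity limit */
    /* ... the normal path would allocate and fill d->argv here ... */
    return 0;
}

/* Safe cleanup: only walk the element array if it was really allocated. */
static void free_job_desc(struct job_desc *d)
{
    if (d->argv) {
        for (uint32_t i = 0; i < d->argc; i++)
            free(d->argv[i]);
        free(d->argv);
    }
    d->argv = NULL;
    d->argc = 0;
}

int main(void)
{
    struct job_desc d;
    if (unpack_job_desc(&d, 140001) != 0)
        free_job_desc(&d);          /* no crash: nothing to free */
    return 0;
}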
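The Problem 2 truncation is easy to reproduce in isolation: narrowing a
32-bit argument count into a uint16_t field wraps it modulo 65536, which
is exactly how 100001 becomes 34465. A minimal demonstration (again
generic C, not Slurm code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t argc32 = 100001;            /* count produced by the unpack code */
    uint16_t argc16 = (uint16_t) argc32; /* field declared uint16_t in the header */

    printf("as uint32_t: %u\n", (unsigned) argc32); /* 100001 */
    printf("as uint16_t: %u\n", (unsigned) argc16); /* 34465 = 100001 % 65536 */
    return 0;
}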
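For Problem 3, the speedup comes from replacing per-argument string
appends, each of which rescans the buffer built so far, with a single
allocation sized up front. The sketch below is only a generic
illustration of that idea using plain malloc()/memcpy(); the real fix
lives in _pack_default_job_details() and uses Slurm's xmalloc()/xstrcat()
helpers, as Pär describes.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Quadratic: every append rescans the whole buffer built so far
 * (error handling omitted for brevity). */
char *join_args_slow(char **argv, int argc)
{
    char *buf = calloc(1, 1);
    for (int i = 0; i < argc; i++) {
        buf = realloc(buf, strlen(buf) + strlen(argv[i]) + 2);
        strcat(buf, argv[i]);            /* walks the current contents again */
        strcat(buf, " ");
    }
    return buf;
}

/* Linear: size the buffer once, then copy each argument exactly once. */
char *join_args_fast(char **argv, int argc)
{
    size_t len = 1;                      /* room for the final '\0' */
    for (int i = 0; i < argc; i++)
        len += strlen(argv[i]) + 1;      /* argument + separating space */

    char *buf = malloc(len), *p = buf;
    for (int i = 0; i < argc; i++) {
        size_t n = strlen(argv[i]);
        memcpy(p, argv[i], n);
        p += n;
        *p++ = ' ';
    }
    *p = '\0';
    return buf;
}

int main(void)
{
    char *args[] = { "testjob.py", "1", "2", "3" };
    char *s = join_args_fast(args, 4);
    puts(s);                             /* "testjob.py 1 2 3 " */
    free(s);
    free(join_args_slow(args, 4));
    return 0;
}

With a four-element array both versions behave identically; the
difference only becomes visible with argument lists in the tens of
thousands, which matches the 16 s to 7 ms improvement reported above.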