Regarding all of the other pending patches: I'll try to look at them soon...

Hi Pär,

Thanks, I've applied these patches. You should probably also apply this
patch:
https://github.com/SchedMD/slurm/commit/06750780b024d07da2628d4bf501077dd5e49a8b.patch

diff --git a/src/slurmd/slurmstepd/multi_prog.c b/src/slurmd/slurmstepd/multi_prog.c
index ad30329..61e538d 100644
--- a/src/slurmd/slurmstepd/multi_prog.c
+++ b/src/slurmd/slurmstepd/multi_prog.c
@@ -141,7 +141,7 @@ _sub_expression(char *args_spec, int task_rank, int task_offset)
   */
  extern int
  multi_prog_get_argv(char *file_contents, char **prog_env, int task_rank,
-                   int *argc, char ***argv)
+                   uint32_t *argc, char ***argv)
  {
        char *line = NULL;
        int line_num = 0;
diff --git a/src/slurmd/slurmstepd/multi_prog.h b/src/slurmd/slurmstepd/multi_prog.h
index aeb9a4d..da7a046 100644
--- a/src/slurmd/slurmstepd/multi_prog.h
+++ b/src/slurmd/slurmstepd/multi_prog.h
@@ -47,5 +47,5 @@
   * "task_rank" is the task's GLOBAL rank within the job step.
   */
  extern int multi_prog_get_argv(char *config_data, char **prog_env,
-                              int task_rank, int *argc, char ***argv);
+                              int task_rank, uint32_t *argc, char ***argv);
  #endif /* !_SLURMD_MULTI_PROG_H */



Quoting Pär Andersson <pa...@nsc.liu.se>:

> Hi,
>
> I have found and fixed a few problems related to batch jobs with long
> argument lists.
>
> One of our users wanted to process a bunch of files, and submitted a
> batch script using something like:
>
>         "sbatch script /path/*"
>
> where /path contained around 14k files. Another couple of similar jobs
> with around 5k arguments each were also submitted.
>
> This slowed our slurmctld down to a crawl. Most user commands timed
> out, and a single slurmctld thread pegged the CPU at 100%.
>
>
> Problem 1. slurmctld segfault
>
> While debugging the performance issue I discovered that really long
> argument lists will cause slurmctld to segfault. To reproduce
> submit something like this:
>
>    (WARNING THIS WILL CRASH YOUR SLURMCTLD)
>    $ sbatch testjob.py $(seq 1 140000)
>
> _unpack_job_desc_msg() calls safe_unpackstr_array(&job_desc_ptr->argv,
> &job_desc_ptr->argc, buffer), which fails because argc is larger than
> MAX_PACK_ARRAY_LEN. To clean up, slurm_free_job_desc_msg() is called,
> which tries to free msg->argv[i] entries that have never been allocated.
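>
> A minimal sketch of the failure mode (illustration only, not the
> actual SLURM code; the struct and helper names below are made up):
> the unpack step bails out after the element count is read but before
> the pointer array is allocated, so a cleanup loop that trusts the
> count will walk a NULL argv. Guarding on argv avoids the crash:
>
>    #include <stdint.h>
>    #include <stdlib.h>
>
>    struct job_desc {
>            uint32_t argc;
>            char **argv;
>    };
>
>    /* Stand-in for the unpack step: reject oversized arrays.  On
>     * failure argc is already set but argv is never allocated. */
>    static int unpack_argv(struct job_desc *d, uint32_t count)
>    {
>            d->argc = count;
>            /* 128 * 1024 stands in for MAX_PACK_ARRAY_LEN */
>            if (count > 128 * 1024)
>                    return -1;
>            d->argv = calloc(count, sizeof(char *));
>            return d->argv ? 0 : -1;
>    }
>
>    /* Cleanup must not touch argv[i] when argv was never allocated. */
>    static void free_job_desc(struct job_desc *d)
>    {
>            if (d->argv) {
>                    for (uint32_t i = 0; i < d->argc; i++)
>                            free(d->argv[i]);
>                    free(d->argv);
>                    d->argv = NULL;
>            }
>            d->argc = 0;
>    }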
>
> Problem 2. truncated argument list
>
> After some more debugging I discovered a nasty bug. The command that
> gets accepted by sbatch does not match what later gets executed on a
> compute node.
>
> testjob.py is a small Python script that prints out the number of
> arguments, and the first and last one:
>
>    $ ./testjob.py $(seq 1 100000)
>    argv len: 100001
>    arg 1 and 100000: 1 100000
>
> Submitting the same command as a batch script gives another result:
>
>    $ sbatch -n1 testjob.py $(seq 1 100000)
>    Submitted batch job 10
>    $ cat slurm-10.out
>    argv len: 34465
>    arg 1 and 34464: 1 34464
>
> 34465 is what you get when casting 100001 to uint16_t. The problem is
> that in slurmctld.h and slurmstepd_job.h argc is defined as uint16_t,
> while it is uint32_t in the rest of the code.
>
> I believe changing this is safe, since all (un)packing of argc+argv
> goes through (un)packstr_array, and that function uses uint32_t.
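>
> Just to make the truncation explicit, a stand-alone check (not SLURM
> code, just the cast in isolation):
>
>    #include <stdint.h>
>    #include <stdio.h>
>
>    int main(void)
>    {
>            uint32_t real_argc = 100001;  /* count as sbatch received it */
>            uint16_t stored = (uint16_t)real_argc;  /* type used in slurmctld.h */
>
>            /* 100001 mod 65536 = 34465, exactly the count seen in the
>             * job output above. */
>            printf("%u -> %u\n", (unsigned)real_argc, (unsigned)stored);
>            return 0;
>    }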
>
> Problem 3. slurmctld performance
>
> Finally, our initial problem: the slurmctld slowdown.
>
> The slowdown is in _pack_default_job_details(), which is typically
> called by _slurm_rpc_dump_jobs() => pack_all_jobs() => pack_job() =>
> _pack_default_job_details().
>
> So the slowdown is triggered by REQUEST_JOB_INFO RPCs, and as
> _slurm_rpc_dump_jobs() holds the job read lock this will block many
> other threads.
>
> I rewrote the code in _pack_default_job_details() so that instead of
> doing two xstrcat() calls per argument it does a single xmalloc(). On
> my laptop, the same test case as before (100000 arguments) went from
> 16 seconds to 7 milliseconds, about 2300 times faster.
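>
> The principle, sketched in plain C (this is not the actual patch,
> which uses the xmalloc()/xstrcat() helpers; the function below is
> only an illustration): appending each argument with a strcat-style
> call re-walks the growing buffer every time, so packing n arguments
> costs O(n^2), while sizing the buffer in one pass and copying each
> argument once is O(n):
>
>    #include <stdint.h>
>    #include <stdlib.h>
>    #include <string.h>
>
>    /* Join argv[0..argc-1] with single spaces using one allocation. */
>    static char *join_args(uint32_t argc, char **argv)
>    {
>            size_t len = 1;  /* room for the trailing '\0' */
>            for (uint32_t i = 0; i < argc; i++)
>                    len += strlen(argv[i]) + 1;  /* argument + separator */
>
>            char *buf = malloc(len);
>            if (!buf)
>                    return NULL;
>
>            char *p = buf;
>            for (uint32_t i = 0; i < argc; i++) {
>                    size_t l = strlen(argv[i]);
>                    memcpy(p, argv[i], l);
>                    p += l;
>                    *p++ = (i + 1 < argc) ? ' ' : '\0';
>            }
>            if (argc == 0)
>                    *p = '\0';
>            return buf;
>    }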
>
> Regards,
>
> Pär Andersson
> NSC
>
