Anthony, I back up what Peter says. I had a project recently where we had a render farm deployed with Slurm. We needed 'data mover' jobs which ran once a render was complete, and we used job dependencies for these.
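Roughly, the pattern looked like the sketch below (the script names are placeholders, not our actual scripts): submit the render, capture its job ID, then submit the data mover with an afterok dependency so it only starts once the render has completed successfully.

```shell
#!/bin/bash
# Sketch of the render-farm pattern described above.
# render.sbatch and data_mover.sbatch are placeholder script names.

# --parsable makes sbatch print just the job ID of the submitted job.
render_id=$(sbatch --parsable render.sbatch)

# afterok: the data mover starts only if the render exited with code 0.
sbatch --dependency=afterok:"$render_id" data_mover.sbatch
```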
I guess, though, from what you say that you will have to monitor how long the post-processing jobs take to start. Have you looked at the new job packs feature, which was announced on the list a few days ago? https://slurm.schedmd.com/SLUG16/Job_Packs_SUG_2016.pdf

I also looked at running the render farm tasks using an srun which reserved a compute node; within the srun script I did an sbatch for the render and then the data mover phase. That is of course not efficient if you are running on N compute nodes, as N-1 are left idle while the post-processing is taking place.

On 19 July 2017 at 19:08, Glover, Anthony E CTR USARMY RDECOM (US) <anthony.e.glover....@mail.mil> wrote:

> CLASSIFICATION: UNCLASSIFIED
>
> Thanks Pete. That looks exactly like what I need. I couldn't think of the
> right search term, but pipeline is exactly what I'm trying to do.
>
> Thanks!
> Tony
>
> -----Original Message-----
> From: Peter A Ruprecht [mailto:peter.rupre...@colorado.edu]
> Sent: Wednesday, July 19, 2017 11:32 AM
> To: slurm-dev <slurm-dev@schedmd.com>
> Subject: [Non-DoD Source] [slurm-dev] Re: General Post-Processing Question
> (UNCLASSIFIED)
>
> Tony,
>
> Have you considered using Slurm job dependencies for this workflow? That
> way you can submit the initial job and the post-processing job at the same
> time, but set a dependency on the post-processing job so that it can't
> start until the first job has finished successfully. We've had users who
> manage fairly complicated analysis pipelines entirely with job dependencies.
>
> Regards,
> Pete
>
> On 7/19/17, 10:07 AM, "Glover, Anthony E CTR USARMY RDECOM (US)"
> <anthony.e.glover....@mail.mil> wrote:
>
> CLASSIFICATION: UNCLASSIFIED
>
> Got a general question, but one that might be specifically addressed
> by Slurm - don't know.
>
> We have a multi-process, distributed simulation that runs as a single
> job and generates a significant amount of data. At the end of that run, we
> would like to be able to post-process the data. The post-processing
> currently consists of Python scripts wrapped up in Luigi workflows/tasks.
> We would like to be able to distribute those tasks across the cluster as
> well to speed up the post-processing.
>
> So, my question is: what is the best way to trigger submitting a job
> to Slurm based upon the completion of a previous job? I see that the
> strigger command can probably do what I need, but maybe it is more of a
> workflow question that I have. If we have, say, 100 of these simulation
> jobs in the queue, then it would seem like I would want the post-processing
> to run at the end of each job, but if the trigger submits another job with
> multiple CPU needs, then that job would go in at the back of the queue. I
> guess I could set the priority such that it jumps the remaining simulation
> jobs, or maybe a separate post-processing queue is more appropriate.
> Anyway, just looking for some ideas as to how others might be addressing
> this type of problem. Any guidance would be much appreciated.
>
> Thanks,
> Tony
>
> CLASSIFICATION: UNCLASSIFIED
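For your 100-job case, combining Pete's dependency suggestion with the separate queue you mentioned might look something like the sketch below. The partition name "postproc" and the script names are made up for illustration; the idea is to submit each simulation together with its post-processing up front, rather than triggering a new submission when each simulation finishes, and to route the post-processing to its own partition so it does not sit behind the remaining simulation jobs.

```shell
#!/bin/bash
# Illustrative sketch: submit 100 simulation/post-processing pairs.
# sim.sbatch, postprocess.sbatch, and the "postproc" partition are
# placeholder names, not from the original setup.

for i in $(seq 1 100); do
    # Submit the simulation; --parsable prints only the job ID.
    sim_id=$(sbatch --parsable sim.sbatch "$i")

    # Queue the Luigi post-processing immediately, held by a dependency
    # so it cannot start until its simulation finishes successfully.
    # A dedicated partition keeps it from queuing behind the other
    # simulation jobs.
    sbatch -p postproc --dependency=afterok:"$sim_id" \
        postprocess.sbatch "$i"
done
```

This avoids strigger entirely: the scheduler itself releases each post-processing job at the right moment.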