Anthony, I back up what Peter says. I had a project recently where we had a render farm deployed with Slurm. We needed 'data mover' jobs which ran once a render was complete, and we used job dependencies for these.
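Roughly, the pattern looked like the sketch below (the script names are placeholders, not our actual scripts): submit the render, capture its job ID, then submit the data mover with an afterok dependency so it only starts once the render has completed successfully.

```shell
#!/bin/bash
# Sketch of the render-farm pattern described above.
# render.sbatch and data_mover.sbatch are placeholder script names.

# --parsable makes sbatch print just the job ID of the submitted job.
render_id=$(sbatch --parsable render.sbatch)

# afterok: the data mover starts only if the render exited with code 0.
sbatch --dependency=afterok:"$render_id" data_mover.sbatch
```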
I guess, though, from what you say that you will have to monitor how long the post-processing jobs take to start. Have you looked at the new job packs feature, which was announced on the list a few days ago? https://slurm.schedmd.com/SLUG16/Job_Packs_SUG_2016.pdf

I also looked at running the render farm tasks using an srun which reserved a compute node; within the srun script I did an sbatch for the render and then the data mover phase. That is of course not efficient if you are running on N compute nodes, as N-1 are left idle while the post-processing is taking place.

On 19 July 2017 at 19:08, Glover, Anthony E CTR USARMY RDECOM (US) <anthony.e.glover....@mail.mil> wrote:

> CLASSIFICATION: UNCLASSIFIED
>
> Thanks Pete. That looks exactly like what I need. I couldn't think of the
> right search term, but pipeline is exactly what I'm trying to do.
>
> Thanks!
> Tony
>
> -----Original Message-----
> From: Peter A Ruprecht [mailto:peter.rupre...@colorado.edu]
> Sent: Wednesday, July 19, 2017 11:32 AM
> To: slurm-dev <slurm-dev@schedmd.com>
> Subject: [Non-DoD Source] [slurm-dev] Re: General Post-Processing Question
> (UNCLASSIFIED)
>
> Tony,
>
> Have you considered using Slurm job dependencies for this workflow? That
> way you can submit the initial job and the post-processing job at the same
> time, but set a dependency on the post-processing job so that it can't
> start until the first job has finished successfully. We've had users who
> manage fairly complicated analysis pipelines entirely with job dependencies.
>
> Regards,
> Pete
>
> On 7/19/17, 10:07 AM, "Glover, Anthony E CTR USARMY RDECOM (US)"
> <anthony.e.glover....@mail.mil> wrote:
>
> CLASSIFICATION: UNCLASSIFIED
>
> Got a general question, but one that might be specifically addressed
> by Slurm - don't know.
>
> We have a multi-process, distributed simulation that runs as a single
> job and generates a significant amount of data. At the end of that run, we
> would like to be able to post-process the data. The post-processing
> currently consists of Python scripts wrapped up in Luigi workflows/tasks.
> We would like to be able to distribute those tasks across the cluster as
> well to speed up the post-processing.
>
> So, my question is: what is the best way to trigger submitting a job
> to Slurm based upon the completion of a previous job? I see that the
> strigger command can probably do what I need, but maybe it is more of a
> workflow question that I have. If we have, say, 100 of these simulation
> jobs in the queue, then it would seem like I would want the post-processing
> to run at the end of each job, but if the trigger submits another job with
> multiple CPU needs, then that job would go in at the back of the queue. I
> guess I could set the priority such that it jumps the remaining simulation
> jobs, or maybe a separate post-processing queue is more appropriate.
> Anyway, just looking for some ideas as to how others might be addressing
> this type of problem. Any guidance would be much appreciated.
>
> Thanks,
> Tony
>
> CLASSIFICATION: UNCLASSIFIED
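For your 100-job case, combining Pete's dependency suggestion with the separate queue you mentioned might look something like the sketch below. The partition name "postproc" and the script names are made up for illustration; the idea is to submit each simulation together with its post-processing up front, rather than triggering a new submission when each simulation finishes, and to route the post-processing to its own partition so it does not sit behind the remaining simulation jobs.

```shell
#!/bin/bash
# Illustrative sketch: submit 100 simulation/post-processing pairs.
# sim.sbatch, postprocess.sbatch, and the "postproc" partition are
# placeholder names, not from the original setup.

for i in $(seq 1 100); do
    # Submit the simulation; --parsable prints only the job ID.
    sim_id=$(sbatch --parsable sim.sbatch "$i")

    # Queue the Luigi post-processing immediately, held by a dependency
    # so it cannot start until its simulation finishes successfully.
    # A dedicated partition keeps it from queuing behind the other
    # simulation jobs.
    sbatch -p postproc --dependency=afterok:"$sim_id" \
        postprocess.sbatch "$i"
done
```

This avoids strigger entirely: the scheduler itself releases each post-processing job at the right moment.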