Alright. I didn't see that option for GNU parallel. Retrying a task that failed for a good reason (e.g. due to OOM) probably makes little sense. And if the farming job itself timed out, GNU parallel does not resume from its former state on restart, does it? I guess the book-keeping is an extra issue, which is probably why Magnus also used a server with a database or the like.
But ok. GNU parallel's documentation is indeed quite vast. I will try to work through its other/new features (it is also still being developed ...). Concerning Dask: I have heard of it, but never tried it (because Intel advertised it ... 😏). Maybe I should reconsider that. Thank you for this input!

KR, Martin

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Ward Poelmans <ward.poelm...@vub.be>
Sent: Wednesday, 18 January 2023 15:35
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] srun jobfarming hassle question

On 18/01/2023 15:22, Ohlerich, Martin wrote:
> But Magnus (Thanks for the link!) is right. This is still far away from a
> feature-rich job- or task-farming concept, where at least some overview of
> the passed/failed/missing task statistics is available etc.

GNU parallel has log output and options to retry failed jobs.

If you want really fancy stuff, maybe look at dask combined with slurm plugins? It has dashboards for jupyter I believe.

Ward
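(The dask route mentioned above would presumably go through the dask-jobqueue package. A rough sketch, assuming dask-jobqueue is installed and a Slurm cluster is reachable; the resource numbers and the `task` function are placeholders, not recommendations:)

```python
# Sketch: farming tasks over Slurm with dask-jobqueue.
from dask_jobqueue import SLURMCluster
from dask.distributed import Client

# Each Slurm job started by the cluster runs one Dask worker.
cluster = SLURMCluster(cores=4, memory="8GB", walltime="01:00:00")
cluster.scale(jobs=10)          # submit 10 such worker jobs to Slurm

client = Client(cluster)
print(client.dashboard_link)    # live task dashboard (also viewable in Jupyter)

def task(x):                    # placeholder for one farming task
    return x * x

futures = client.map(task, range(1000))   # distribute tasks across workers
results = client.gather(futures)          # a failed future raises here,
                                          # so errors surface per task
```

This would cover the statistics/overview side via the dashboard, at the cost of pulling in a Python scheduler layer instead of plain shell tooling.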