Hi Chris,

At the moment, CRIU can only checkpoint serial applications. It is an
ongoing project, so this may change in the future, but I am pretty
confident that it will remain like this in the short and medium term.
However, we are also working with the developers of DMTCP
(http://dmtcp.sourceforge.net/) on the Slurm driver. It is almost
finished, and a beta version is already being tested :)  DMTCP can
checkpoint parallel applications (I have tried MVAPICH; I am not sure
about OpenMPI right now) and GPUs are on their roadmap too, so it may
be useful for you.

Anyway, I'll give a presentation on all this at the upcoming Slurm User
Meeting, so in a few weeks I will hopefully produce a PDF with a full
comparison of all of them in terms of performance, requirements and
integration with Slurm.

Cheers,


Manuel



2016-08-31 4:02 GMT+02:00 Christopher Samuel <sam...@unimelb.edu.au>:
>
> On 30/08/16 22:11, Manuel Rodríguez Pascual wrote:
>
>> We hope that this can be useful for the Slurm community.
>
> That's really pretty neat!
>
> I can't test myself as we're stuck on RHEL6 for the moment but I do
> wonder if you've considered doing the same for Open-MPI so that Slurm
> can do checkpoint/resume for it in the same way it does for BLCR at the
> moment?
>
> All the best,
> Chris
> --
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/      http://twitter.com/vlsci
