Yes, it makes sense not to introduce new options.

I am not familiar with cgroups, so I need to read up on them.

On the subject of OpenMPI and OGE - does OGE correctly suspend and resume
programs compiled with OpenMPI using the OpenMPI s/r implementation?

Joseph

On 6/11/2012 9:21 PM, Ron Chen wrote:
We have not implemented a flag for it, though it would not be hard to add one.
The catch with adding a new option is that we would then need to support it
even if it turns out not to be needed, and we are careful not to add too much
extra code. That is why I will do more research first and then decide whether
it is really needed.

I searched Google for TCP suspend issues and found that some developers say
it is safe as long as the processes are suspended at a quiescent point.

So if in-flight messages are processed before the processes are suspended,
which should be the case for the freezer cgroup subsystem, then it should be
safe to handle this without adding a new flag.

See: http://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt
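
To make the freezer idea concrete, here is a minimal Python sketch of how a job's
cgroup could be frozen and thawed. It is not how GE does it; it only relies on
the freezer.state control file and the FROZEN/THAWED values described in the
kernel document above. The function names, the polling timeout, and the example
path in the usage note are assumptions made for illustration.

```python
import os
import time

# Control file name as described in freezer-subsystem.txt.
FREEZER_STATE_FILE = "freezer.state"


def set_freezer_state(cgroup_dir, state):
    """Request a state change by writing FROZEN or THAWED to freezer.state."""
    if state not in ("FROZEN", "THAWED"):
        raise ValueError("state must be 'FROZEN' or 'THAWED'")
    with open(os.path.join(cgroup_dir, FREEZER_STATE_FILE), "w") as f:
        f.write(state)


def get_freezer_state(cgroup_dir):
    """Return the current contents of freezer.state, stripped of whitespace."""
    with open(os.path.join(cgroup_dir, FREEZER_STATE_FILE)) as f:
        return f.read().strip()


def freeze_and_wait(cgroup_dir, timeout=5.0, poll=0.05):
    """Ask for FROZEN, then poll until the kernel reports it or we time out.

    The kernel may report FREEZING for a while before all tasks are stopped,
    so the state is read back rather than assumed. Returns True on success.
    """
    set_freezer_state(cgroup_dir, "FROZEN")
    deadline = time.monotonic() + timeout
    while True:
        if get_freezer_state(cgroup_dir) == "FROZEN":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

On a real system this would be called with the job's freezer cgroup directory,
for example something like /sys/fs/cgroup/freezer/myjob (a hypothetical path;
the mount point and group name depend on the site setup), and it needs enough
privileges to write to that directory. Calling set_freezer_state(dir, "THAWED")
resumes the tasks.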

(Rayson added cgroups support in GE 2011.11 U1. cgroups is Linux-only, but
most clusters run Linux, at least those doing small to medium-scale HPC.)

IBM also planned to use Containers/Cgroups in Blue Waters (before IBM
withdrew from the project in 2011) to perform checkpointing and restart.

https://events.linuxfoundation.org/slides/2011/lfcs/lfcs2011_hpc_smith.pdf

  -Ron


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
