Hi,

After reading some of the FAQ on the methods Open MPI uses to deal with memory registration (or pinning) with InfiniBand adapters, it seems that we could simply keep all user process memory registered all the time. This would avoid all the overhead and complexity of memory registration/deregistration, registration cache lookups and updates, and memory-management notifications (ummunotify). It would also allow better overlap of communication with computation, since we could let the communication hardware do its job independently instead of resorting to registration/transfer/deregistration pipelines.
Of course, such a configuration is not appropriate in a general setting (e.g., a desktop environment), since it would make swapping almost impossible. But on an HPC node, where processes are not supposed to swap and the OS is not supposed to overcommit memory, being unable to swap does not appear to be a problem. Moreover, since the maximum total memory per process is often fixed at application start as a resource requested from the queuing system, the OS could easily reserve a set amount of extra memory for its own needs instead of swapping out user process memory. I guess that specialized (non-Linux) compute-node OSes do this. But is it possible, and does it make sense, with Linux?

Thanks,
Martin Audet