This question may not be entirely appropriate for this forum, but perhaps
someone here has already tried this and knows which method is faster.
I am about to hook up a NAS node to my CentOS-based Linux cluster. The NAS
storage will be shared among the nodes using GFS2. My OpenMPI program
needs to synchronize temporary files between the nodes, which is one of
the reasons I am switching to NAS.
My program checkpoints/restarts by copying checkpoint/restart files to a
local directory. The current code for taking a checkpoint looks
something like:
if (mpi_id == mpi_host_id)
{
    // save global variables and temporary files to the local CR directory
}
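To make that step concrete, here is a minimal sketch in C of the file-copy part of a checkpoint. It is not my actual code: the names copy_file and take_checkpoint and the path parameters are made up for illustration, and saving the global variables is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Copy the file at src to dst. Returns 0 on success, -1 on any I/O error.
 * (Hypothetical helper, for illustration only.) */
int copy_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[1 << 16];
    size_t n;
    int status = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            status = -1;
            break;
        }
    }
    if (ferror(in))
        status = -1;

    fclose(in);
    if (fclose(out) != 0)   /* fclose flushes; a failure here means lost data */
        status = -1;
    return status;
}

/* Only the rank designated as host copies the temporary file into the
 * CR directory. Returns 1 if this rank took the checkpoint, 0 if it
 * skipped because it is not the host, -1 on a copy error. */
int take_checkpoint(int mpi_id, int mpi_host_id,
                    const char *tmp_file, const char *cr_file)
{
    if (mpi_id != mpi_host_id)
        return 0;
    return copy_file(tmp_file, cr_file) == 0 ? 1 : -1;
}
```

The point of the sketch is that every byte of a temporary file passes through the memory of the node that calls copy_file, wherever the source and destination paths happen to live.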
When switching to NAS, I can leave the code as is, assuming GFS2 is smart
enough to detect that both the temporary files and the CR directory reside
on the NAS node and does not copy the files unnecessarily across the network:
if (mpi_id == mpi_host_id)
{
    // save global variables and temporary files (now residing on the GFS2
    // server node) to a CR directory (also residing on the GFS2 server node)
}
or I can change the code to:
if (mpi_id == mpi_host_id)
{
    // save global variables to the CR directory residing on the GFS2
    // server node
    // send an OpenMPI message to the GFS2 server node to copy the
    // temporary files to a CR directory on the GFS2 server node
}
Is the second method a lot faster than the first method or is it about
the same?
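One way to settle this would be to time both variants once the NAS is in place. A minimal sketch of a timing helper is below; now_seconds and seconds_since are hypothetical names, and it assumes a POSIX system that provides clock_gettime with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Current reading of the monotonic clock, in seconds.
 * (Hypothetical helper, for illustration only.) */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Seconds elapsed since an earlier now_seconds() reading. */
double seconds_since(double start)
{
    return now_seconds() - start;
}
```

The idea would be to take a reading with now_seconds() just before the checkpoint and call seconds_since() just after it, once for each variant. Inside the MPI program itself, MPI_Wtime() returns wall-clock seconds in the same way and could be used instead.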
Regards,
Gijsbert