I hope the following helps, but maybe I'm just repeating myself and Dick.
Let's say you're stuck in an MPI_Recv, MPI_Bcast, or MPI_Barrier call
waiting on someone else. You want to free up the CPU for more
productive purposes. There are basically two cases:
1) If you want to free the CPU up for the calling thread, the main
trick is returning program control to the caller. This requires a
non-blocking MPI call. There is such a thing for MPI_Recv (it's
MPI_Irecv, you know how to use it). MPI_Bcast and MPI_Barrier had no
non-blocking versions before MPI-3, which added MPI_Ibcast and
MPI_Ibarrier. Anyhow, given a non-blocking call, you can return control
to the caller, who can do productive work while occasionally testing for
completion of the original operation.
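For what it's worth, here is a minimal sketch of case 1 in C. It is only
an illustration under my own assumptions: two ranks, a single int
message with tag 0, and a made-up do_some_work() helper standing in for
whatever productive work the caller actually has.

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-in for the caller's productive work. */
static void do_some_work(long *units_done)
{
    (*units_done)++;
}

int main(int argc, char **argv)
{
    int rank, size, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        MPI_Request req;
        int done = 0;
        long units = 0;

        /* Post the receive; this returns immediately. */
        MPI_Irecv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);

        /* Do useful work, occasionally testing for completion. */
        while (!done) {
            do_some_work(&units);
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        }
        printf("received %d after %ld units of work\n", value, units);
    } else if (rank == 1) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Build it with your MPI compiler wrapper (e.g., mpicc) and run it on two
processes. How often to call MPI_Test is a judgment call: testing too
rarely delays completion, testing too often eats into the useful work.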
2) If you want to free the CPU up for anyone else (other processes on
the same node), what you want is for the MPI implementation not to poll
hard while it's waiting. You can do that in Open MPI by setting the
"mpi_yield_when_idle" MCA parameter to 1.
E.g.,
% setenv OMPI_MCA_mpi_yield_when_idle 1
% mpirun a.out
or
% mpirun --mca mpi_yield_when_idle 1 a.out
I'm not sure about all systems, but I think yield might sometimes be
observable only if there is someone to yield to. It's like driving into
a traffic circle. You're supposed to yield to cars already in the
circle. This makes a difference only if there is someone in the
circle! Similarly, if you check CPU usage on an otherwise idle node,
you might see Open MPI still polling hard even if you turn yield on.
The real test is to have another process compete for the same CPU. You
should see the MPI process and the competing process share the CPU in
the default case, but the competing process winning the CPU when yield
is turned on. I tried such a test on my system and confirmed that Open
MPI yield does "work".
I hope that helps.