I've got an MPI program on an 8-core box that runs in master-slave mode. The slaves calculate something, pass data to the master, and then call MPI_Bcast to wait for the master to update and return some data via an MPI_Bcast originating on the master.
One of the things the master does while the slaves are waiting is make heavy use of the fftw3 FFT routines, which can support multi-threading. However, for threading to make sense, the slaves on the same physical machine have to give up their CPU while they wait, and this doesn't seem to be the case (top shows them running at close to 100%). Is there another MPI routine that polls for data and then gives up its time-slice? Any other suggestions? Thanks in advance. David
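P.S. To make the question concrete, the kind of behaviour I am after might look like the poll-and-sleep loop below. This is only a minimal sketch in C, assuming POSIX nanosleep is available. The names poll_until_ready, ready_fn and countdown_ready are hypothetical and are not MPI routines; countdown_ready is just a stand-in so the sketch compiles without MPI. In the real program the callback would presumably wrap a nonblocking check such as MPI_Iprobe, or MPI_Test on a pending request.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* ask for the nanosleep declaration */
#endif
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns nonzero once the awaited data is ready.
 * In an MPI program this would wrap MPI_Iprobe, or MPI_Test on a request. */
typedef int (*ready_fn)(void *ctx);

/* Call is_ready(ctx) repeatedly; between calls, sleep for sleep_ns
 * nanoseconds so the core is released instead of being spun on.
 * Returns the number of times is_ready was called. */
long poll_until_ready(ready_fn is_ready, void *ctx, long sleep_ns)
{
    struct timespec pause;
    pause.tv_sec = sleep_ns / 1000000000L;
    pause.tv_nsec = sleep_ns % 1000000000L;

    long polls = 1;
    while (!is_ready(ctx)) {
        nanosleep(&pause, NULL);   /* give up the CPU until the next poll */
        polls++;
    }
    return polls;
}

/* Stand-in for the MPI check (assumption, for illustration only):
 * reports "ready" once the counter pointed to by ctx has reached zero. */
int countdown_ready(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0) {
        return 1;
    }
    (*remaining)--;
    return 0;
}
```

The sleep interval is the trade-off here: a longer sleep leaves more CPU for the FFT threads, but the slaves notice the master's data later.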