Hello, I am having trouble understanding why I am getting an error when running the program produced by the attached C file. In this file, there are three short functions: send(), bounce() and main(). send() and bounce() both use MPI_Send() and MPI_Recv(), but critically, neither one is called from main(), so as I understand it, neither function should ever be run. main() is just:

    int main(int argc, char *argv[]) {
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Finalize();
    }

Despite the fact that the program should never enter send() or bounce(), when I compile with

    mpicc mpi-issue.c -o mpi-issue

and run with

    mpirun -n 2 --verbose ./mpi-issue

I get the following:

    *** An error occurred in MPI_Send
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
    *** and potentially your MPI job)
    [dhcp-visitor-enr-117-111.slac.stanford.edu:99119] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
    -------------------------------------------------------
    Primary job terminated normally, but 1 process returned
    a non-zero exit code.. Per user-direction, the job has been aborted.
    -------------------------------------------------------
    --------------------------------------------------------------------------
    mpirun detected that one or more processes exited with non-zero status, thus causing
    the job to be terminated. The first process to do so was:

      Process name: [[2829,1],0]
      Exit code:    1
    --------------------------------------------------------------------------

How is it possible to be getting an error in MPI_Send(), if MPI_Send() never gets run?

I am running Open MPI 1.10.2, installed via the Homebrew open-mpi package, on my MacBook, which is running OS X Yosemite. I have attached the results of ompi_info --all as ompi_info.txt.bz2.

Any help would be appreciated! Sorry if this is a newb question and I am missing something obvious. I tried my best to search for this issue but I couldn't find anything.

-Devon

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

void send(int msg_len, int partner) {
    // initialize buffer
    unsigned char *buffer = (unsigned char *)malloc(msg_len * sizeof(unsigned char));
    for(int i = 0; i < msg_len; ++i) {
        buffer[i] = i % 256;
    }

    // time bounce
    double start_t = MPI_Wtime();
    MPI_Send(buffer, msg_len, MPI_UNSIGNED_CHAR, partner, msg_len, MPI_COMM_WORLD);
    MPI_Recv(buffer, msg_len, MPI_UNSIGNED_CHAR, partner, msg_len, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    double end_t = MPI_Wtime();

    free(buffer);

    double time_taken = end_t - start_t;
    printf("%d: %e\n", msg_len, time_taken);
}

void bounce(int msg_len, int partner) {
    unsigned char *buffer = (unsigned char *) malloc(msg_len);
    MPI_Recv(buffer, msg_len, MPI_UNSIGNED_CHAR, partner, msg_len, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(buffer, msg_len, MPI_UNSIGNED_CHAR, partner, msg_len, MPI_COMM_WORLD);
    free(buffer);
}

int main(int argc, char *argv[]) {
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Finalize();
}