Consider matrices A (s-by-r) and B (s-by-t). In the attached file I perform a distributed matrix multiplication with one master node and N workers to compute C = A^T*B, following a particular algorithm.
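For reference only, since the algorithm in the attachment is not reproduced here: the simplest way to split C = A^T*B among N workers is by column blocks of A. Worker i receives its block of A's columns together with all of B and returns the matching row block of C. The serial NumPy sketch below checks that this decomposition reproduces A^T*B. The function name `blockwise_AtB` and the splitting scheme are assumptions for illustration, not the attached code.

```python
import numpy as np


def blockwise_AtB(A, B, n_workers):
    """Hypothetical sketch: compute C = A^T B by giving each of n_workers
    a block of A's columns (equivalently, a block of C's rows).
    Runs serially; in an MPI version each list entry would be one worker's job."""
    s, r = A.shape
    s_b, t = B.shape
    assert s == s_b, "A and B must have the same number of rows"
    # Split the column indices of A into n_workers nearly equal blocks.
    col_blocks = np.array_split(np.arange(r), n_workers)
    # Each "worker" computes A[:, block]^T B, which is rows `block` of C.
    partial = [A[:, block].T.dot(B) for block in col_blocks]
    # The "master" stacks the row blocks back together in order.
    return np.vstack(partial)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    s = r = t = 10
    A = rng.rand(s, r)
    B = rng.rand(s, t)
    C = blockwise_AtB(A, B, n_workers=5)
    print(np.allclose(C, A.T.dot(B)))  # True
```

Splitting by columns of A means each worker needs all of B, which keeps the example short; it is not necessarily how the attached algorithm distributes the data.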
For small matrices (e.g., A and B both 10-by-10) I get the correct result without any error. With 1000-by-1000 matrices the result is still correct, but the following error appears at the end of the execution:

    [kostas-VirtualBox:02688] Read -1, expected 4000000, errno = 14
    [kostas-VirtualBox:02688] *** Process received signal ***
    [kostas-VirtualBox:02688] Signal: Segmentation fault (11)
    [kostas-VirtualBox:02688] Signal code: Address not mapped (1)
    [kostas-VirtualBox:02688] Failing at address: 0x5096ea0
    [kostas-VirtualBox:02688] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f232f8de390]
    [kostas-VirtualBox:02688] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x14e156)[0x7f232f651156]
    [kostas-VirtualBox:02688] [ 2] /usr/local/lib/libopen-pal.so.20(opal_convertor_unpack+0x188)[0x7f232d4fa7d6]
    [kostas-VirtualBox:02688] [ 3] /usr/local/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_frag+0x230)[0x7f23237de373]
    [kostas-VirtualBox:02688] [ 4] /usr/local/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_frag+0x6f)[0x7f23237da235]
    [kostas-VirtualBox:02688] [ 5] /usr/local/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x1d7)[0x7f2328327dfb]
    [kostas-VirtualBox:02688] [ 6] /usr/local/lib/openmpi/mca_btl_vader.so(+0x6f17)[0x7f2328327f17]
    [kostas-VirtualBox:02688] [ 7] /usr/local/lib/openmpi/mca_btl_vader.so(+0x70ea)[0x7f23283280ea]
    [kostas-VirtualBox:02688] [ 8] /usr/local/lib/libopen-pal.so.20(opal_progress+0xa9)[0x7f232d4e197d]
    [kostas-VirtualBox:02688] [ 9] /usr/local/lib/libmpi.so.20(ompi_mpi_finalize+0x359)[0x7f232db1d31e]
    [kostas-VirtualBox:02688] [10] /usr/local/lib/libmpi.so.20(PMPI_Finalize+0x59)[0x7f232db49cdf]
    [kostas-VirtualBox:02688] [11] /home/kostas/.local/lib/python2.7/site-packages/mpi4py/MPI.so(+0x2ed6c)[0x7f232de7bd6c]
    [kostas-VirtualBox:02688] [12] python2[0x4354d8]
    [kostas-VirtualBox:02688] [13] python2(Py_Main+0x43c)[0x497acc]
    [kostas-VirtualBox:02688] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f232f523830]
    [kostas-VirtualBox:02688] [15] python2(_start+0x29)[0x4975a9]
    [kostas-VirtualBox:02688] *** End of error message ***
    [warn] Epoll ADD(4) on fd 38 failed. Old events were 0; read change was 0 (none); write change was 1 (add): Bad file descriptor
    --------------------------------------------------------------------------
    mpirun noticed that process rank 0 with PID 0 on node kostas-VirtualBox exited on signal 11 (Segmentation fault).
    --------------------------------------------------------------------------

If you want to reproduce the problem, please keep the parameters I used in the code, since the algorithm places some constraints on them. Please also use 6 MPI processes (rank 0 is the master and the remaining N = 5 are the workers), and keep the matrix dimensions equal and even (s == r == t, all even). I am using mpi4py 3.0.0 with Python 2.7.14, NumPy 1.14.3 and Open MPI 2.1.2. I don't understand why I get a "Bad file descriptor" warning when I am not writing to any file.
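To make the first log line easier to read, the snippet below decodes it: errno 14 is EFAULT ("Bad address") on Linux, and a 4,000,000-byte read is the size of a 1000-by-1000 array of 4-byte elements or of a 1000-by-500 array of float64 (NumPy's default dtype). Which buffer the failed read actually belonged to is an assumption; the log does not say, and the candidate shapes below are illustrative, not taken from the attached code.

```python
import errno
import os

import numpy as np

# Values copied from the first log line: "Read -1, expected 4000000, errno = 14".
EXPECTED_BYTES = 4000000
ERRNO_FROM_LOG = 14


def nbytes(shape, dtype):
    """Number of bytes in a contiguous NumPy array of this shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize


# Symbolic name and message for the errno (Linux numbering assumed).
print("%s: %s" % (errno.errorcode[ERRNO_FROM_LOG], os.strerror(ERRNO_FROM_LOG)))

# Candidate buffers for s == r == t == 1000 (assumed, not from the attached code).
for shape, dtype in [((1000, 1000), "float32"),
                     ((1000, 500), "float64"),
                     ((1000, 1000), "float64")]:
    size = nbytes(shape, dtype)
    print("%s %s: %d bytes, matches log: %s"
          % (str(shape), dtype, size, size == EXPECTED_BYTES))
```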
Attachment: polycode_fast.py
_______________________________________________ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users