The most common problem is a connector that is not fully seated. That leaves 
a capacitive connection, which still passes many tests.

Try reseating the connectors.

Sent from my iPhone

On Nov 12, 2012, at 3:15 PM, "Hodge, Gary C" <gary.c.ho...@lmco.com> wrote:

> Today I tried a test in which all sender and receiver processes were located 
> on nodes in the same blade chassis.
> With this process deployment, I did not detect any delayed send/receive 
> processing; all tests ran as expected!
>  
> In the failing test below, the sender (proc39) was on the second blade 
> chassis and the receiver (proc28) was on the first blade chassis.
>  
> So I must have some problem in the InfiniBand fabric?
>  
> Each chassis has a switch to which all blades are internally connected.  The 
> two switches are connected by 4 external cables.
>  
> Are there any tests I can run to diagnose why there are intermittent 
> connection problems between the switches?
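>  
> (For illustration, a minimal MPI-level ping-pong between one rank on each 
> chassis would make an intermittent inter-switch problem show up as 
> occasional very slow iterations; the rank placement, message size and the 
> 10 ms reporting threshold below are placeholders. Fabric-level tools such 
> as ibdiagnet or iblinkinfo, where installed, report port and link errors on 
> the inter-switch cables and would complement such a test.)
>  
> /* Cross-chassis ping-pong sketch: place rank 0 and rank 1 on nodes behind
>  * different chassis switches (e.g. via a hostfile) and flag slow iterations. */
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> int main(int argc, char **argv)
> {
>     int rank, iters = 1000, n = 81536;    /* same size as the failing sends */
>     char *buf;
> 
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     buf = malloc(n);
> 
>     for (int i = 0; i < iters; i++) {
>         double t0 = MPI_Wtime();
>         if (rank == 0) {
>             MPI_Send(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
>             MPI_Recv(buf, n, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>         } else if (rank == 1) {
>             MPI_Recv(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>             MPI_Send(buf, n, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
>         }
>         double dt = MPI_Wtime() - t0;
>         if (rank == 0 && dt > 0.010)      /* flag anything over 10 ms */
>             printf("iteration %d took %.3f ms\n", i, dt * 1e3);
>     }
>     free(buf);
>     MPI_Finalize();
>     return 0;
> }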
>  
>  
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of George Bosilca
> Sent: Friday, November 09, 2012 6:02 PM
> To: Open MPI Users
> Subject: EXTERNAL: Re: [OMPI users] gathering problem
>  
> Gary,
>  
> Looking at the trace, it becomes clear that the culprit is the receiving 
> process, not the sending one.
>  
> Let me walk you through it. I have split the trace into several groups. The 
> first one looks at the sender (proc39) and the receiver (proc28) for the 
> first message.
>  
> PROC39: 59.2268409810; SEND_REQ_ACTIVATE, count=81536, peer=28
> PROC39: 59.2268649811; SEND_REQ_XFER_BEGIN, count=81536, peer=28
> PROC39: 59.2270019811; SEND_REQ_XFER_CONTINUE, count=27178, peer=28
> PROC39: 59.2270509811; SEND_REQ_XFER_CONTINUE, count=54358, peer=28
> PROC39: 59.2270809811; SEND_REQ_XFER_END, count=81536, peer=28
> PROC39: 59.2270869811; SEND_REQ_COMPLETE, count=81536, peer=28
> PROC39: 59.2270939811; SEND_REQ_NOTIFY, count=81536, peer=28
>  
> PROC28: 59.2271299611; RECV_MSG_ARRIVED, count=0, peer=39
> PROC28: 59.2271389612; RECV_SEARCH_POSTED_Q_BEGIN, count=0, peer=39
> PROC28: 59.2271439611; RECV_REQ_REMOVE_FROM_POSTED_Q, count=172084, peer=39
> PROC28: 59.2271479612; RECV_MSG_MATCH_POSTED_REQ, count=172084, peer=39
> PROC28: 59.2271519612; RECV_SEARCH_POSTED_Q_END, count=0, peer=39
> PROC28: 59.2271559612; RECV_REQ_XFER_BEGIN, count=172084, peer=39
> PROC28: 59.2271609611; RECV_REQ_XFER_CONTINUE, count=27178, peer=39
> PROC28: 59.2271759611; RECV_REQ_XFER_CONTINUE, count=54358, peer=39
> PROC28: 59.4331819612; RECV_REQ_XFER_END, count=172084, peer=39
> PROC28: 59.4332009612; RECV_REQ_COMPLETE, count=172084, peer=39
>  
> You can see that the sender sends the matching information (27178 bytes, 
> line 2). This message arrives at the receiver at line 1. There seems to be a 
> small clock drift between the two nodes, so don't compare the absolute times 
> directly. The first piece of data (the 27178 bytes) is then copied into the 
> receiver's memory (line 7). As the pipeline protocol is active, the receiver 
> asks for the second piece of data right after that (line 8). The receiver 
> only detects the completion of the last piece of the transfer at line 9, 
> about 206 ms later.
>  
> On the sender side, don't take the completion time into account; it is 
> irrelevant, as it is only local (and thus barely indicates the cost of the 
> local memcpy).
>  
> Now on to the second communication:
>  
> PROC39: 59.2283769811; SEND_REQ_ACTIVATE, count=81536, peer=28
> PROC39: 59.2283959811; SEND_REQ_XFER_BEGIN, count=81536, peer=28
> PROC39: 59.4331379811; SEND_REQ_XFER_CONTINUE, count=27178, peer=28
> PROC39: 59.4332049811; SEND_REQ_XFER_CONTINUE, count=54358, peer=28
> PROC39: 59.4332569811; SEND_REQ_XFER_END, count=81536, peer=28
> PROC39: 59.4332659811; SEND_REQ_COMPLETE, count=81536, peer=28
> PROC39: 59.4332769811; SEND_REQ_NOTIFY, count=81536, peer=28
>  
> PROC28: 59.4332059611; RECV_MSG_ARRIVED, count=0, peer=39
> PROC28: 59.4332109612; RECV_SEARCH_POSTED_Q_BEGIN, count=0, peer=39
> PROC28: 59.4332149612; RECV_MSG_INSERT_IN_UNEX_Q, count=0, peer=39
> PROC28: 59.4332199611; RECV_SEARCH_POSTED_Q_END, count=0, peer=39
> PROC28: 59.4332269612; RECV_REQ_NOTIFY, count=172084, peer=39
> PROC28: 59.4332539611; RECV_REQ_ACTIVATE, count=172084, peer=39
> PROC28: 59.4332589612; RECV_SEARCH_UNEX_Q_BEGIN, count=172084, peer=39
> PROC28: 59.4332629612; RECV_REQ_MATCH_UNEX, count=172084, peer=39
> PROC28: 59.4332669611; RECV_MSG_REMOVE_FROM_UNEX_Q, count=0, peer=39
> PROC28: 59.4332719612; RECV_SEARCH_UNEX_Q_END, count=172084, peer=39
> PROC28: 59.4332759612; RECV_REQ_XFER_BEGIN, count=172084, peer=39
> PROC28: 59.4332809611; RECV_REQ_XFER_CONTINUE, count=27178, peer=39
>  
> Here things get interesting. You can see that the matching information of 
> the second send (line 1 on the receiving side) arrives before the first 
> receive is reported back to the MPI layer (line 5). In other words, the 
> second message is __unexpected__ (line 3 clearly highlights this). Later, 
> when the MPI_Irecv is posted, it is matched (line 8) and the confirmation is 
> sent back to the sender at line 11. Take a look at the timestamps; this 
> explains why the second send looks so slow.
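>  
> (For illustration only, and not Gary's actual code: one way to keep a 
> message from landing in the unexpected queue is for the receiver to keep 
> the next MPI_Irecv posted while it is still processing the current message. 
> The buffer size, tag and double-buffering scheme below are assumptions.)
>  
> #include <mpi.h>
> 
> #define BUF_SIZE 172084               /* receive buffer size from the trace */
> 
> /* Double-buffered receive loop: one MPI_Irecv is always posted ahead of
>  * the message that will match it, so matching information never has to
>  * wait in the unexpected queue (illustrative sketch). */
> void receive_loop(int sender, int tag, int nmsgs)
> {
>     static char bufs[2][BUF_SIZE];
>     MPI_Request req[2];
>     int cur = 0;
> 
>     MPI_Irecv(bufs[0], BUF_SIZE, MPI_BYTE, sender, tag, MPI_COMM_WORLD, &req[0]);
>     if (nmsgs > 1)
>         MPI_Irecv(bufs[1], BUF_SIZE, MPI_BYTE, sender, tag, MPI_COMM_WORLD, &req[1]);
> 
>     for (int i = 0; i < nmsgs; i++) {
>         MPI_Wait(&req[cur], MPI_STATUS_IGNORE);   /* message i is complete */
> 
>         /* ... process bufs[cur] here ... */
> 
>         /* re-post this slot for message i+2 before handling message i+1 */
>         if (i + 2 < nmsgs)
>             MPI_Irecv(bufs[cur], BUF_SIZE, MPI_BYTE, sender, tag,
>                       MPI_COMM_WORLD, &req[cur]);
>         cur ^= 1;
>     }
> }
> 
> This trades one extra buffer for never paying the unexpected-queue and 
> late-matching cost seen in the trace above.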
>  
> If there is one thing that should be understood better, it is the lag on the 
> receiver in detecting the completion of the first operation (highlighted by 
> the following two lines):
>  
> PROC28: 59.2271759611; RECV_REQ_XFER_CONTINUE, count=54358, peer=39
> PROC28: 59.4331819612; RECV_REQ_XFER_END, count=172084, peer=39
>  
>   george.
>  
>  
> On Nov 9, 2012, at 12:36 , "Hodge, Gary C" <gary.c.ho...@lmco.com> wrote:
> 
> 
> Answering my own question: I downloaded Open MPI 1.6.2 and still get the 
> delay in MPI_Send. Previously, I was using Open MPI 1.4.1.
>  
> I configured 1.6.2 with --enable-peruse and have implemented PERUSE callbacks.
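>  
> (Roughly, registering callbacks for these events looks like the sketch 
> below. This is an illustration based on the peruse.h header shipped with 
> Open MPI rather than the exact instrumentation used here; check that header 
> for the authoritative prototypes, event names and return codes.)
>  
> #include <stdio.h>
> #include <mpi.h>
> #include <peruse.h>
> 
> /* Print one trace line per PERUSE event; the event name is passed back
>  * through the callback's user parameter (illustrative sketch only). */
> static int trace_cb(peruse_event_h eh, MPI_Aint unique_id,
>                     peruse_comm_spec_t *spec, void *param)
> {
>     printf("%.10f; %s, count=%d, peer=%d\n",
>            MPI_Wtime(), (const char *) param, spec->count, spec->peer);
>     return MPI_SUCCESS;
> }
> 
> void register_peruse_events(void)
> {
>     static const char *names[] = {
>         "PERUSE_COMM_REQ_ACTIVATE",      "PERUSE_COMM_REQ_XFER_BEGIN",
>         "PERUSE_COMM_REQ_XFER_CONTINUE", "PERUSE_COMM_REQ_XFER_END",
>         "PERUSE_COMM_REQ_COMPLETE",      "PERUSE_COMM_MSG_ARRIVED",
>         NULL
>     };
> 
>     PERUSE_Init();
>     for (int i = 0; names[i] != NULL; i++) {
>         int ev;
>         peruse_event_h eh;
> 
>         if (PERUSE_Query_event(names[i], &ev) != PERUSE_SUCCESS)
>             continue;                         /* event not supported */
>         PERUSE_Event_comm_register(ev, MPI_COMM_WORLD, trace_cb,
>                                    (void *) names[i], &eh);
>         PERUSE_Event_activate(eh);
>     }
> }
>  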
> Here is a trace of gb2 (PROC39) sending messages to ob (PROC28)
>  
> PROC39: 59.2268409810; SEND_REQ_ACTIVATE, count=81536, peer=28
> PROC39: 59.2268649811; SEND_REQ_XFER_BEGIN, count=81536, peer=28
> PROC39: 59.2270019811; SEND_REQ_XFER_CONTINUE, count=27178, peer=28
> PROC39: 59.2270509811; SEND_REQ_XFER_CONTINUE, count=54358, peer=28
> PROC39: 59.2270809811; SEND_REQ_XFER_END, count=81536, peer=28
> PROC39: 59.2270869811; SEND_REQ_COMPLETE, count=81536, peer=28
> PROC39: 59.2270939811; SEND_REQ_NOTIFY, count=81536, peer=28
> PROC28: 59.2271299611; RECV_MSG_ARRIVED, count=0, peer=39
> PROC28: 59.2271389612; RECV_SEARCH_POSTED_Q_BEGIN, count=0, peer=39
> PROC28: 59.2271439611; RECV_REQ_REMOVE_FROM_POSTED_Q, count=172084, peer=39
> PROC28: 59.2271479612; RECV_MSG_MATCH_POSTED_REQ, count=172084, peer=39
> PROC28: 59.2271519612; RECV_SEARCH_POSTED_Q_END, count=0, peer=39
> PROC28: 59.2271559612; RECV_REQ_XFER_BEGIN, count=172084, peer=39
> PROC28: 59.2271609611; RECV_REQ_XFER_CONTINUE, count=27178, peer=39
> PROC28: 59.2271759611; RECV_REQ_XFER_CONTINUE, count=54358, peer=39
> PROC39: 59.2283769811; SEND_REQ_ACTIVATE, count=81536, peer=28
> PROC39: 59.2283959811; SEND_REQ_XFER_BEGIN, count=81536, peer=28
> PROC39: 59.4331379811; SEND_REQ_XFER_CONTINUE, count=27178, peer=28
> PROC28: 59.4331819612; RECV_REQ_XFER_END, count=172084, peer=39
> PROC28: 59.4332009612; RECV_REQ_COMPLETE, count=172084, peer=39
> PROC39: 59.4332049811; SEND_REQ_XFER_CONTINUE, count=54358, peer=28
> PROC28: 59.4332059611; RECV_MSG_ARRIVED, count=0, peer=39
> PROC28: 59.4332109612; RECV_SEARCH_POSTED_Q_BEGIN, count=0, peer=39
> PROC28: 59.4332149612; RECV_MSG_INSERT_IN_UNEX_Q, count=0, peer=39
> PROC28: 59.4332199611; RECV_SEARCH_POSTED_Q_END, count=0, peer=39
> PROC28: 59.4332269612; RECV_REQ_NOTIFY, count=172084, peer=39
> PROC28: 59.4332539611; RECV_REQ_ACTIVATE, count=172084, peer=39
> PROC39: 59.4332569811; SEND_REQ_XFER_END, count=81536, peer=28
> PROC28: 59.4332589612; RECV_SEARCH_UNEX_Q_BEGIN, count=172084, peer=39
> PROC28: 59.4332629612; RECV_REQ_MATCH_UNEX, count=172084, peer=39
> PROC39: 59.4332659811; SEND_REQ_COMPLETE, count=81536, peer=28
> PROC28: 59.4332669611; RECV_MSG_REMOVE_FROM_UNEX_Q, count=0, peer=39
> PROC28: 59.4332719612; RECV_SEARCH_UNEX_Q_END, count=172084, peer=39
> PROC28: 59.4332759612; RECV_REQ_XFER_BEGIN, count=172084, peer=39
> PROC39: 59.4332769811; SEND_REQ_NOTIFY, count=81536, peer=28
> PROC28: 59.4332809611; RECV_REQ_XFER_CONTINUE, count=27178, peer=39
> PROC39: 59.4332849811; ERROR: component gb2 exceeded send time limit by 
> 0.104915 seconds.
> PROC39: 59.4332849811; WARNING: component gb2 increased page faults 
> R=130898,0; P=130898,0; S=132207,0
>  
> Notice the 205 ms delay between the second SEND_REQ_XFER_BEGIN (at 
> 59.2283959811) and the subsequent SEND_REQ_XFER_CONTINUE (at 59.4331379811).
> Also notice that there was no such delay in the previous send request.
>  
> The last two lines are my own debug output, which monitors excessive time 
> spent in MPI_Send and increases in the number of page faults.
> During the delayed MPI_Send, the page-fault count increased by 1309 
> (132207 - 130898)!
>  
> I looked at the PML code (pml_ob1_sendreq.c) and I suspect that something is 
> going awry in the mca_bml_base_prepare_src function.
> I believe that this is the function in which the openib BTL makes calls to 
> register the user memory.
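>  
> (If on-the-fly registration is indeed the cause, one common mitigation is to 
> push every long-lived buffer through MPI once during initialization, so that 
> with mpi_leave_pinned set to 1 the registrations are already cached before 
> real-time processing starts. In the sketch below the buffer list, peer rank 
> and tag are placeholders.)
>  
> #include <mpi.h>
> 
> /* Warm-up pass: exchange each long-lived buffer once before real-time
>  * operation so the openib BTL pins it up front and, with
>  * mpi_leave_pinned=1, keeps that registration cached.  bufs, sizes, nbufs,
>  * peer and the tag are placeholders for the application's real bookkeeping;
>  * buffer contents are overwritten, so run this before loading real data. */
> void warm_up_buffers(char *bufs[], int sizes[], int nbufs,
>                      int peer, int my_rank)
> {
>     for (int i = 0; i < nbufs; i++) {
>         if (my_rank < peer) {             /* fixed ordering avoids deadlock */
>             MPI_Send(bufs[i], sizes[i], MPI_BYTE, peer, 999, MPI_COMM_WORLD);
>             MPI_Recv(bufs[i], sizes[i], MPI_BYTE, peer, 999, MPI_COMM_WORLD,
>                      MPI_STATUS_IGNORE);
>         } else {
>             MPI_Recv(bufs[i], sizes[i], MPI_BYTE, peer, 999, MPI_COMM_WORLD,
>                      MPI_STATUS_IGNORE);
>             MPI_Send(bufs[i], sizes[i], MPI_BYTE, peer, 999, MPI_COMM_WORLD);
>         }
>     }
>     MPI_Barrier(MPI_COMM_WORLD);          /* everyone warmed up before starting */
> }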
>  
> This delay in the MPI_Send kills any hope of meeting our real-time 
> requirements, so any help is greatly appreciated.
>  
>  
> From: Hodge, Gary C 
> Sent: Monday, November 05, 2012 12:27 PM
> To: us...@open-mpi.org
> Subject: gathering problem
>  
> I continue to have a problem where 2 processes are sending to the same 
> process and one of the sending processes hangs for 150 to 550 ms in the call 
> to MPI_Send.
>  
> Each process runs on a different node and the receiving process has posted an 
> MPI_Irecv 17 ms before the hanging send.
> The posted receives use 172084-byte buffers and the sending processes are 
> sending 81536-byte messages.
> I have set mpi_leave_pinned to 1 and have increased the 
> btl_openib_receive_queues to …:S,65536,512,256,64
>  
> How do I trace the various phases of message passing to diagnose where the 
> send is hanging up?
>  
>  
