Chunling Wang created HAWQ-559:
----------------------------------

             Summary: QD hangs when QE is killed after connected to QD
                 Key: HAWQ-559
                 URL: https://issues.apache.org/jira/browse/HAWQ-559
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Dispatcher
            Reporter: Chunling Wang
            Assignee: Lei Chang

After the first query finishes, the QE is still alive. We then run a second query. After the QD thread has been created and bound to the QE, but before it has sent any data to the QE, we kill this QE and the QD hangs.

Here is the backtrace when QD hangs:

* thread #1: tid = 0x1c4afd, 0x00007fff890355be libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff890355be libsystem_kernel.dylib`poll + 10
    frame #1: 0x000000010745692c postgres`receiveChunksUDP [inlined] udpSignalPoll + 42 at ic_udp.c:2882
    frame #2: 0x0000000107456902 postgres`receiveChunksUDP + 26 at ic_udp.c:2715
    frame #3: 0x00000001074568e8 postgres`receiveChunksUDP [inlined] waitOnCondition(timeout_us=250000) + 82 at ic_udp.c:1599
    frame #4: 0x0000000107456896 postgres`receiveChunksUDP(pTransportStates=0x00007ff2a381ae48, pEntry=0x00007ff2a18f2230, motNodeID=<unavailable>, srcRoute=0x00007fff58c0ce96, conn=<unavailable>, inTeardown='\0') + 726 at ic_udp.c:4039
    frame #5: 0x0000000107452a86 postgres`RecvTupleChunkFromAnyUDP [inlined] RecvTupleChunkFromAnyUDP_Internal + 498 at ic_udp.c:4146
    frame #6: 0x0000000107452894 postgres`RecvTupleChunkFromAnyUDP(mlStates=<unavailable>, transportStates=<unavailable>, motNodeID=1, srcRoute=0x00007fff58c0ce96) + 100 at ic_udp.c:4167
    frame #7: 0x0000000107442254 postgres`RecvTupleFrom [inlined] processIncomingChunks(mlStates=0x00007ff2a3812a30, transportStates=0x00007ff2a381ae48, motNodeID=1, srcRoute=<unavailable>) + 34 at cdbmotion.c:684
    frame #8: 0x0000000107442232 postgres`RecvTupleFrom(mlStates=0x00007ff2a3812a30, transportStates=<unavailable>, motNodeID=1, tup_i=0x00007fff58c0cf00, srcRoute=-100) + 370 at cdbmotion.c:610
    frame #9: 0x00000001071c8778 postgres`ExecMotion [inlined] execMotionUnsortedReceiver(node=<unavailable>) + 57 at nodeMotion.c:466
    frame #10: 0x00000001071c873f postgres`ExecMotion(node=<unavailable>) + 1071 at nodeMotion.c:298
    frame #11: 0x00000001071a4835 postgres`ExecProcNode(node=0x00007ff2a38164b8) + 613 at execProcnode.c:999
    frame #12: 0x00000001071b9f82 postgres`ExecAgg + 104 at nodeAgg.c:1163
    frame #13: 0x00000001071b9f1a postgres`ExecAgg + 316 at nodeAgg.c:1693
    frame #14: 0x00000001071b9dde postgres`ExecAgg(node=0x00007ff2a3815348) + 126 at nodeAgg.c:1138
    frame #15: 0x00000001071a4803 postgres`ExecProcNode(node=0x00007ff2a3815348) + 563 at execProcnode.c:979
    frame #16: 0x000000010719ecfd postgres`ExecutePlan(estate=0x00007ff2a3814e30, planstate=0x00007ff2a3815348, operation=CMD_SELECT, numberTuples=0, direction=<unavailable>, dest=0x00007ff2a28db178) + 1181 at execMain.c:3218
    frame #17: 0x000000010719e619 postgres`ExecutorRun(queryDesc=0x00007ff2a3811f00, direction=ForwardScanDirection, count=0) + 569 at execMain.c:1213
    frame #18: 0x00000001072e7fc2 postgres`PortalRun + 14 at pquery.c:1649
    frame #19: 0x00000001072e7fb4 postgres`PortalRun(portal=0x00007ff2a1893e30, count=<unavailable>, isTopLevel='\x01', dest=<unavailable>, altdest=0x00007ff2a28db178, completionTag=0x00007fff58c0d530) + 1124 at pquery.c:1471
    frame #20: 0x00000001072e4a8e postgres`exec_simple_query(query_string=0x00007ff2a380fe30, seqServerHost=0x0000000000000000, seqServerPort=-1) + 2078 at postgres.c:1745
    frame #21: 0x00000001072e0c4c postgres`PostgresMain(argc=<unavailable>, argv=<unavailable>, username=0x00007ff2a201bcf0) + 9404 at postgres.c:4754
    frame #22: 0x000000010729a002 postgres`ServerLoop [inlined] BackendRun + 105 at postmaster.c:5889
    frame #23: 0x0000000107299f99 postgres`ServerLoop at postmaster.c:5484
    frame #24: 0x0000000107299f99 postgres`ServerLoop + 9593 at postmaster.c:2163
    frame #25: 0x0000000107296f3b postgres`PostmasterMain(argc=<unavailable>, argv=<unavailable>) + 5019 at postmaster.c:1454
    frame #26: 0x0000000107200ca9 postgres`main(argc=9, argv=0x00007ff2a141eef0) + 1433 at main.c:209
    frame #27: 0x00007fff95e8c5c9 libdyld.dylib`start + 1

  thread #2: tid = 0x1c4afe, 0x00007fff890355be libsystem_kernel.dylib`poll + 10
    frame #0: 0x00007fff890355be libsystem_kernel.dylib`poll + 10
    frame #1: 0x000000010744d8e3 postgres`rxThreadFunc(arg=<unavailable>) + 2163 at ic_udp.c:6251
    frame #2: 0x00007fff95e822fc libsystem_pthread.dylib`_pthread_body + 131
    frame #3: 0x00007fff95e82279 libsystem_pthread.dylib`_pthread_start + 176
    frame #4: 0x00007fff95e804b1 libsystem_pthread.dylib`thread_start + 13

  thread #3: tid = 0x1c4b02, 0x00007fff890343f6 libsystem_kernel.dylib`__select + 10
    frame #0: 0x00007fff890343f6 libsystem_kernel.dylib`__select + 10
    frame #1: 0x00000001074ec47e postgres`pg_usleep(microsec=<unavailable>) + 78 at pgsleep.c:43
    frame #2: 0x0000000107400c26 postgres`generateResourceRefreshHeartBeat(arg=0x00007ff2a141ce90) + 166 at rmcomm_QD2RM.c:1519
    frame #3: 0x00007fff95e822fc libsystem_pthread.dylib`_pthread_body + 131
    frame #4: 0x00007fff95e82279 libsystem_pthread.dylib`_pthread_start + 176
    frame #5: 0x00007fff95e804b1 libsystem_pthread.dylib`thread_start + 13

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
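
For illustration only: frame #3 of thread #1 shows the QD waiting in `waitOnCondition(timeout_us=250000)` inside `receiveChunksUDP`, that is, a wait with a 250 ms timeout. The sketch below is not HAWQ code and is not proposed as the fix. It only shows, under stated assumptions, how a generic timed wait loop can re-check whether its peer is still alive on every timeout instead of waiting indefinitely. The names `wait_for_data`, `peer_alive_fn` and `countdown_alive` are hypothetical, and a plain file descriptor stands in for the UDP interconnect.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical liveness callback: returns true while the peer (the QE in
 * the report) is still believed to be running. Not a HAWQ API. */
typedef bool (*peer_alive_fn)(void *ctx);

enum wait_result { WAIT_DATA_READY = 0, WAIT_PEER_DEAD = 1, WAIT_ERROR = 2 };

/* Wait until fd becomes readable, waking up every timeout_ms milliseconds
 * to ask the callback whether the peer is still alive. */
enum wait_result
wait_for_data(int fd, int timeout_ms, peer_alive_fn alive, void *ctx)
{
    assert(fd >= 0 && timeout_ms > 0 && alive != NULL);

    for (;;)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc < 0)
        {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return WAIT_ERROR;
        }
        if (rc > 0)
            return (pfd.revents & POLLIN) ? WAIT_DATA_READY : WAIT_ERROR;

        /* rc == 0: the timeout expired with no data. Stop waiting if the
         * peer is gone; otherwise go around the loop again. */
        if (!alive(ctx))
            return WAIT_PEER_DEAD;
    }
}

/* Demo-only callback: ctx points to an int holding how many more checks
 * should report "alive"; after that the peer is reported as dead. */
bool
countdown_alive(void *ctx)
{
    int *remaining = ctx;

    if (*remaining <= 0)
        return false;
    (*remaining)--;
    return true;
}
```

A caller could pass 250 as `timeout_ms` to mirror the interval in the backtrace and supply its own liveness check in place of `countdown_alive`. Whether the actual hang comes from a missing liveness check of this kind is an assumption that the report does not confirm.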