Hello, I'm executing a long-running Drill (1.4) query (4-10 minutes), called via JDBC from Talend, and I sometimes see an error stack like the one below.
The query is a SELECT statement with an ORDER BY against a directory of Parquet files that were produced by Spark. About half the time it succeeds and returns the expected results, but it often errors out as shown below. Can you help with any insights? Thanks in advance.

---Paul

...
2016-02-08 16:47:47,275 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:1:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:1:0: State change requested RUNNING --> FINISHED
2016-02-08 16:47:47,276 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:1:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 2946cbe3-e73d-2ed4-da60-76c1bd799372:1:0: State to report: FINISHED
2016-02-08 16:48:25,496 [UserServer-1] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested RUNNING --> FAILED
2016-02-08 16:48:25,778 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED --> FAILED
2016-02-08 16:48:25,779 [UserServer-1] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED --> FAILED
2016-02-08 16:48:25,779 [CONTROL-rpc-event-queue] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED --> CANCELLATION_REQUESTED
2016-02-08 16:48:25,779 [CONTROL-rpc-event-queue] WARN o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: Ignoring unexpected state transition FAILED --> CANCELLATION_REQUESTED
2016-02-08 16:48:25,779 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED --> FAILED
2016-02-08 16:48:25,780 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0: State change requested FAILED --> FINISHED
2016-02-08 16:48:25,781 [UserServer-1] WARN o.a.d.exec.rpc.RpcExceptionHandler - Exception occurred with closed channel. Connection: /172.20.20.154:31010 <--> /172.20.20.157:64101 (user client)
java.nio.channels.ClosedChannelException: null
2016-02-08 16:48:25,783 [2946cbe3-e73d-2ed4-da60-76c1bd799372:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: ChannelClosedException: Channel closed /172.20.20.154:31010 <--> /172.20.20.157:64101.
Fragment 0:0
[Error Id: 2f075631-fb49-4feb-b39d-cbe89083a2ee on chai.dev.streetlightdata.com:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: ChannelClosedException: Channel closed /172.20.20.154:31010 <--> /172.20.20.157:64101.
Fragment 0:0
[Error Id: 2f075631-fb49-4feb-b39d-cbe89083a2ee on chai.dev.streetlightdata.com:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534) ~[drill-common-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:321) [drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:184) [drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:290) [drill-java-exec-1.4.0.jar:1.4.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.4.0.jar:1.4.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed /172.20.20.154:31010 <--> /172.20.20.157:64101.
    at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:175) ~[drill-rpc-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:151) ~[drill-rpc-1.4.0.jar:1.4.0]
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584) ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.closeOnRead(AbstractEpollStreamChannel.java:409) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:647) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollRdHupReady(AbstractEpollStreamChannel.java:573) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:315) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) ~[netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
    ... 1 common frames omitted
2016-02-08 16:48:25,785 [drill-executor-42] WARN o.a.d.exec.rpc.control.WorkEventBus - Fragment 2946cbe3-e73d-2ed4-da60-76c1bd799372:0:0 not found in the work bus.
2016-02-08 16:48:25,810 [CONTROL-rpc-event-queue] WARN o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at CANCELED state (which is terminal).
2016-02-08 16:48:25,811 [UserServer-1] INFO o.a.drill.exec.work.foreman.Foreman - Failure while trying communicate query result to initiating client. This would happen if a client is disconnected before response notice can be sent.
org.apache.drill.exec.rpc.ChannelClosedException: null
    at org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:89) [drill-rpc-1.4.0.jar:1.4.0]
    at org.apache.drill.exec.rpc.CoordinationQueue$RpcListener.operationComplete(CoordinationQueue.java:67) [drill-rpc-1.4.0.jar:1.4.0]
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:788) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:689) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1114) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:705) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:980) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1032) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:965) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:254) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
2016-02-08 16:48:25,812 [UserServer-1] WARN o.a.drill.exec.work.foreman.Foreman - Dropping request to move to FAILED state as query is already at CANCELED state (which is terminal).