Thanks for the search; unfortunately, no result there seemed to fit my problem. The closest match to my issue is the following thread, which I found during my initial search: https://lists.apache.org/list.html?user@drill.apache.org:gte=1d:user%20client%20closed%20unexpectedly. I'm trying to adjust resources now, but I highly doubt that this is the problem, as I'm running the query on quite a big machine (64 cores, ~500 GB RAM).
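Since the drillbit log shows the user-server read timeout firing after 30 seconds, one thing worth trying alongside resource tuning is raising that timeout. A hedged sketch for drill-override.conf follows; it assumes the boot option is named drill.exec.rpc.user.timeout and takes seconds, which matches what I can tell from the log, but please verify the exact option name and path against your Drill version before relying on it:

```
# drill-override.conf (HOCON) -- assumption: the user-server read timeout
# is controlled by drill.exec.rpc.user.timeout (value in seconds).
drill.exec: {
  rpc: {
    user: {
      timeout: 300   # raise from the 30 s seen in the log
    }
  }
}
```

This would need to be set on the drillbits and followed by a restart; it only papers over the symptom if the real issue is the client not reading results fast enough.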
Here is the stack trace from the drillbit log:

2017-09-20 14:54:23,860 [263d7f11-78ce-df33-344e-d9a615530c26:frag:1:0] INFO o.a.d.e.w.f.FragmentStatusReporter - 263d7f11-78ce-df33-344e-d9a615530c26:1:0: State to report: FINISHED
2017-09-20 14:56:24,741 [UserServer-1] INFO o.a.drill.exec.rpc.user.UserServer - RPC connection /172.19.0.6:31010 <--> /172.19.0.3:52382 (user client) timed out. Timeout was set to 30 seconds. Closing connection.
2017-09-20 14:56:24,755 [UserServer-1] INFO o.a.d.e.w.fragment.FragmentExecutor - 263d7f11-78ce-df33-344e-d9a615530c26:0:0: State change requested RUNNING --> FAILED
2017-09-20 14:56:24,763 [UserServer-1] WARN o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc response.
java.lang.IllegalArgumentException: Self-suppression not permitted
    at java.lang.Throwable.addSuppressed(Throwable.java:1043) ~[na:1.8.0_131]
    at org.apache.drill.common.DeferredException.addException(DeferredException.java:88) ~[drill-common-1.9.0.jar:1.9.0]
    at org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:97) ~[drill-common-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:407) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.access$700(FragmentExecutor.java:55) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.fail(FragmentExecutor.java:421) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.ops.FragmentContext.fail(FragmentContext.java:208) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:95) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:92) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.ops.StatusHandler.failed(StatusHandler.java:42) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.rpc.RequestIdMap$RpcListener.setException(RequestIdMap.java:134) ~[drill-rpc-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:74) [drill-rpc-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:64) [drill-rpc-1.9.0.jar:1.9.0]
    at com.carrotsearch.hppc.IntObjectHashMap.forEach(IntObjectHashMap.java:692) [hppc-0.7.1.jar:na]
    at org.apache.drill.exec.rpc.RequestIdMap.channelClosed(RequestIdMap.java:58) [drill-rpc-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.rpc.RemoteConnection.channelClosed(RemoteConnection.java:175) [drill-rpc-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:167) [drill-rpc-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:146) [drill-rpc-1.9.0.jar:1.9.0]
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1099) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:187) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
    at org.apache.drill.exec.rpc.BasicServer$LogggingReadTimeoutHandler.readTimedOut(BasicServer.java:121) [drill-rpc-1.9.0.jar:1.9.0]
    at io.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask.run(ReadTimeoutHandler.java:212) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed /172.19.0.6:31010 <--> /172.19.0.3:52382.
    at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:166) [drill-rpc-1.9.0.jar:1.9.0]
    ... 25 common frames omitted

On Wed, 20 Sep 2017 at 22:18 Kunal Khatua <kkha...@mapr.com> wrote:

> The client error reported is usually trimmed. What is the stack trace in
> the drillbit logs? That will tell you exactly where the timeout occurred
> and make it easier for you to work with.
>
> I did a quick browse through for 'S3 Timeout' (
> https://lists.apache.org/list.html?user@drill.apache.org:gte=1d:%20S3%20timeout
> ). You could browse through to see if any of the suggestions there help
> unblock you.
>
> -----Original Message-----
> From: Alan Höng [mailto:alan.f.ho...@gmail.com]
> Sent: Wednesday, September 20, 2017 12:22 PM
> To: user@drill.apache.org
> Subject: Re: User client timeout with results > 2M rows
>
> Yes, it takes about 2-3 minutes for the timeout to appear; the query
> itself should finish in that time. The files I'm using for debugging are
> not that big. I have looked, but I couldn't find anything relevant or
> helpful to my situation so far.
>
> On Wed, 20 Sep 2017 at 20:41 Kunal Khatua <kkha...@mapr.com> wrote:
>
> > Do you know in how much time this timeout occurs? There might be
> > some tuning needed to increase a timeout. Also, I think this (S3
> > specifically) has been seen before... so you might find a solution
> > within the mailing list archives. Did you try looking there?
> >
> > From: Alan Höng
> > Sent: Wednesday, September 20, 8:46 AM
> > Subject: User client timeout with results > 2M rows
> > To: user@drill.apache.org
> >
> > Hello,
> >
> > I'm getting errors when trying to fetch results from Drill for queries
> > that evaluate to bigger tables. Surprisingly, it works like a charm if
> > the returned table has fewer than 2M rows. It also seems that the query
> > is executed and finishes successfully.
> >
> > I'm querying Parquet files with GZIP compression on S3. I'm running
> > Drill in distributed mode with ZooKeeper, using version 1.9 from the
> > container available on Docker Hub ("harisekhon/apache-drill:1.9"). I'm
> > using the pydrill package, which uses the REST API to submit queries and
> > gather results.
> >
> > I get the following error message from the client:
> >
> > TransportError(500, '{\n "errorMessage" : "CONNECTION ERROR:
> > Connection /172.19.0.3:52382 <--> ef53daab0ef8/172.19.0.6:31010
> > (user client) closed unexpectedly. Drillbit down?\\n\\n\\n[Error Id:
> > 6a19835b-2325-431e-9bad-dde8f1d3c192 ]"\n}'
> >
> > I would appreciate any help with this.
> >
> > Best
> > Alan Höng
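Since pydrill goes through Drill's REST API, it can help to take pydrill out of the loop and exercise the API directly when debugging a timeout like this. Below is a minimal sketch: the POST /query.json endpoint on the default web port 8047 and the {"queryType": "SQL", "query": ...} body shape are Drill's documented REST interface, but the host name is a placeholder and you should verify the details against your Drill version:

```python
import json


def build_query_payload(sql):
    """Build the JSON body Drill's REST API expects for POST /query.json."""
    return {"queryType": "SQL", "query": sql}


def submit_query(host, sql, port=8047, timeout=300):
    """Submit a SQL query to a drillbit over REST and return the parsed result.

    A generous client-side timeout matters here: for large result sets the
    client must keep reading while rows stream back, or the server side may
    see the connection as idle and close it.
    """
    import requests  # third-party: pip install requests

    url = "http://{}:{}/query.json".format(host, port)
    resp = requests.post(url, json=build_query_payload(sql), timeout=timeout)
    resp.raise_for_status()
    # On success the body typically contains "columns" and "rows" keys.
    return resp.json()


if __name__ == "__main__":
    # Print the payload that would be sent; swap in a real drillbit host
    # (e.g. submit_query("172.19.0.6", ...)) to actually run a query.
    print(json.dumps(build_query_payload("SELECT 1")))
```

If the direct REST call succeeds for a query that pydrill times out on, that points at the client-side settings rather than the drillbit itself.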