Have you tried disabling hash joins or hash aggregation for the query, or
changing the planning width? Here are some docs to check out:

https://drill.apache.org/docs/configuring-resources-for-a-shared-drillbit/

https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/

https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/

Let us know if any of these have an effect on the queries...
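The hash-operator and planning-width changes are ordinary Drill options, so you can try them on a single connection before changing anything cluster-wide. The sketch below is a hypothetical Python helper (not part of Drill) that only prints the ALTER statements to paste into sqlline. The option names are the ones discussed in this thread: planner.enable_hashagg, planner.enable_hashjoin, planner.width.max_per_node and planner.slice_target. The numeric values are illustrative assumptions, not recommendations for this cluster.

```python
# Hypothetical helper: renders Drill option changes as SQL text for sqlline.
# It only builds strings; it does not connect to Drill.

def render_options(options, scope="SESSION"):
    """Return one ALTER <scope> SET statement per option.

    scope is "SESSION" (current connection only) or "SYSTEM" (all queries).
    """
    scope = scope.upper()
    if scope not in ("SESSION", "SYSTEM"):
        raise ValueError("scope must be SESSION or SYSTEM")
    statements = []
    for name, value in options.items():
        # bool must be checked first: in Python, bool is a subclass of int.
        if isinstance(value, bool):
            literal = "true" if value else "false"
        else:
            literal = str(value)
        statements.append(f"ALTER {scope} SET `{name}` = {literal};")
    return statements


# Example values only (assumptions); tune them for your own cluster.
workaround = {
    "planner.enable_hashagg": False,    # planner uses streaming (sort-based) aggregation instead
    "planner.enable_hashjoin": False,   # planner typically uses merge join instead
    "planner.width.max_per_node": 4,    # lower = fewer parallel fragments per node
    "planner.slice_target": 1000000,    # higher = Drill parallelizes fewer stages
}

if __name__ == "__main__":
    for stmt in render_options(workaround):
        print(stmt)
```

Passing `scope="SYSTEM"` produces ALTER SYSTEM statements instead, which apply to every query on the cluster rather than only to the current connection.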

Also, the three links I posted here describe query-level changes, so an
ALTER SESSION should be enough to apply them. The memory suggestion above,
by contrast, WOULD have to be made on every running Drillbit, and each
Drillbit would need a restart for it to take effect.
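You can sanity-check the memory numbers discussed further down the thread (32 GB nodes, DRILL_HEAP="16G", DRILL_MAX_DIRECT_MEMORY="20G") by subtracting heap plus direct memory from physical RAM. The sketch below does only that arithmetic. It ignores other JVM overhead such as thread stacks and metaspace, so real headroom is somewhat lower. The alternative configurations are the ones suggested in the thread, not official Drill sizing rules.

```python
# Sketch: how much RAM is left on a node after Drill's heap and direct memory.
# Configurations mirror the thread; they are not official sizing guidance.

def headroom_gb(node_ram_gb, heap_gb, direct_gb):
    """RAM (GB) left for the OS and other services; negative means overcommitted."""
    return node_ram_gb - (heap_gb + direct_gb)


NODE_RAM_GB = 32

configs = {
    "current (16G heap, 20G direct)": (16, 20),
    "suggested (8G heap, 22G direct)": (8, 22),
    "with a distributed FS (8G heap, 20G direct)": (8, 20),
}

if __name__ == "__main__":
    for label, (heap, direct) in configs.items():
        left = headroom_gb(NODE_RAM_GB, heap, direct)
        status = "OVERCOMMITTED" if left < 0 else "ok"
        print(f"{label}: {left:+d} GB headroom -> {status}")
```

The current settings come out at -4 GB, meaning Drill alone can claim more memory than the node has. The two suggested settings leave 2 GB and 4 GB for everything else.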



On Sat, Mar 4, 2017 at 1:01 PM, Anup Tiwari <anup.tiw...@games24x7.com>
wrote:

> Hi John,
>
> I have tried the above config as well, but I am still getting this issue.
> Please note that we were using similar configuration parameters with Drill
> 1.6, where this issue did not occur.
> Is there anything else I can try?
>
> Regards,
> *Anup Tiwari*
>
> On Fri, Mar 3, 2017 at 11:01 PM, Abhishek Girish <agir...@apache.org>
> wrote:
>
> > +1 on John's suggestion.
> >
> > On Fri, Mar 3, 2017 at 6:24 AM, John Omernik <j...@omernik.com> wrote:
> >
> > > So your node has 32G of RAM, yet you are allowing Drill to use 36G. I
> > > would change your settings to 8GB of heap and 22GB of direct memory and
> > > see if that helps with your issues. Also, are you using a distributed
> > > filesystem? If so, you may want to leave even more RAM free, i.e. 8GB of
> > > heap and 20GB of direct.
> > >
> > > On Fri, Mar 3, 2017 at 8:20 AM, Anup Tiwari <anup.tiw...@games24x7.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Please find our configuration details below:
> > > >
> > > > Number of Nodes : 4
> > > > RAM/Node : 32GB
> > > > Core/Node : 8
> > > > DRILL_MAX_DIRECT_MEMORY="20G"
> > > > DRILL_HEAP="16G"
> > > >
> > > > And all other variables are set to default.
> > > >
> > > > We have tried some of the settings suggested above but are still facing
> > > > this issue, and more frequently. Could you suggest the best
> > > > configuration for our environment?
> > > >
> > > > Regards,
> > > > *Anup Tiwari*
> > > >
> > > > On Thu, Mar 2, 2017 at 1:26 AM, John Omernik <j...@omernik.com>
> > > > wrote:
> > > >
> > > > > Another thing to consider: make sure you have a spill location set
> > > > > up, and then disable hashagg/hashjoin for the query.
> > > > >
> > > > > On Wed, Mar 1, 2017 at 1:25 PM, Abhishek Girish <agir...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hey Anup,
> > > > > >
> > > > > > This is indeed an issue, and I understand that an unstable
> > > > > > environment is not something anyone wants. DRILL-4708 is still
> > > > > > unresolved; hopefully someone will get to it soon. I've bumped up
> > > > > > the priority.
> > > > > >
> > > > > > Unfortunately, we do not publish any sizing guidelines, so you will
> > > > > > have to experiment to settle on the right load for your cluster.
> > > > > > Please decrease the concurrency (the number of queries running in
> > > > > > parallel), and try increasing Drill's DIRECT memory. Also, please set
> > > > > > the system options recommended by Sudheesh. While this may not solve
> > > > > > the issue, it may help reduce its occurrence.
> > > > > >
> > > > > > Can you also update the JIRA with your configuration, the types of
> > > > > > queries, and the relevant logs?
> > > > > >
> > > > > > -Abhishek
> > > > > >
> > > > > > On Wed, Mar 1, 2017 at 10:17 AM, Anup Tiwari <anup.tiw...@games24x7.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Can someone look into this? We are now getting this error more
> > > > > > > frequently in ad hoc queries as well.
> > > > > > > For our automation jobs, we are moving to Hive, since in Drill we
> > > > > > > are getting this more and more often.
> > > > > > >
> > > > > > > Regards,
> > > > > > > *Anup Tiwari*
> > > > > > >
> > > > > > > On Sat, Dec 31, 2016 at 12:11 PM, Anup Tiwari <anup.tiw...@games24x7.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > We are getting this issue a bit more frequently. Can someone please
> > > > > > > > look into it and tell us why it is happening? As mentioned in my
> > > > > > > > earlier mail, no other query is running when this query executes.
> > > > > > > >
> > > > > > > > Thanks in advance.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > *Anup Tiwari*
> > > > > > > >
> > > > > > > > On Sat, Dec 24, 2016 at 10:20 AM, Anup Tiwari <anup.tiw...@games24x7.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi Sudheesh,
> > > > > > > >>
> > > > > > > > >> Please find my answers below:
> > > > > > > > >>
> > > > > > > > >> 1. 4 nodes in total (3 DataNodes, 1 NameNode).
> > > > > > > > >> 2. Only one query, as this query is part of a daily dump and runs
> > > > > > > > >> in the early morning.
> > > > > > > > >>
> > > > > > > > >> As @chun mentioned, it seems similar to DRILL-4708. Is there any
> > > > > > > > >> update on the progress of this ticket?
> > > > > > > >>
> > > > > > > >>
> > > > > > > > >> On 22-Dec-2016 12:13 AM, "Sudheesh Katkam" <skat...@maprtech.com>
> > > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > > >> Two more questions:
> > > > > > > > >>
> > > > > > > > >> (1) How many nodes are in your cluster?
> > > > > > > > >> (2) How many queries are running when the failure is seen?
> > > > > > > > >>
> > > > > > > > >> If you have multiple large queries running at the same time, the
> > > > > > > > >> load on the system could cause those failures (which are
> > > > > > > > >> heartbeat related).
> > > > > > > > >>
> > > > > > > > >> The two options I suggested decrease the parallelism of stages in
> > > > > > > > >> a query; this means less load but slower execution.
> > > > > > > > >>
> > > > > > > > >> System-level options affect all queries, while session-level
> > > > > > > > >> options affect only queries on a specific connection. I'm not sure
> > > > > > > > >> which is preferable in your environment.
> > > > > > > > >>
> > > > > > > > >> Also, you may be interested in metrics. More info here:
> > > > > > > > >>
> > > > > > > > >> http://drill.apache.org/docs/monitoring-metrics/
> > > > > > > >>
> > > > > > > >> Thank you,
> > > > > > > >> Sudheesh
> > > > > > > >>
> > > > > > > > >> > On Dec 21, 2016, at 4:31 AM, Anup Tiwari <anup.tiw...@games24x7.com>
> > > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > > >> > @sudheesh, yes, the Drillbit is running on datanodeN/10.*.*.5:31010.
> > > > > > > > >> >
> > > > > > > > >> > Can you tell me how this will impact the query, and do I have to
> > > > > > > > >> > set this at the session level or the system level?
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > Regards,
> > > > > > > >> > *Anup Tiwari*
> > > > > > > >> >
> > > > > > > > >> > On Tue, Dec 20, 2016 at 11:59 PM, Chun Chang <cch...@maprtech.com>
> > > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> >> I am pretty sure this is the same as DRILL-4708.
> > > > > > > >> >>
> > > > > > > > >> >> On Tue, Dec 20, 2016 at 10:27 AM, Sudheesh Katkam <skat...@maprtech.com>
> > > > > > > > >> >> wrote:
> > > > > > > >> >>
> > > > > > > > >> >>> Is the Drillbit service (running on datanodeN/10.*.*.5:31010)
> > > > > > > > >> >>> actually down when the error is seen?
> > > > > > > > >> >>>
> > > > > > > > >> >>> If not, try lowering parallelism by setting these two session
> > > > > > > > >> >>> options before running the queries:
> > > > > > > >> >>>
> > > > > > > >> >>> planner.width.max_per_node (decrease this)
> > > > > > > >> >>> planner.slice_target (increase this)
> > > > > > > >> >>>
> > > > > > > >> >>> Thank you,
> > > > > > > >> >>> Sudheesh
> > > > > > > >> >>>
> > > > > > > > >> >>>> On Dec 20, 2016, at 12:28 AM, Anup Tiwari <anup.tiw...@games24x7.com>
> > > > > > > > >> >>>> wrote:
> > > > > > > >> >>>>
> > > > > > > >> >>>> Hi Team,
> > > > > > > >> >>>>
> > > > > > > > >> >>>> We are running some Drill automation scripts on a daily basis, and we
> > > > > > > > >> >>>> often see some queries fail with the error below. I also came across
> > > > > > > > >> >>>> DRILL-4708 <https://issues.apache.org/jira/browse/DRILL-4708>, which
> > > > > > > > >> >>>> seems similar. Can anyone give me an update on that, or a workaround
> > > > > > > > >> >>>> to avoid this issue?
> > > > > > > >> >>>>
> > > > > > > > >> >>>> *Stack Trace:*
> > > > > > > > >> >>>>
> > > > > > > > >> >>>> Error: CONNECTION ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > > > > > > > >> >>>>
> > > > > > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ] (state=,code=0)
> > > > > > > > >> >>>> java.sql.SQLException: CONNECTION ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > > > > > > > >> >>>>
> > > > > > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> > > > > > > > >> >>>>       at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:232)
> > > > > > > > >> >>>>       at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:275)
> > > > > > > > >> >>>>       at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1943)
> > > > > > > > >> >>>>       at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:76)
> > > > > > > > >> >>>>       at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:473)
> > > > > > > > >> >>>>       at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:465)
> > > > > > > > >> >>>>       at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
> > > > > > > > >> >>>>       at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:169)
> > > > > > > > >> >>>>       at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
> > > > > > > > >> >>>>       at org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:121)
> > > > > > > > >> >>>>       at org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101)
> > > > > > > > >> >>>>       at sqlline.Commands.execute(Commands.java:841)
> > > > > > > > >> >>>>       at sqlline.Commands.sql(Commands.java:751)
> > > > > > > > >> >>>>       at sqlline.SqlLine.dispatch(SqlLine.java:746)
> > > > > > > > >> >>>>       at sqlline.SqlLine.runCommands(SqlLine.java:1651)
> > > > > > > > >> >>>>       at sqlline.Commands.run(Commands.java:1304)
> > > > > > > > >> >>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > > > > > >> >>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > > > > > > > >> >>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > > > > >> >>>>       at java.lang.reflect.Method.invoke(Method.java:498)
> > > > > > > > >> >>>>       at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
> > > > > > > > >> >>>>       at sqlline.SqlLine.dispatch(SqlLine.java:742)
> > > > > > > > >> >>>>       at sqlline.SqlLine.initArgs(SqlLine.java:553)
> > > > > > > > >> >>>>       at sqlline.SqlLine.begin(SqlLine.java:596)
> > > > > > > > >> >>>>       at sqlline.SqlLine.start(SqlLine.java:375)
> > > > > > > > >> >>>>       at sqlline.SqlLine.main(SqlLine.java:268)
> > > > > > > > >> >>>> Caused by: org.apache.drill.common.exceptions.UserException: CONNECTION ERROR: Connection /10.*.*.1:41613 <--> datanodeN/10.*.*.5:31010 (user client) closed unexpectedly. Drillbit down?
> > > > > > > > >> >>>>
> > > > > > > > >> >>>> [Error Id: 5089f2f1-0dfd-40f8-9fa0-8276c08be53f ]
> > > > > > > > >> >>>>       at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
> > > > > > > > >> >>>>       at org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler$1.operationComplete(QueryResultHandler.java:373)
> > > > > > > > >> >>>>       at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> > > > > > > > >> >>>>       at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
> > > > > > > > >> >>>>       at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
> > > > > > > > >> >>>>       at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
> > > > > > > > >> >>>>       at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
> > > > > > > > >> >>>>       at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
> > > > > > > > >> >>>>       at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
> > > > > > > > >> >>>>       at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
> > > > > > > > >> >>>>       at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
> > > > > > > > >> >>>>       at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:89)
> > > > > > > > >> >>>>       at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
> > > > > > > > >> >>>>       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > > > > > > > >> >>>>       at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > > > > > > > >> >>>>       at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > > > > > > > >> >>>>       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > > > > > > > >> >>>>       at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > > > > > > > >> >>>>       at java.lang.Thread.run(Thread.java:745)
> > > > > > > >> >>>>
> > > > > > > >> >>>>
> > > > > > > >> >>>> Regards,
> > > > > > > >> >>>> *Anup Tiwari*
> > > > > > > >> >>>
> > > > > > > >> >>>
> > > > > > > >> >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
