Yeah, it seems that way. We should get your patch merged. I just reviewed it; LGTM.
What type of workload are you running? Unless your workload is planning heavy (e.g. lots of short queries) or does a lot of sorts (the last merge is on the foreman node), work should be reasonably distributed.

On Sun, Apr 12, 2015 at 10:29 PM, Adam Gilmore <[email protected]> wrote:

> Looks like this definitely is the following bug:
>
> https://issues.apache.org/jira/browse/DRILL-2512
>
> It's a pretty severe performance bottleneck having the foreman doing so much work. In our environment, the foreman hits basically 95-100% CPU while the other drillbits barely do any work. That means it's nearly impossible for us to scale out.
>
> On Wed, Apr 8, 2015 at 3:58 PM, Adam Gilmore <[email protected]> wrote:
>
>> Anyone have any more thoughts on this? Anywhere I can start trying to troubleshoot?
>>
>> On Thu, Mar 26, 2015 at 4:13 PM, Adam Gilmore <[email protected]> wrote:
>>
>>> So there are 5 Parquet files, each ~125 MB - not sure what I can provide re the block locations? I believe each file is under the HDFS block size, so they should be stored contiguously.
>>>
>>> I've tried setting the affinity factor to various values (1, 0, etc.), but nothing seems to change that. It always prefers certain nodes.
>>>
>>> Moreover, we added a stack more nodes and it started picking very specific nodes as foremen (perhaps 2-3 nodes out of 20 were always picked as foremen). Therefore, the foremen were being swamped with CPU while the other nodes were doing very little work.
>>>
>>> On Thu, Mar 26, 2015 at 12:12 PM, Steven Phillips <[email protected]> wrote:
>>>
>>>> Actually, I believe a query submitted through the REST interface will instantiate a DrillClient, which uses the same ZKClusterCoordinator that sqlline uses, and thus the foreman for the query is not necessarily the same drillbit it was submitted to. But I'm still not sure it's related to DRILL-2512.
>>>>
>>>> I'll wait for your additional info before speculating further.
>>>>
>>>> On Wed, Mar 25, 2015 at 6:54 PM, Adam Gilmore <[email protected]> wrote:
>>>>
>>>>> We actually set up a separate load balancer for port 8047 (we're submitting these queries via the REST API at the moment), so Zookeeper etc. is out of the equation; thus I doubt we're hitting DRILL-2512.
>>>>>
>>>>> When shutting down the "troublesome" drillbit, it starts parallelizing much more nicely again. We even added 10+ nodes to the cluster, and as long as that particular drillbit is shut down, it distributes very nicely. The minute we start the drillbit on that node again, it starts swamping it with work.
>>>>>
>>>>> I'll shoot through the JSON profiles and some more information on the dataset etc. later today (Australian time!).
>>>>>
>>>>> On Thu, Mar 26, 2015 at 5:31 AM, Steven Phillips <[email protected]> wrote:
>>>>>
>>>>>> I didn't notice at first that Adam said "no matter who the foreman is".
>>>>>>
>>>>>> Another suspicion I have is that our current logic for assigning work will assign to the exact same nodes every time we query a particular table. Changing the affinity factor may change which nodes are picked, but it will still be the same every time. That is my suspicion, but I am not sure why shutting down the drillbit would improve performance. I would expect that shutting down the drillbit would result in a different drillbit becoming the hotspot.
>>>>>> On Wed, Mar 25, 2015 at 12:16 PM, Jacques Nadeau <[email protected]> wrote:
>>>>>>
>>>>>>> On Steven's point, the node that the client connects to is not currently randomized. Given your description of the behavior, I'm not sure whether you're hitting 2512 or just seeing generally undesirable distribution.
>>>>>>>
>>>>>>> On Wed, Mar 25, 2015 at 10:18 AM, Steven Phillips <[email protected]> wrote:
>>>>>>>
>>>>>>>> This is a known issue:
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/DRILL-2512
>>>>>>>>
>>>>>>>> On Wed, Mar 25, 2015 at 8:13 AM, Andries Engelbrecht <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> What version of Drill are you running?
>>>>>>>>>
>>>>>>>>> Any hints when looking at the query profiles? Is the node that is being hammered the foreman for the queries, and are most of the major fragments tied to the foreman?
>>>>>>>>>
>>>>>>>>> —Andries
>>>>>>>>>
>>>>>>>>> On Mar 25, 2015, at 12:00 AM, Adam Gilmore <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi guys,
>>>>>>>>>>
>>>>>>>>>> I'm trying to understand how this could be possible. I have a Hadoop cluster set up with a name node and two data nodes, all with identical specs in terms of CPU/RAM etc.
>>>>>>>>>>
>>>>>>>>>> The two data nodes have a replicated HDFS setup where I'm storing some Parquet files.
>>>>>>>>>>
>>>>>>>>>> A Drill cluster (with Zookeeper) is running with Drillbits on all three servers.
>>>>>>>>>>
>>>>>>>>>> When I submit a query to *any* of the Drillbits, no matter who the foreman is, one particular data node gets picked to do the vast majority of the work.
>>>>>>>>>>
>>>>>>>>>> We've even added three more task nodes to the cluster, and everything still puts a huge load on one particular server.
>>>>>>>>>>
>>>>>>>>>> There is nothing unique about this data node. HDFS is fully replicated (no unreplicated blocks) to the other data node.
>>>>>>>>>>
>>>>>>>>>> I know that Drill tries to get data locality, so I'm wondering if this is the cause, but it is essentially swamping this data node with 100% CPU usage while leaving the others barely doing any work.
>>>>>>>>>>
>>>>>>>>>> As soon as we shut down the Drillbit on this data node, query performance increases significantly.
>>>>>>>>>>
>>>>>>>>>> Any thoughts on how I can troubleshoot why Drill is picking that particular node?
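
Since the thread covers both submitting queries through the REST API on port 8047 and experimenting with the affinity factor, here is a minimal sketch of how those two steps can be scripted together. It assumes the Python requests library, a hypothetical drillbit address and table path, and a Drill build that exposes the /query.json endpoint and the planner.affinity_factor option; adjust all of these to the actual cluster.

    # Minimal sketch: drive a drillbit's REST endpoint (port 8047) from Python.
    # The host name and table path below are hypothetical placeholders.
    import requests

    DRILLBIT_URL = "http://drillbit-1.example.com:8047"  # hypothetical address

    def run_sql(sql):
        """POST one SQL statement to Drill's REST API and return the parsed JSON."""
        resp = requests.post(
            DRILLBIT_URL + "/query.json",
            json={"queryType": "SQL", "query": sql},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()

    # Experiment with the locality weighting mentioned in the thread; 0.0 is
    # just a test value, and its exact effect depends on the Drill version.
    print(run_sql("ALTER SYSTEM SET `planner.affinity_factor` = 0.0"))

    # Run a representative query and watch per-node CPU (or the query profile
    # in the web UI) to see whether the work spreads out.
    print(run_sql("SELECT COUNT(*) FROM dfs.`/data/parquet/events`"))  # hypothetical path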
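
On the question of what to provide about block locations: one way to capture them is hdfs fsck with the -files -blocks -locations flags, which prints every block's replica locations so it is easy to confirm the hot node holds nothing unique. Below is a rough sketch that wraps the command from Python; the directory is a hypothetical stand-in for the real table path, and the hdfs CLI is assumed to be installed on the machine running it.

    # Sketch: list HDFS block locations for the Parquet files discussed above.
    import subprocess

    PARQUET_DIR = "/data/parquet"  # hypothetical directory holding the ~125 MB Parquet files

    def block_report(path):
        """Run `hdfs fsck` with block/location output and return it as text."""
        result = subprocess.run(
            ["hdfs", "fsck", path, "-files", "-blocks", "-locations"],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout

    # Keep only the lines that mention blocks or datanode replicas; the exact
    # report format differs between Hadoop versions, so adjust the filter if
    # the output looks different on the cluster in question.
    for line in block_report(PARQUET_DIR).splitlines():
        if "blk_" in line or "Datanode" in line:
            print(line.strip())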
