The plan itself may hint at why planning took so long. One possible cause is a very large number of files: Drill reads the metadata of every file during the planning stage. This operation is not distributed and can sometimes become a bottleneck.
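The EXPLAIN PLAN FOR check suggested in the quoted thread can be scripted to measure planning time on its own. The following is a minimal sketch, not a definitive tool: it assumes the Drill REST endpoint is reachable at http://localhost:8047/query.json, and the helper names (explain_payload, submit, time_planning) are made up for illustration. The measured time also includes HTTP overhead, so treat it as an upper bound on planning time.

```python
import json
import time
import urllib.request

# Assumption: a Drillbit's REST endpoint is reachable here; adjust for your cluster.
DRILL_URL = "http://localhost:8047/query.json"


def explain_payload(query: str) -> dict:
    """Wrap a query in EXPLAIN PLAN FOR so Drill plans it without executing it."""
    sql = query.rstrip().rstrip(";")  # drop trailing whitespace and semicolon
    return {"queryType": "SQL", "query": f"EXPLAIN PLAN FOR {sql}"}


def submit(payload: dict, url: str = DRILL_URL) -> dict:
    """POST the payload to Drill's REST API and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def time_planning(query: str, run=submit) -> float:
    """Return wall-clock seconds spent planning the query (includes HTTP overhead)."""
    start = time.perf_counter()
    run(explain_payload(query))
    return time.perf_counter() - start
```

Calling time_planning("SELECT ...") with the slow query and comparing the result with the time the query spends in ENQUEUED or STARTING should show whether planning accounts for the delay.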
On Fri, Jul 1, 2016 at 10:44 AM, John Omernik <[email protected]> wrote:

> Yes, the planning is taking a long time. That is the issue.
>
> So, when queuing is enabled, "planning" happens while the status is ENQUEUED. If queuing is not enabled, "planning" happens while the status is STARTING (based on my observations).
>
> Are there any good docs or sources to look at for why the planning phase may be taking so long?
>
> Thanks!
>
> John
>
> On Fri, Jul 1, 2016 at 12:13 PM, Abdel Hakim Deneche <[email protected]> wrote:
>
> > Most likely planning is taking longer to finish. Once it's done, the query should move to either ENQUEUED if queuing is enabled or RUNNING if it is disabled.
> >
> > One easy way to confirm whether planning is indeed taking too long is to run "EXPLAIN PLAN FOR <query>" and see how long it takes to finish.
> >
> > On Fri, Jul 1, 2016 at 6:49 AM, John Omernik <[email protected]> wrote:
> >
> > > Interestingly enough, when I disable queuing, the query sits in the STARTING phase for the same amount of time it would sit in ENQUEUED if queuing were enabled. Excessive planning?
> > >
> > > When looking at the UI, how can I validate this?
> > >
> > > On Fri, Jul 1, 2016 at 8:14 AM, John Omernik <[email protected]> wrote:
> > >
> > > > I don't see that, but here's a question: when it's enqueued, it must have to do some level of planning before determining which queue it's going to fall into, correct? I wonder whether that planning takes too long, and whether that's what's causing the enqueued state?
> > > >
> > > > On Thu, Jun 30, 2016 at 1:09 PM, Parth Chandra <[email protected]> wrote:
> > > >
> > > > > The queue that a query is put in is determined by the cost calculated by the optimizer. So in Qiang's case, the cost calculation might be causing the query to be put in the large query queue.
> > > > >
> > > > > You can check the cost of the query in the query profile and compare it with the value of the QUEUE_THRESHOLD_SIZE setting (exec.queue.threshold) to see which queue the query is being put in.
> > > > >
> > > > > A single query staying enqueued for 30 seconds sounds really wrong. Putting a query in either queue requires getting a distributed semaphore (via ZooKeeper), and it is possible that this is taking too long, which would explain why the enqueuing takes so long.
> > > > >
> > > > > Do you see any messages in the logs about timeouts while enqueuing?
> > > > >
> > > > > On Thu, Jun 30, 2016 at 6:46 AM, John Omernik <[email protected]> wrote:
> > > > >
> > > > > > Thanks Parth.
> > > > > > As I stated, there are no other jobs running in the cluster when this happens. I do have queueing enabled; however, with no other jobs running, why would any single job sit in the ENQUEUED state for 30 seconds? This seems to be an issue, or am I missing something?
> > > > > >
> > > > > > I would really like to use queueing, as this is a multi-tenant cluster, so I don't want to remove it altogether.
> > > > > >
> > > > > > John
> > > > > >
> > > > > > On Wed, Jun 29, 2016 at 10:57 PM, qiang li <[email protected]> wrote:
> > > > > >
> > > > > > > I have the same doubt.
> > > > > > >
> > > > > > > I set the queue.threshold to 50000000, queue.large to 20 and queue.small to 200. But when I run about 100 small queries concurrently, most of them are ENQUEUED.
> > > > > > >
> > > > > > > If I turn off the queue, queries run fast. If I turn on the queue, our queries take about 7 seconds, while they take only 2 to 3 seconds with the queue turned off.
> > > > > > >
> > > > > > > Currently, we turn off the queue and limit the queries on the client side.
> > > > > > >
> > > > > > > 2016-06-30 6:19 GMT+08:00 Parth Chandra <[email protected]>:
> > > > > > >
> > > > > > > > I would guess you have queueing enabled. With queueing enabled, only a maximum number of queries will actually be running and the rest will wait in an ENQUEUED state.
> > > > > > > >
> > > > > > > > There are two queues: one for large queries and one for small queries. You can change their size with the following parameters:
> > > > > > > > exec.queue.large
> > > > > > > > exec.queue.small
> > > > > > > >
> > > > > > > > On Wed, Jun 29, 2016 at 1:51 PM, John Omernik <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > I have some jobs that will stay in an ENQUEUED state for what I think is an excessive amount of time. (No other jobs were running on the cluster, and the ENQUEUED state lasted for 30 seconds.) What causes this? Is it planning when it's in this state? Any information about this would be helpful.
> > > > > > > > >
> > > > > > > > > John
> >
> > --
> > Abdelhakim Deneche
> > Software Engineer
> > <http://www.mapr.com/>
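The queue-selection rule Parth describes in the quoted thread (compare the optimizer's cost with exec.queue.threshold) can be sketched as below. This is an illustration only: pick_queue is a made-up helper, not a Drill API, and the assumption that a cost exactly equal to the threshold goes to the small queue has not been checked against Drill's source.

```python
def pick_queue(plan_cost: float, threshold: float) -> str:
    """Return which queue a query would land in for a given planner cost.

    Assumption: costs strictly above the threshold go to the large queue;
    the exact boundary behaviour in Drill is not verified here.
    """
    return "large" if plan_cost > threshold else "small"


# Threshold taken from qiang li's settings in the thread (queue.threshold = 50000000).
THRESHOLD = 50_000_000
```

With that threshold, pick_queue(1_000, THRESHOLD) gives "small", while any cost above 50000000 gives "large". This is why the cost shown in the query profile is worth checking when small queries end up waiting behind the large queue's much lower limit.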
