If the query time is mainly spent on query planning, then running
distributed mode in a cluster will not help shorten the latency, as
the query planning is done on Foreman, which is just a single node in
the cluster.

Can you please try to query a mongodb database with fewer collections?
>From the symptom, it seems to be similar to hive query planning issue
reported in DRILL-4127. Initially, Drill tried to pre-load everything
including dbnames, table names. That will make query planning time
proportional  to the # of objects in the store.




On Mon, Feb 29, 2016 at 10:07 PM, Rifat Mahmud <[email protected]> wrote:
> Is there a possibility that using distributed Drill will shorten the latency
>
> On Tue, Mar 1, 2016 at 11:47 AM, Rifat Mahmud <[email protected]> wrote:
>
>> 7 seconds on a simple select * command on a table(collection) containin
>> 158 rows(documents).
>>
>> Yes, the mongodb database I am querying into does have 268 collections.
>>
>> On Tue, Mar 1, 2016 at 11:13 AM, Jacques Nadeau <[email protected]>
>> wrote:
>>
>>> I haven't had a chance to look at the profile in detail yet. Do you see
>>> consistent behavior on multiple queries?
>>>
>>> Does your mongodb happen to have a large number of collections and/or
>>> databases? (Just guessing here).
>>>
>>>
>>>
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>>
>>> On Sun, Feb 28, 2016 at 8:54 PM, Rifat Mahmud <[email protected]> wrote:
>>>
>>> > Here is the json profile of the query: http://pastebin.com/tqang1Y0
>>> > Attached the screen shot of the query profile web view too.
>>> > I am using everything in default configuration for apache-drill-1.5.0,
>>> > just changed the MongoDB location from localhost to the remote IP in the
>>> > storage configuration.
>>> >
>>> > On Sun, Feb 28, 2016 at 3:16 PM, Jacques Nadeau <[email protected]>
>>> > wrote:
>>> >
>>> >> Can you share what the profile looks like? Where is the time being
>>> spent?
>>> >> Unless something is really wrong (or these are gigabyte sized records),
>>> >> I'm
>>> >> guessing there is a configuration issue or bug you are hitting.
>>> >>
>>> >>
>>> >> --
>>> >> Jacques Nadeau
>>> >> CTO and Co-Founder, Dremio
>>> >>
>>> >> On Sun, Feb 28, 2016 at 12:04 AM, Rifat Mahmud <[email protected]>
>>> wrote:
>>> >>
>>> >> > I am running embedded drill on a single 8 core, 16 GB RAM machine. I
>>> am
>>> >> > performing a join query(select * from t1, t2 where t1.a = t2.b) on a
>>> >> remote
>>> >> > MongoDB database. The tables(collections) contain 2 and 4
>>> >> rows(documents)
>>> >> > only. The query is taking 27 seconds.
>>> >> > Can the query be made faster by using drill with Zookeeper cluster?
>>> And
>>> >> if
>>> >> > the answer is yes, by how many factors? Please, elaborate about the
>>> >> > real-timeliness of Apache Drill.
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>

Reply via email to