If the query time is mainly spent on query planning, then running distributed mode in a cluster will not help shorten the latency, as the query planning is done on Foreman, which is just a single node in the cluster.
Can you please try to query a mongodb database with fewer collections? >From the symptom, it seems to be similar to hive query planning issue reported in DRILL-4127. Initially, Drill tried to pre-load everything including dbnames, table names. That will make query planning time proportional to the # of objects in the store. On Mon, Feb 29, 2016 at 10:07 PM, Rifat Mahmud <[email protected]> wrote: > Is there a possibility that using distributed Drill will shorten the latency > > On Tue, Mar 1, 2016 at 11:47 AM, Rifat Mahmud <[email protected]> wrote: > >> 7 seconds on a simple select * command on a table(collection) containin >> 158 rows(documents). >> >> Yes, the mongodb database I am querying into does have 268 collections. >> >> On Tue, Mar 1, 2016 at 11:13 AM, Jacques Nadeau <[email protected]> >> wrote: >> >>> I haven't had a chance to look at the profile in detail yet. Do you see >>> consistent behavior on multiple queries? >>> >>> Does your mongodb happen to have a large number of collections and/or >>> databases? (Just guessing here). >>> >>> >>> >>> -- >>> Jacques Nadeau >>> CTO and Co-Founder, Dremio >>> >>> On Sun, Feb 28, 2016 at 8:54 PM, Rifat Mahmud <[email protected]> wrote: >>> >>> > Here is the json profile of the query: http://pastebin.com/tqang1Y0 >>> > Attached the screen shot of the query profile web view too. >>> > I am using everything in default configuration for apache-drill-1.5.0, >>> > just changed the MongoDB location from localhost to the remote IP in the >>> > storage configuration. >>> > >>> > On Sun, Feb 28, 2016 at 3:16 PM, Jacques Nadeau <[email protected]> >>> > wrote: >>> > >>> >> Can you share what the profile looks like? Where is the time being >>> spent? >>> >> Unless something is really wrong (or these are gigabyte sized records), >>> >> I'm >>> >> guessing there is a configuration issue or bug you are hitting. >>> >> >>> >> >>> >> -- >>> >> Jacques Nadeau >>> >> CTO and Co-Founder, Dremio >>> >> >>> >> On Sun, Feb 28, 2016 at 12:04 AM, Rifat Mahmud <[email protected]> >>> wrote: >>> >> >>> >> > I am running embedded drill on a single 8 core, 16 GB RAM machine. I >>> am >>> >> > performing a join query(select * from t1, t2 where t1.a = t2.b) on a >>> >> remote >>> >> > MongoDB database. The tables(collections) contain 2 and 4 >>> >> rows(documents) >>> >> > only. The query is taking 27 seconds. >>> >> > Can the query be made faster by using drill with Zookeeper cluster? >>> And >>> >> if >>> >> > the answer is yes, by how many factors? Please, elaborate about the >>> >> > real-timeliness of Apache Drill. >>> >> > >>> >> >>> > >>> > >>> >> >>
