Created an issue: https://issues.apache.org/jira/browse/DRILL-4462 It seems like number of collection doesn't matter. The query takes same amount of time(around 27 seconds) on database containing only 2 collection.
On Tue, Mar 1, 2016 at 10:17 PM, Jacques Nadeau <[email protected]> wrote: > If the time is all spent in planning, that won't have much impact as query > planning is not distributed. > > It sounds like we may be taking a while to plan queries against Mongo when > there are a large number of collections. Can you open a JIRA that we can > take a look at? > > -- > Jacques Nadeau > CTO and Co-Founder, Dremio > > On Mon, Feb 29, 2016 at 10:07 PM, Rifat Mahmud <[email protected]> wrote: > > > Is there a possibility that using distributed Drill will shorten the > > latency > > > > On Tue, Mar 1, 2016 at 11:47 AM, Rifat Mahmud <[email protected]> wrote: > > > > > 7 seconds on a simple select * command on a table(collection) containin > > > 158 rows(documents). > > > > > > Yes, the mongodb database I am querying into does have 268 collections. > > > > > > On Tue, Mar 1, 2016 at 11:13 AM, Jacques Nadeau <[email protected]> > > > wrote: > > > > > >> I haven't had a chance to look at the profile in detail yet. Do you > see > > >> consistent behavior on multiple queries? > > >> > > >> Does your mongodb happen to have a large number of collections and/or > > >> databases? (Just guessing here). > > >> > > >> > > >> > > >> -- > > >> Jacques Nadeau > > >> CTO and Co-Founder, Dremio > > >> > > >> On Sun, Feb 28, 2016 at 8:54 PM, Rifat Mahmud <[email protected]> > > wrote: > > >> > > >> > Here is the json profile of the query: http://pastebin.com/tqang1Y0 > > >> > Attached the screen shot of the query profile web view too. > > >> > I am using everything in default configuration for > apache-drill-1.5.0, > > >> > just changed the MongoDB location from localhost to the remote IP in > > the > > >> > storage configuration. > > >> > > > >> > On Sun, Feb 28, 2016 at 3:16 PM, Jacques Nadeau <[email protected] > > > > >> > wrote: > > >> > > > >> >> Can you share what the profile looks like? Where is the time being > > >> spent? > > >> >> Unless something is really wrong (or these are gigabyte sized > > records), > > >> >> I'm > > >> >> guessing there is a configuration issue or bug you are hitting. > > >> >> > > >> >> > > >> >> -- > > >> >> Jacques Nadeau > > >> >> CTO and Co-Founder, Dremio > > >> >> > > >> >> On Sun, Feb 28, 2016 at 12:04 AM, Rifat Mahmud <[email protected]> > > >> wrote: > > >> >> > > >> >> > I am running embedded drill on a single 8 core, 16 GB RAM > machine. > > I > > >> am > > >> >> > performing a join query(select * from t1, t2 where t1.a = t2.b) > on > > a > > >> >> remote > > >> >> > MongoDB database. The tables(collections) contain 2 and 4 > > >> >> rows(documents) > > >> >> > only. The query is taking 27 seconds. > > >> >> > Can the query be made faster by using drill with Zookeeper > cluster? > > >> And > > >> >> if > > >> >> > the answer is yes, by how many factors? Please, elaborate about > the > > >> >> > real-timeliness of Apache Drill. > > >> >> > > > >> >> > > >> > > > >> > > > >> > > > > > > > > >
