I do not think we currently consider JSON files splittable. If we do treat them as such, how a read gets split would depend on the file size and the read locality available on the nodes. With a select * (or a count(*)) query in particular, there is nothing to parallelize except the read operation and a simple aggregation. Spreading a small read throughout the cluster would only guarantee that some of the reads happen over the wire, and the partial results would still have to be sent to the query's head node for the final aggregation.
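To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not how Drill's planner actually decides anything; the function name `scan_seconds` and every number in it (throughputs, per-fragment overhead, file sizes) are made-up assumptions, chosen only to illustrate why fanning out a small read can cost more than it saves.

```python
def scan_seconds(file_mb, fragments, remote_fragments,
                 local_mb_s=100.0, remote_mb_s=25.0, overhead_s=0.2):
    """Toy estimate of wall-clock time for a scan split into equal fragments.

    All parameters are hypothetical. The scan is assumed to finish when the
    slowest fragment finishes, and each fragment is assumed to add a fixed
    setup/result-shipping cost (overhead_s).
    """
    if fragments < 1 or not 0 <= remote_fragments <= fragments:
        raise ValueError("need fragments >= 1 and 0 <= remote_fragments <= fragments")

    share_mb = file_mb / fragments
    # If any fragment reads over the wire, that fragment is the slowest one.
    slowest_mb_s = remote_mb_s if remote_fragments > 0 else local_mb_s
    return share_mb / slowest_mb_s + overhead_s * fragments


if __name__ == "__main__":
    for size_mb in (50, 50_000):
        single_local = scan_seconds(size_mb, fragments=1, remote_fragments=0)
        spread_out = scan_seconds(size_mb, fragments=5, remote_fragments=4)
        print(f"{size_mb} MB: one local reader {single_local:.1f}s, "
              f"five readers (four remote) {spread_out:.1f}s")
```

Under these assumed numbers the single local reader wins for the small file, and the five-way split only pays off once the file is large enough for the read time to dominate the fixed overhead.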

On Fri, Jan 16, 2015 at 3:19 AM, mufy <[email protected]> wrote:

> And what would be the best way of ensuring that all the drill-bit nodes
> participated in the query execution?
>
> ---
> Mufeed Usman
> My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
> Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
> <http://mufeed.livejournal.com>
>
> On Fri, Jan 16, 2015 at 4:45 PM, Steven Phillips <[email protected]>
> wrote:
>
> > I would guess that for the first run, data had to be read off disk, plus
> > runtime code had to be compiled. Subsequent runs did not need to do
> > this, since the data should then be in cache, as well as the compiled
> > classes, so the subsequent runs are noticeably faster. Runs 1 - 4 have a
> > range of about 1.5 seconds, which seems like an unremarkable amount of
> > noise.
> >
> > On Fri, Jan 16, 2015 at 3:07 AM, mufy <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I was curious to know the possible reason(s) behind the difference in
> > > timings observed as shown below:
> > >
> > > 0: jdbc:drill:zk=> select count(*) from
> > > dfs.tmp.`yelp_academic_dataset_review.json`;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > | 1125458    |
> > > +------------+
> > > 1 row selected (15.214 seconds)
> > >
> > > 0: jdbc:drill:zk=> select count(*) from
> > > dfs.tmp.`yelp_academic_dataset_review.json`;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > | 1125458    |
> > > +------------+
> > > 1 row selected (12.717 seconds)
> > >
> > > 0: jdbc:drill:zk=> select count(*) from
> > > dfs.tmp.`yelp_academic_dataset_review.json`;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > | 1125458    |
> > > +------------+
> > > 1 row selected (11.833 seconds)
> > >
> > > 0: jdbc:drill:zk=> select count(*) from
> > > dfs.tmp.`yelp_academic_dataset_review.json`;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > | 1125458    |
> > > +------------+
> > > 1 row selected (13.298 seconds)
> > >
> > > 0: jdbc:drill:zk=> select count(*) from
> > > dfs.tmp.`yelp_academic_dataset_review.json`;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > | 1125458    |
> > > +------------+
> > > 1 row selected (12.749 seconds)
> > >
> > > This was run using MapR Drill 0.7.0 on a 5 node MapR cluster.
> > >
> > > ---
> > > Mufeed Usman
> > > My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
> > > Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
> > > <http://mufeed.livejournal.com>
> >
> > --
> > Steven Phillips
> > Software Engineer
> >
> > mapr.com
