I would guess that on the first run the data had to be read off disk and the runtime code had to be compiled. Subsequent runs did not need to do this, since both the data and the compiled classes should already be in cache, so they are noticeably faster. The four runs after the first span a range of about 1.5 seconds, which seems like an unremarkable amount of noise.
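To make the comparison concrete, here is a rough sketch of the arithmetic, using the five timings from the quoted message. The function name `warm_run_spread` is just an illustrative helper, not anything from Drill, and treating only the first run as "cold" is my assumption.

```python
# Timings in seconds, copied from the five runs quoted in this thread.
timings = [15.214, 12.717, 11.833, 13.298, 12.749]


def warm_run_spread(times):
    """Split off the first (assumed cold) run and return it together with
    the max-min spread of the remaining (assumed warm) runs."""
    cold, warm = times[0], times[1:]
    return cold, max(warm) - min(warm)


cold, spread = warm_run_spread(timings)
gap = cold - max(timings[1:])  # how far the first run sits above the slowest later run

print(f"first run:            {cold:.3f} s")
print(f"spread of later runs: {spread:.3f} s")
print(f"first run vs slowest: {gap:.3f} s")
```

With only five samples this is no more than a sanity check; a proper comparison would need many more runs.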
On Fri, Jan 16, 2015 at 3:07 AM, mufy <[email protected]> wrote:

> Hello,
>
> I was curious to know the possible reason(s) behind the difference in
> timings observed as shown below:
>
> 0: jdbc:drill:zk=> select count(*) from
> dfs.tmp.`yelp_academic_dataset_review.json`;
> +------------+
> | EXPR$0 |
> +------------+
> | 1125458 |
> +------------+
> 1 row selected (15.214 seconds)
>
> 0: jdbc:drill:zk=> select count(*) from
> dfs.tmp.`yelp_academic_dataset_review.json`;
> +------------+
> | EXPR$0 |
> +------------+
> | 1125458 |
> +------------+
> 1 row selected (12.717 seconds)
>
> 0: jdbc:drill:zk=> select count(*) from
> dfs.tmp.`yelp_academic_dataset_review.json`;
> +------------+
> | EXPR$0 |
> +------------+
> | 1125458 |
> +------------+
> 1 row selected (11.833 seconds)
>
> 0: jdbc:drill:zk=> select count(*) from
> dfs.tmp.`yelp_academic_dataset_review.json`;
> +------------+
> | EXPR$0 |
> +------------+
> | 1125458 |
> +------------+
> 1 row selected (13.298 seconds)
>
> 0: jdbc:drill:zk=> select count(*) from
> dfs.tmp.`yelp_academic_dataset_review.json`;
> +------------+
> | EXPR$0 |
> +------------+
> | 1125458 |
> +------------+
> 1 row selected (12.749 seconds)
>
> This was run using MapR Drill 0.7.0 on a 5 node MapR cluster.
>
>
> ---
> Mufeed Usman
> My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
> Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
> <http://mufeed.livejournal.com>
>

--
Steven Phillips
Software Engineer
mapr.com
