How did you create the tables? Do you have automatic statistics collection enabled in Hive?
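If statistics are missing, you can gather them by hand and see whether the timings change. A minimal HiveQL sketch (the table name `tt` is taken from the quoted message below; property names as of Hive 1.2.x):

```sql
-- Gather table-level and column-level statistics for an existing table
ANALYZE TABLE tt COMPUTE STATISTICS;
ANALYZE TABLE tt COMPUTE STATISTICS FOR COLUMNS;

-- Collect basic stats automatically on future INSERT ... SELECT loads
SET hive.stats.autogather=true;
```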
Btw, MR is outdated as a Hive execution engine. Use Tez (maybe wait for Tez 0.8 for sub-second queries) or use Spark as the execution engine in Hive.

> On 01 Dec 2015, at 17:40, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>
> What if we decide to use Spark with Hive? I would like to hear similar views.
>
> My test bed comprised:
>
> 1. Spark version 1.5.2
> 2. Hive version 1.2.1
> 3. Hadoop version 2.6
>
> I configured Spark to use the Hive metastore, so with spark-sql I can do
> pretty much anything one can do with HiveQL.
>
> I created and populated an ORC table in spark-sql. It took 90 seconds to
> create and populate the table with 1.7 million rows.
>
> spark-sql> select count(1) from tt;
> 1767886
> Time taken: 5.169 seconds, Fetched 1 row(s)
>
> Now let me try the same operation on the same table with the Hive CLI and MR:
>
> hive> use asehadoop;
> OK
> Time taken: 0.639 seconds
> hive> select count(1) from tt;
> Query ID = hduser_20151201162717_e3102633-f501-413b-b9cb-384ac50880ac
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Starting Job = job_1448969636093_0001, Tracking URL =
> http://rhes564:8088/proxy/application_1448969636093_0001/
> Kill Command = /home/hduser/hadoop-2.6.0/bin/hadoop job -kill job_1448969636093_0001
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2015-12-01 16:27:27,154 Stage-1 map = 0%, reduce = 0%
> 2015-12-01 16:27:35,427 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.65 sec
> 2015-12-01 16:27:41,611 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.71 sec
> MapReduce Total cumulative CPU time: 3 seconds 710 msec
> Ended Job = job_1448969636093_0001
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 3.71 sec  HDFS Read: 520151  HDFS Write: 8  SUCCESS
> Total MapReduce CPU Time Spent: 3 seconds 710 msec
> OK
> 1767886
> Time taken: 25.635 seconds, Fetched: 1 row(s)
>
> So 5 seconds in Spark versus 25 seconds in Hive.
>
> On a point query, Hive does not seem to return the correct timing:
>
> hive> select * from tt where data_object_id = 10;
> Time taken: 0.063 seconds, Fetched: 72 row(s)
>
> Whereas in Spark I get:
>
> spark-sql> select * from tt where data_object_id = 10;
> Time taken: 9.002 seconds, Fetched 72 row(s)
>
> 9 seconds looks far more plausible to me than 0.063 seconds. Or, in the unlikely
> event, Spark returns elapsed time whereas Hive returns execution time?
>
> Thanks
>
> Mich Talebzadeh
>
> Sybase ASE 15 Gold Medal Award 2008
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
> Author of the book "A Practitioner's Guide to Upgrading to Sybase ASE 15",
> ISBN 978-0-9563693-0-7
> Co-author of "Sybase Transact SQL Guidelines Best Practices",
> ISBN 978-0-9759693-0-4
> Publications due shortly:
> Complex Event Processing in Heterogeneous Environments, ISBN 978-0-9563693-3-8
> Oracle and Sybase, Concepts and Contrasts, ISBN 978-0-9563693-1-4, volume one out shortly
>
> http://talebzadehmich.wordpress.com
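To expand on the engine point above: MR, Tez, and Hive-on-Spark are all selected through the same session property, so moving off MR is a one-liner per session. A sketch (both alternatives assume the corresponding engine is actually installed on the cluster):

```sql
-- The default in Hive 1.2.1 is still "mr"
SET hive.execution.engine=tez;    -- requires a Tez install on the cluster
-- or
SET hive.execution.engine=spark;  -- requires a Spark build compatible with Hive-on-Spark
```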