> Also has there been simple benchmarks to compare:
>
> 1. Hive on MR
> 2. Hive on Tez
> 3. Hive on Tez with LLAP

I ran one today, with a small BI query in my test suite against a 1 TB data-set.

TL;DR - MRv2 (203.317 seconds), Tez (13.681s), LLAP (3.809s).

*Warning*: This is not a historical view. All engines are using the same new & improved vectorized operators from 2.2.0-SNAPSHOT; only the physical planner and the physical scheduling differ between runs.

The difference between pre-Stinger, Stinger and Stinger.next is much, much larger than this.

<https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query55.sql>

select  i_brand_id brand_id, i_brand brand,
        sum(ss_ext_sales_price) ext_price
 from date_dim, store_sales, item
 where date_dim.d_date_sk = store_sales.ss_sold_date_sk
        and store_sales.ss_item_sk = item.i_item_sk
        and i_manager_id=36
        and d_moy=12
        and d_year=2001
 group by i_brand, i_brand_id
 order by ext_price desc, i_brand_id
limit 100 ;

=================MRv2==============

set hive.execution.engine=mr;

...
2016-07-18 22:22:57 Uploaded 1 File to: file:/tmp/gopal/b58a60d6-ff05-47bc-ad02-428aaa15779d/hive_2016-07-18_22-22-43_389_3112118969207749230-1/-local-10007/HashTable-Stage-3/MapJoin-mapfile131--.hashtable (914 bytes)
2016-07-18 22:22:57 End of local task; Time Taken: 2.47 sec.
...
Time taken: 203.317 seconds, Fetched: 100 row(s)

=================Tez===============

set hive.execution.engine=tez;
set hive.llap.execution.mode=none;

Time taken: 13.681 seconds, Fetched: 100 row(s)

=================LLAP==============

set hive.llap.execution.mode=all;

Task Execution Summary
----------------------------------------------------------------------------------
  VERTICES  DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
----------------------------------------------------------------------------------
     Map 1       1016.00             0            0     93,123,704           9,048
     Map 4          0.00             0            0         10,000              31
     Map 5          0.00             0            0        296,344           2,675
 Reducer 2        207.00             0            0          9,048             100
 Reducer 3          0.00             0            0            100               0
----------------------------------------------------------------------------------

Query Execution Summary
----------------------------------------------------------------------------------
OPERATION                            DURATION
----------------------------------------------------------------------------------
Compile Query                           1.64s
Prepare Plan                            0.32s
Submit Plan                             0.57s
Start DAG                               0.21s
Run DAG                                 1.02s
----------------------------------------------------------------------------------

Time taken: 3.809 seconds, Fetched: 100 row(s)

Annoyingly, the 1.64s spent compiling the query is now a huge fraction of the total, since it takes only 1.02s to execute the join + aggregate over 93 million rows.

Hopefully in a couple of weeks, we'll cut that 1.64s down to nearly nothing once we merge HIVE-13995 into master.

Back to the historical view: the new vectorization codepaths are a big part of this speed-up, whether you compare historically or against an incompletely vectorized format like Parquet (HIVE-8128 looks abandoned). For example, with the native vectorized map-join turned off:

set hive.vectorized.execution.mapjoin.native.enabled=false;

Time taken: 34.372 seconds, Fetched: 100 row(s)
hive>

Cheers,
Gopal
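
For a quick sense of scale, the ratios implied by the timings above can be worked out with a few lines of Python. This is only a back-of-the-envelope sketch: the variable and function names are made up for illustration, and the inputs are single-run wall-clock times copied from the output above, so the ratios are rough indications rather than a benchmark result.

```python
# Wall-clock times (seconds) copied from the "Time taken" lines above.
# Single runs of one query, so treat the ratios as rough indications only.
timings = {"MRv2": 203.317, "Tez": 13.681, "LLAP": 3.809}

# Compile time (seconds) from the LLAP "Query Execution Summary".
llap_compile = 1.64


def speedup(slow, fast):
    """Return how many times faster the `fast` engine ran than the `slow` one."""
    return timings[slow] / timings[fast]


# Fraction of the LLAP wall-clock time that went into query compilation.
compile_share = llap_compile / timings["LLAP"]

print(f"Tez vs MRv2:   {speedup('MRv2', 'Tez'):.1f}x")
print(f"LLAP vs Tez:   {speedup('Tez', 'LLAP'):.1f}x")
print(f"LLAP vs MRv2:  {speedup('MRv2', 'LLAP'):.1f}x")
print(f"Compile share of LLAP run: {compile_share:.0%}")
```

On these numbers Tez comes out roughly 15x faster than MRv2, LLAP roughly 3.6x faster than Tez, and compilation accounts for a bit over 40% of the LLAP wall-clock time.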