> Also, have there been simple benchmarks to compare:
> 
> 1. Hive on MR
> 2. Hive on Tez
> 3. Hive on Tez with LLAP

I ran one today, with a small BI query from my test suite against a 1 TB
data set.

TL;DR - MRv2 (203.317 seconds), Tez (13.681s), LLAP (3.809s).
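To put those three numbers side by side, here is a minimal sketch that turns them into speedup ratios. The timings are copied from the "Time taken" lines in this mail; the `timings` dict and the `speedup` helper are names made up for illustration only.

```python
# Wall-clock times in seconds, copied from the "Time taken" lines in this mail.
timings = {"MRv2": 203.317, "Tez": 13.681, "LLAP": 3.809}


def speedup(slower: str, faster: str) -> float:
    """Return how many times sooner `faster` finished compared to `slower`."""
    return timings[slower] / timings[faster]


if __name__ == "__main__":
    # Print each pairwise ratio, rounded to one decimal place.
    for slower, faster in [("MRv2", "Tez"), ("Tez", "LLAP"), ("MRv2", "LLAP")]:
        print(f"{faster} vs {slower}: {speedup(slower, faster):.1f}x")
```

That works out to roughly 15x from MRv2 to Tez and roughly 3.6x from Tez to LLAP, for this one query on this one cluster.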

*Warning*: This is not a historical view. All engines are using the same
new & improved vectorized operators from 2.2.0-SNAPSHOT; only the physical
planner and the physical scheduling differ between runs.

The difference between pre-Stinger, Stinger and Stinger.next is much much
larger than this.

<https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query55.sql>


select  i_brand_id brand_id, i_brand brand,
        sum(ss_ext_sales_price) ext_price
 from date_dim, store_sales, item
 where date_dim.d_date_sk = store_sales.ss_sold_date_sk
        and store_sales.ss_item_sk = item.i_item_sk
        and i_manager_id=36
        and d_moy=12
        and d_year=2001
 group by i_brand, i_brand_id
 order by ext_price desc, i_brand_id
limit 100 ;


=================MRv2==============


set hive.execution.engine=mr;

...
2016-07-18 22:22:57     Uploaded 1 File to: file:/tmp/gopal/b58a60d6-ff05-47bc-ad02-428aaa15779d/hive_2016-07-18_22-22-43_389_3112118969207749230-1/-local-10007/HashTable-Stage-3/MapJoin-mapfile131--.hashtable (914 bytes)

2016-07-18 22:22:57     End of local task; Time Taken: 2.47 sec.
...
Time taken: 203.317 seconds, Fetched: 100 row(s)

=================Tez===============



set hive.execution.engine=tez;
set hive.llap.execution.mode=none;

Time taken: 13.681 seconds, Fetched: 100 row(s)

=================LLAP==============


set hive.llap.execution.mode=all;



Task Execution Summary
----------------------------------------------------------------------------------------------
  VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
----------------------------------------------------------------------------------------------
     Map 1        1016.00             0            0     93,123,704           9,048
     Map 4           0.00             0            0         10,000              31
     Map 5           0.00             0            0        296,344           2,675
 Reducer 2         207.00             0            0          9,048             100
 Reducer 3           0.00             0            0            100               0
----------------------------------------------------------------------------------------------


Query Execution Summary
----------------------------------------------------------------------------------------------
OPERATION                            DURATION
----------------------------------------------------------------------------------------------
Compile Query                           1.64s
Prepare Plan                            0.32s
Submit Plan                             0.57s
Start DAG                               0.21s
Run DAG                                 1.02s
----------------------------------------------------------------------------------------------


Time taken: 3.809 seconds, Fetched: 100 row(s)


Annoyingly, the 1.64s spent compiling the query is now a huge fraction of
the total, since it only takes 1.02s to execute the join+aggregate over
93 million rows.

Hopefully in a couple of weeks, we'll cut that 1.64s into nearly nothing
once we merge HIVE-13995 into master.
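To make "huge fraction" concrete, here is a small sketch that computes each phase's share of the 3.809s wall-clock time. The durations are copied from the Query Execution Summary above; the `phases` dict and `share` helper are illustrative names, not anything Hive provides.

```python
# Phase durations in seconds, copied from the Query Execution Summary above.
phases = {
    "Compile Query": 1.64,
    "Prepare Plan": 0.32,
    "Submit Plan": 0.57,
    "Start DAG": 0.21,
    "Run DAG": 1.02,
}

# Total wall-clock time reported by the CLI for the LLAP run.
total_wall = 3.809


def share(phase: str) -> float:
    """Return the fraction of the total wall-clock time spent in `phase`."""
    return phases[phase] / total_wall


if __name__ == "__main__":
    # Print each phase with its duration and share of the total.
    for name, seconds in phases.items():
        print(f"{name:<15} {seconds:5.2f}s  {share(name):6.1%}")
```

Compilation comes out at about 43% of the total, against about 27% for actually running the DAG.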


As for the historical view: the new vectorization code paths are a big
part of this speedup, whether you compare against older releases or against
an incompletely vectorized format like Parquet (HIVE-8128 looks abandoned).
For example, with the native vectorized map-join turned off:

set hive.vectorized.execution.mapjoin.native.enabled=false;


Time taken: 34.372 seconds, Fetched: 100 row(s)
hive>


Cheers,
Gopal