
In this sample query:

select  i_brand_id brand_id, i_brand brand,
        sum(ss_ext_sales_price) ext_price
 from date_dim, store_sales, item
 where date_dim.d_date_sk =
        and store_sales.ss_item_sk = item.i_item_sk
        and i_manager_id=36
        and d_moy=12
        and d_year=2001
 group by i_brand, i_brand_id
 order by ext_price desc, i_brand_id
limit 100 ;

What were the storage format (Parquet, text, ORC, etc.) and the row count
for each of the three tables above?


Dr Mich Talebzadeh


On 19 July 2016 at 02:17, Gopal Vijayaraghavan wrote:

> > These look pretty impressive. What execution mode were you running
> > these in? YARN client, maybe?
> There is no other mode: everything runs on YARN.
> > 53 times
> The factor is actually bigger if you look at execution time alone.
> The MRv2 version takes 2.47s to prep a query, while the LLAP version takes
> 1.64s.
> The MRv2 version takes 200.319s to execute the query, while the LLAP
> version takes 1.02s.
> The execution factor is nearly 200x, but compile time becomes significant
> as you scale down the latencies.
> > My calculations on Hive 2 on Spark 1.3.1
> I'm not sure where Hive2-on-Spark is going; the last commit to
> SparkCompiler was late last year, before there was a Hive2.
> On the speed front, I'm pretty sure you have most of the Hive2
> optimizations disabled; even the most basic of the Stinger optimizations
> might be missing for you.
> Check if you have
> set hive.vectorized.execution.enabled=true;
> Some of these new optimizations don't work on H-o-S, because Hive-on-Spark
> does not implement a true broadcast join; instead it uses a
> SparkHashTableSinkOperator, which actually writes the hash table to HDFS
> instead of sending it directly to the downstream task.
> I don't understand why it does that instead of using an RDD broadcast, but
> it prevents the JOIN optimizations that convert the 34 sec query into a
> 3.8 sec query from applying to Spark execution.
> A couple of examples would be
> set;
> set hive.vectorized.execution.mapjoin.minmax.enabled=true;
> Those two make easy work of joins in LLAP, particularly semi-joins, which
> are common in BI queries.
> Once LLAP is out of tech preview, we can enable most of them by default
> for Tez+LLAP, but that would not mean all of them apply to
> Hive-on-(Spark/MR).
> Getting these new features onto another engine takes active effort from
> the engine's devs.
> Cheers,
> Gopal
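
As a rough check on the timings Gopal quotes, the speedups can be worked out directly from the four numbers in the thread: 2.47 s prep and 200.319 s execution for MRv2, and 1.64 s prep and 1.02 s execution for LLAP. The helper name `speedup` is only illustrative.

```python
# Timings in seconds, as quoted in the thread (MRv2 vs LLAP for the same query)
mr_compile, mr_exec = 2.47, 200.319
llap_compile, llap_exec = 1.64, 1.02


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


exec_factor = speedup(mr_exec, llap_exec)
total_factor = speedup(mr_compile + mr_exec, llap_compile + llap_exec)

# Fraction of each engine's wall-clock time spent preparing (compiling) the query
mr_compile_share = mr_compile / (mr_compile + mr_exec)
llap_compile_share = llap_compile / (llap_compile + llap_exec)

print(f"execution speedup:  {exec_factor:.1f}x")
print(f"end-to-end speedup: {total_factor:.1f}x")
print(f"compile share, MRv2: {mr_compile_share:.1%}, LLAP: {llap_compile_share:.1%}")
```

Execution alone comes out at roughly 196x, but end to end the factor drops to about 76x. Compilation is about 62% of the LLAP wall-clock time versus roughly 1% for MRv2, which is why compile time starts to dominate as latencies shrink.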

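For readers less familiar with the broadcast join Gopal mentions: the small side of the join is turned into an in-memory hash table once and handed to every task that scans a partition of the large side. Each task can then probe the table locally without shuffling the large side. The Python sketch below is a conceptual, single-process illustration of that idea. It is not Hive's or Spark's implementation, and the function names and sample rows are hypothetical. In Spark the hand-off would typically go through `SparkContext.broadcast`; per the thread, Hive-on-Spark instead writes the hash table to HDFS via SparkHashTableSinkOperator.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shapes: (join_key, payload)
SmallRow = Tuple[int, str]    # e.g. (i_item_sk, i_brand)
BigRow = Tuple[int, float]    # e.g. (ss_item_sk, ss_ext_sales_price)


def build_hash_table(small_side: Iterable[SmallRow]) -> Dict[int, List[str]]:
    """Build the in-memory hash table once from the small (dimension) side."""
    table: Dict[int, List[str]] = defaultdict(list)
    for key, payload in small_side:
        table[key].append(payload)
    return dict(table)


def probe_partition(partition: Iterable[BigRow],
                    table: Dict[int, List[str]]) -> List[Tuple[int, float, str]]:
    """One 'task': stream a partition of the big side and probe the table locally."""
    joined = []
    for key, value in partition:
        for payload in table.get(key, []):
            joined.append((key, value, payload))
    return joined


def broadcast_hash_join(small_side: Iterable[SmallRow],
                        big_partitions: Iterable[Iterable[BigRow]]):
    """Inner join: build the table once, then reuse it for every partition."""
    table = build_hash_table(small_side)
    result = []
    for partition in big_partitions:
        result.extend(probe_partition(partition, table))
    return result


if __name__ == "__main__":
    item = [(1, "brand_a"), (2, "brand_b")]
    store_sales = [[(1, 10.0), (3, 5.0)], [(2, 7.5), (1, 2.0)]]
    print(broadcast_hash_join(item, store_sales))
```

The design point is that the hash table is built once and every probe is a local dictionary lookup. Moving that table through a distributed file system instead of memory adds I/O on both the writing and reading side.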

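My understanding of the `hive.vectorized.execution.mapjoin.minmax.enabled` setting is that it lets the vectorized map-join hash table apply min/max filtering for integer join keys; check the HiveConf description for your release to confirm. The idea is to record the smallest and largest key in the hash table and discard any big-side row whose key falls outside that range before doing a hash lookup. This helps most when the small side's surviving keys occupy a narrow range. The Python sketch below illustrates only that idea. It is not Hive's vectorized code, and the function name and sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def minmax_filtered_probe(partition: Iterable[Tuple[int, float]],
                          table: Dict[int, List[str]]):
    """Probe a hash table, skipping keys outside [min_key, max_key] of the table.

    Returns (joined_rows, number_of_rows_skipped_by_the_range_check).
    """
    rows = list(partition)
    if not table:                       # empty small side: nothing can match
        return [], len(rows)
    min_key, max_key = min(table), max(table)
    joined, skipped = [], 0
    for key, value in rows:
        if key < min_key or key > max_key:
            skipped += 1                # cheap range test, no hash lookup needed
            continue
        for payload in table.get(key, []):  # in range: fall back to the hash lookup
            joined.append((key, value, payload))
    return joined, skipped


if __name__ == "__main__":
    table = {5: ["brand_x"], 7: ["brand_y"]}
    partition = [(1, 3.0), (5, 4.0), (6, 1.0), (9, 2.5), (7, 8.0)]
    print(minmax_filtered_probe(partition, table))
```

Key 6 lies inside the [5, 7] range, so it still costs a lookup that finds nothing. The range test is a cheap pre-filter, not a replacement for the hash probe.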