RE: Impact of partitioning on certain queries

Mich Talebzadeh Fri, 08 Jan 2016 01:48:08 -0800

Thanks Gopal.


Basically the following is true:

1. The storage layer is HDFS.

2. The execution engine is MR, Tez, Spark, etc.

3. The access layer is Hive.
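
As a rough illustration of that layering, the engine underneath Hive can be chosen per session through the `hive.execution.engine` property while the HiveQL you submit stays the same. This is a sketch only: the `tez` and `spark` values assume Tez is deployed and Hive on Spark is configured on the cluster.

```sql
-- Choose the engine for the current session (run only one such SET).
-- Tez must be deployed on the cluster for this value to work.
SET hive.execution.engine=tez;

-- Alternatives for the same property:
--   hive.execution.engine=mr      (classic MapReduce)
--   hive.execution.engine=spark   (needs Hive on Spark configured)

-- With no value, SET prints the setting currently in effect.
SET hive.execution.engine;
```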

When we say the access layer is Hive, is it correct to assume that we are
referring to the optimiser (loosely analogous to the optimiser in an RDBMS)? For
example, is the Hive optimiser aware of the number of underlying partitions? The
reason I am asking is that with EXPLAIN I only see a Table Scan, and it does not
refer to any partition or to partition elimination.
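
One way to check partition elimination, sketched under assumptions: the table `sales` and its partition column `sale_date` are hypothetical. `EXPLAIN DEPENDENCY` (Hive 0.10 and later) is documented as printing JSON with an `input_partitions` list, i.e. the partitions a query will actually read, so comparing the two statements below should show whether pruning happened even when the plain EXPLAIN plan only shows a Table Scan.

```sql
-- Hypothetical table, names and types are illustrative only
CREATE TABLE IF NOT EXISTS sales (id BIGINT, amount DOUBLE)
PARTITIONED BY (sale_date STRING)
STORED AS ORC;

-- Nonstrict so that the unfiltered query below can be compiled
SET hive.mapred.mode=nonstrict;

-- Filter on the partition column: input_partitions should name only
-- the sale_date = 2016-01-08 partition (if that partition exists)
EXPLAIN DEPENDENCY
SELECT SUM(amount) FROM sales WHERE sale_date = '2016-01-08';

-- Filter on an ordinary column: every existing partition is an input
EXPLAIN DEPENDENCY
SELECT SUM(amount) FROM sales WHERE amount > 100;
```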

Cheers

-----Original Message-----
From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal
Vijayaraghavan
Sent: 08 January 2016 09:34
To: user@hive.apache.org
Subject: Re: Impact of partitioning on certain queries

> Ok, we hope that partitioning improves performance where the predicate
> is on partitioned columns

Nope.

Partitioning *only* improves performance if your queries run with

set hive.mapred.mode=strict;

That's the "use strict" easy way to make sure you're writing good queries.
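
A minimal sketch of what strict mode enforces, assuming a hypothetical table `sales` partitioned by a hypothetical column `sale_date`. With `hive.mapred.mode=strict`, a query on a partitioned table is rejected at compile time unless its WHERE clause filters on a partition column, so an accidental scan of every partition fails fast instead of running.

```sql
-- Hypothetical table, names and types are illustrative only
CREATE TABLE IF NOT EXISTS sales (id BIGINT, amount DOUBLE)
PARTITIONED BY (sale_date STRING)
STORED AS ORC;

SET hive.mapred.mode=strict;

-- Accepted: the WHERE clause restricts the partition column
SELECT SUM(amount) FROM sales WHERE sale_date = '2016-01-08';

-- Rejected in strict mode (no partition predicate), left commented out
-- so that a script running these statements does not stop here:
--   SELECT SUM(amount) FROM sales WHERE amount > 100
```

The same setting also blocks ORDER BY without LIMIT and Cartesian-product joins, so it acts as a general guard against queries that would read far more data than intended.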

Even then, schema design in Hive is something you need to learn with the
assumption that neither the storage layer nor the compute layer is part of
"Hive".

It floats in an "access" layer above both. I'm not sure there's any legacy
tech to draw a parallel with.

If you haven't seen this before, here's an example of the problem:

http://www.slideshare.net/Hadoop_Summit/hive-at-yahoo-letters-from-the-trenches/24

Cheers,

Gopal