Re: Partition Columns

Mungeol Heo Thu, 14 May 2015 16:46:38 -0700

Hi, Appan.

you can just simply check the amount of data your query reads from the
table. or the number of the mapper for running that query.
then, you can know whether it filtering or scanning all table.
Of course, it is a lazy approach. but, you can give a try.
I think query 1 should work fine. because I am using a lot of that
kind of queries and it works fine for me.


Thanks,
mungeol

On Fri, May 15, 2015 at 8:31 AM, Appan Thirumaligai
<[email protected]> wrote:
> I agree with you Viral. I see the same behavior as well. We are on Hive 0.13
> for the cluster where I'm testing this.
>
> On Thu, May 14, 2015 at 2:16 PM, Viral Bajaria <[email protected]>
> wrote:
>>
>> Hi Appan,
>>
>> In my experience I have seen that Query 2 does not use partition pruning
>> because it's not a straight up filtering and involves using functions (aka
>> UDFs).
>>
>> What version of Hive are you using ?
>>
>> Thanks,
>> Viral
>>
>>
>>
>> On Thu, May 14, 2015 at 1:48 PM, Appan Thirumaligai
>> <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I have a question on Hive Optimizer. I have a table with partition
>>> columns  eg.,Sales partitioned by year, month, day. Assume that I have two
>>> years worth of data on this table. I'm running two queries on this table.
>>>
>>> Query 1: Select * from Sales where year=2015 and month = 5 and day
>>> between 1 and 7
>>>
>>> Query 2: Select * from Sales where concat_ws('-',cast(year as
>>> string),lpad(cast(month as string),2,'0'),lpad(cast(day as string),2,'0'))
>>> between '2015-01-01' and '2015-01-07'
>>>
>>> When I ran Explain command on the above two queries I get a Filter
>>> operation for the 2nd Query and there is no Filter Operation for the first
>>> query.
>>>
>>> My question is: Do both queries use the partitions or is it used only in
>>> Query 1 and for Query 2 it will be a scan of all the data?
>>>
>>> Thanks for your help.
>>>
>>> Thanks,
>>> Appan
>>
>>
>

Re: Partition Columns

Reply via email to