Re: Hive ignoring buckets when using dynamic where

Bejoy KS Thu, 20 Sep 2012 07:28:19 -0700

Hi Robin

AFAIK, one of the possible solutions is is to take the computation of current 
date out of hive, may be to a shell script (you can have the computation 
current - x days aslo from within a shell script i guess ). Once you have 
identified the required date, store it in a variable and substitute the same in 
your hive query.


Some links for your reference
http://hive.apache.org/docs/r0.9.0/language_manual/var_substitution.html 
http://kickstarthadoop.blogspot.in/2011/10/include-values-during-execution-time-in.html


 

Regards,
Bejoy KS


________________________________
 From: Robin Verlangen <[email protected]>
To: [email protected]; Bejoy KS <[email protected]> 
Sent: Thursday, September 20, 2012 7:30 PM
Subject: Re: Hive ignoring buckets when using dynamic where
 

Hi Bejoy,

Thank you for your reply. Is there any way to fix my problem? I want to have a 
query that has a dynamic range, from now (and in some cases now - x days until 
now).

Best regards, 


Robin Verlangen
Software engineer

W http://www.robinverlangen.nl
E [email protected]

Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.



2012/9/20 Bejoy KS <[email protected]>

Hi Robin
>
>
>The result of 'bdate=to_date(unix_timestamp())' is evaluated during the 
>runtime of the query. But the data that a query should process is 
>determined initially before executing the map reduce jobs. That is the reason 
>the query is running over whole data set.
>
>
>When you provide 'bdate='2012-09-01'' the hive parser knows initially itself 
>what data which all partitions should be taken into account. So this query 
>runs on only the required partitions and not on whole data.
>
>
>To add on , it is not the buckets considered here on where clause but the 
>partitions.  
> 
>Regards,
>Bejoy KS
>
>
>
>________________________________
> From: Robin Verlangen <[email protected]>
>To: [email protected] 
>Sent: Thursday, September 20, 2012 5:06 PM
>Subject: Hive ignoring buckets when using dynamic where
> 
>
>
>Hi there,
>
>
>We're working on some queries that use buckets to improve performance with 
>like 1000x. However we ran into a problem. When we use a fixed hardcoded date 
>it works fine:
>
>
>SELECT * FROM standard_feed WHERE bdate='2012-09-01'
>Starts a job with 6 mappers, 2 reducers
>
>
>When we use it dynamically:
>SELECT * FROM standard_feed WHERE bdate=to_date(unix_timestamp())
>Starts a job with 1000 mappers, 2 reducers
>
>
>What's the problem here? The result of the to_date of the current timestamp 
>should be equal to a normal fixed date? Does anyone have a solution?
>
>Best regards, 
>
>
>
>Robin Verlangen
>Software engineer
>
>
>W http://www.robinverlangen.nl
>E [email protected]
>
>
>Disclaimer: The information contained in this message and attachments is 
>intended solely for the attention and use of the named addressee and may be 
>confidential. If you are not the intended recipient, you are reminded that the 
>information remains the property of the sender. You must not use, disclose, 
>distribute, copy, print or rely on this e-mail. If you have received this 
>message in error, please contact the sender immediately and irrevocably delete 
>this message and any copies.
>
>
>

Re: Hive ignoring buckets when using dynamic where

Reply via email to