Hi Robin

The result of 'bdate=to_date(unix_timestamp())' is evaluated during the runtime 
of the query. But the data that a query should process is 
determined initially before executing the map reduce jobs. That is the reason 
the query is running over whole data set.

When you provide 'bdate='2012-09-01'' the hive parser knows initially itself 
what data which all partitions should be taken into account. So this query runs 
on only the required partitions and not on whole data.

To add on , it is not the buckets considered here on where clause but the 
partitions.  
 
Regards,
Bejoy KS


________________________________
 From: Robin Verlangen <ro...@us2.nl>
To: user@hive.apache.org 
Sent: Thursday, September 20, 2012 5:06 PM
Subject: Hive ignoring buckets when using dynamic where
 

Hi there,

We're working on some queries that use buckets to improve performance with like 
1000x. However we ran into a problem. When we use a fixed hardcoded date it 
works fine:

SELECT * FROM standard_feed WHERE bdate='2012-09-01'
Starts a job with 6 mappers, 2 reducers

When we use it dynamically:
SELECT * FROM standard_feed WHERE bdate=to_date(unix_timestamp())
Starts a job with 1000 mappers, 2 reducers

What's the problem here? The result of the to_date of the current timestamp 
should be equal to a normal fixed date? Does anyone have a solution?

Best regards, 


Robin Verlangen
Software engineer

W http://www.robinverlangen.nl
E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

Reply via email to