This should get you on the right path:
https://issues.apache.org/jira/browse/HIVE-7121


From: Connell Donaghy [mailto:cdona...@pinterest.com]
Sent: Monday, July 13, 2015 2:50 PM
To: user@hive.apache.org
Subject: DISTRIBUTE BY question

Hey! I'm trying to write a tool that uses a storage handler to write HFiles 
with a specific partition function. To do this, I have been using DISTRIBUTE BY 
with a UDF that takes the key column and the number of reducers (which becomes 
the number of partitions, since each reducer creates its own HFile). However, I 
have noticed that sometimes two UDF values (say 0 and 11) both go to reducer 0, 
while reducer 11 gets no input. Could you point me to the place in the source 
code where partitioning for the map/reduce job and DISTRIBUTE BY is 
implemented, so that I can reverse-engineer it and make sure the keys go to the 
right partition? If my question doesn't make sense, just pointing me to where 
DISTRIBUTE BY is implemented would be very helpful. Thank you so much for your 
time!
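
One way to see how distinct UDF values can end up on the same reducer is a 
hash-then-modulo sketch. It assumes rows are routed with the same arithmetic 
as Hadoop's HashPartitioner, (hash & Integer.MAX_VALUE) % numReduceTasks. The 
class name, the choice of 12 reducers, and the mixedHash function are 
illustrative assumptions, not taken from the Hive source.

```java
public class PartitionSketch {
    // Same arithmetic as Hadoop's HashPartitioner.getPartition:
    // clear the sign bit, then take the remainder by the reducer count.
    public static int partitionFor(int keyHash, int numReducers) {
        return (keyHash & Integer.MAX_VALUE) % numReducers;
    }

    // Hypothetical stand-in for a hash that mixes bits before partitioning
    // (the 32-bit finalizer from MurmurHash3). Assumption for illustration.
    public static int mixedHash(int value) {
        int h = value;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        int numReducers = 12; // assumed: reducers 0 through 11
        for (int udfValue = 0; udfValue < numReducers; udfValue++) {
            // With an identity hash the reducer index equals the UDF value.
            int identity = partitionFor(udfValue, numReducers);
            // With a mixing hash it generally does not, and values can collide.
            int mixed = partitionFor(mixedHash(udfValue), numReducers);
            System.out.printf("udf=%d identity->%d mixed->%d%n",
                    udfValue, identity, mixed);
        }
    }
}
```

Under these assumptions, the reducer index matches the UDF value only when the 
hash applied to the DISTRIBUTE BY expression is the identity and the value lies 
in [0, numReducers). Any other hash can send two values to one reducer and 
leave another reducer empty.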



