Rui Li created HIVE-17114:
-----------------------------

Summary: HoS: Possible skew in shuffling when data is not really skewed
Key: HIVE-17114
URL: https://issues.apache.org/jira/browse/HIVE-17114
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor

Observed in HoS; it may apply to other engines as well. When we join two tables on a single int key, the key itself is used as the hash code in {{ObjectInspectorUtils.hashCode}}:
{code}
case INT:
  return ((IntObjectInspector) poi).get(o);
{code}
Suppose the keys are all different but are all multiples of 10. If we then choose 10 as the number of reducers, every key has the same hash code modulo 10, so the shuffle will be skewed even though the data itself is not.
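
The effect can be reproduced outside Hive with a small sketch. It assumes the reducer is picked as the non-negative hash code modulo the number of reducers, which is how hash partitioning is commonly done; the class {{ShuffleSkewDemo}} and its methods are hypothetical names used only for illustration and are not part of Hive.

```java
import java.util.Arrays;

public class ShuffleSkewDemo {

    // Assumption: the reducer index is the non-negative hash modulo the reducer count.
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many keys land on each reducer when the int key is its own hash code,
    // mirroring the INT case quoted above.
    static int[] countPerReducer(int[] keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (int key : keys) {
            counts[partition(key, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1000 distinct keys, all multiples of 10: 0, 10, 20, ...
        int[] keys = new int[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i * 10;
        }
        // With 10 reducers every key maps to reducer 0.
        System.out.println("10 reducers: " + Arrays.toString(countPerReducer(keys, 10)));
        // With 7 reducers (coprime with 10) the same keys spread out almost evenly.
        System.out.println(" 7 reducers: " + Arrays.toString(countPerReducer(keys, 7)));
    }
}
```

Under that assumption, the skew comes from the common factor shared by the keys and the reducer count rather than from the data distribution, so a reducer count coprime with the key stride, or a hash function that mixes the bits of the key, would be expected to avoid it.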