[ https://issues.apache.org/jira/browse/HIVE-24471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
mahesh kumar behera reassigned HIVE-24471:
------------------------------------------

> Add support for combiner in hash mode group aggregation
> --------------------------------------------------------
>
>                 Key: HIVE-24471
>                 URL: https://issues.apache.org/jira/browse/HIVE-24471
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>
> In map-side group aggregation, a partial grouped aggregation is computed to
> reduce the data written to disk by the map task. In hash aggregation, where
> the input data is not sorted, a hash table is used. If the hash table grows
> beyond a configurable limit, its data is flushed to disk and a new hash
> table is created. If the reduction achieved by the hash table is less than
> the minimum hash aggregation reduction calculated at compile time, map-side
> aggregation is converted to streaming mode. So if the first few batches of
> records do not yield a significant reduction, the mode is switched to
> streaming. This can hurt performance if subsequent batches of records
> contain fewer distinct values. To mitigate this, a combiner can be added to
> the map task after the keys are sorted. This ensures that aggregation is
> done where possible and reduces the data written to disk.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
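
For illustration only, the behaviour described in the issue can be modelled
with a small Java sketch. This is a toy model, not Hive's operator code: the
class CombinerSketch, its methods, the parameters maxEntries and
minReduction, and the exact form of the reduction check are all assumptions
made for this example, and the aggregate is a plain COUNT per key. mapSide
aggregates in a hash table, flushes when the table is full, and falls back
to streaming when the first flush shows poor reduction. combine then merges
adjacent equal keys after sorting, which is the step the issue proposes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of hash-mode map-side aggregation plus a combiner (hypothetical, not Hive code). */
final class CombinerSketch {

    /** Builds one partial aggregate: a group key and its partial count. */
    public static Map.Entry<String, Long> partial(String key, long count) {
        return new AbstractMap.SimpleImmutableEntry<>(key, count);
    }

    /**
     * Hash-mode aggregation. When the table reaches maxEntries it is flushed.
     * Assumption: if entries/rows for that batch exceeds minReduction (poor
     * reduction), all remaining rows are streamed through unaggregated.
     */
    public static List<Map.Entry<String, Long>> mapSide(
            List<String> keys, int maxEntries, double minReduction) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        Map<String, Long> table = new LinkedHashMap<>();
        boolean streaming = false;
        long rowsInBatch = 0;
        for (String key : keys) {
            if (streaming) {
                out.add(partial(key, 1L)); // streaming mode: one output row per input row
                continue;
            }
            table.merge(key, 1L, Long::sum);
            rowsInBatch++;
            if (table.size() >= maxEntries) {
                double ratio = (double) table.size() / rowsInBatch;
                flush(table, out);
                rowsInBatch = 0;
                if (ratio > minReduction) {
                    streaming = true; // the hash table was not reducing enough
                }
            }
        }
        flush(table, out);
        return out;
    }

    /** Combiner: sorts the partials by key, then merges runs of equal keys. */
    public static List<Map.Entry<String, Long>> combine(List<Map.Entry<String, Long>> partials) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(partials);
        sorted.sort(Map.Entry.comparingByKey());
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        String currentKey = null;
        long sum = 0;
        for (Map.Entry<String, Long> p : sorted) {
            if (currentKey != null && !currentKey.equals(p.getKey())) {
                out.add(partial(currentKey, sum));
                sum = 0;
            }
            currentKey = p.getKey();
            sum += p.getValue();
        }
        if (currentKey != null) {
            out.add(partial(currentKey, sum));
        }
        return out;
    }

    /** Emits every hash table entry as a partial aggregate and empties the table. */
    private static void flush(Map<String, Long> table, List<Map.Entry<String, Long>> out) {
        for (Map.Entry<String, Long> e : table.entrySet()) {
            out.add(partial(e.getKey(), e.getValue()));
        }
        table.clear();
    }
}
```

With the keys a, b, c, d, a, a, b, b, maxEntries = 2 and minReduction = 0.5,
the first batch (a, b) gives no reduction, so this model switches to
streaming and mapSide emits eight rows even though the later rows repeat
keys. Passing those rows through combine leaves four rows, one per distinct
key.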