[ https://issues.apache.org/jira/browse/HIVE-24575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259659#comment-17259659 ]
Mustafa İman commented on HIVE-24575:
-------------------------------------

Merged to master. Thanks [~dengzh]

> VectorGroupByOperator reusing keys can lead to wrong results
> ------------------------------------------------------------
>
>                 Key: HIVE-24575
>                 URL: https://issues.apache.org/jira/browse/HIVE-24575
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> A common SQL query such as
> {code:java}
> select category as category, count(distinct maskdid) as uv from
> dwd_internal_inc_d group by category{code}
> can return wrong results on trunk: the values in the category column can be
> mixed up, and the count(distinct maskdid) aggregate is also wrong.
> After some debugging, we found that the problem is caused by a wrong
> byteStarts[i] when it is used to copy the current keys into the reusable keys:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneral.java#L351-L362]
> Because of Arrays.fill(byteStarts, 0), byteStarts[i] is always 0. So when
> clone.byteValues[i].length >= byteValues[i].length holds, the current key's
> bytes are copied into the reusable key starting from index 0 rather than from
> the real start index, which produces the wrong results.
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
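To make the failure mode concrete, here is a minimal, hypothetical Java sketch. It is not the actual VectorHashKeyWrapperGeneral code. The class KeyCopySketch, the method copyKey and the honorStart flag are illustrative assumptions, and the reuse check is simplified to compare the reusable array against the key length. A key is addressed by a byte array plus a start offset and a length. Two keys share one buffer, and when the reusable array is large enough, the buggy path (start treated as 0) copies the previous key's bytes instead of the current key's.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical, simplified model of the key-copy step described in HIVE-24575.
 * Names and the reuse condition are illustrative assumptions, not Hive's API.
 */
class KeyCopySketch {

    /**
     * Copies the key stored at source[start, start + len) and returns it as a string.
     *
     * @param reusable   array reused for the copy when it is large enough (assumed check)
     * @param honorStart false mimics byteStarts[i] having been reset to 0 by Arrays.fill
     */
    static String copyKey(byte[] source, int start, int len, byte[] reusable, boolean honorStart) {
        byte[] target;
        if (reusable.length >= len) {
            // Reuse path: the buggy variant reads from offset 0 instead of the real start.
            int from = honorStart ? start : 0;
            System.arraycopy(source, from, reusable, 0, len);
            target = reusable;
        } else {
            // Allocation path: in this sketch it always uses the real start offset.
            target = Arrays.copyOfRange(source, start, start + len);
        }
        return new String(target, 0, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Two keys packed into one buffer: "books" at [0, 5), "music" at [5, 10).
        byte[] buffer = "booksmusic".getBytes(StandardCharsets.US_ASCII);
        int start = 5;
        int len = 5; // the current key is "music"

        // The reusable array is big enough, so the reuse path is taken.
        System.out.println("buggy copy: " + copyKey(buffer, start, len, new byte[8], false)); // books
        System.out.println("fixed copy: " + copyKey(buffer, start, len, new byte[8], true));  // music
    }
}
```

In this model the wrong key only appears on the reuse path, which matches the report that the problem shows up when the clone's existing array is large enough to be reused.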