zhimin wu created KYLIN-5828:
--------------------------------

             Summary: During multi-jobs concurrent building, the flat table may 
use inconsistent global dictionaries, resulting in incorrect count distinct 
query results.
                 Key: KYLIN-5828
                 URL: https://issues.apache.org/jira/browse/KYLIN-5828
             Project: Kylin
          Issue Type: Bug
          Components: Storage - Parquet
            Reporter: zhimin wu
            Assignee: zhimin wu


*Root Cause*

When multiple build tasks run concurrently and share the same global 
dictionary, nothing guarantees that flat table encoding uses a single, 
consistent dictionary version. If another task expands the dictionary while 
encoding is in progress, some flat table partitions mistakenly read the 
dictionary partition files of the new version. Because the data distribution 
differs between the two versions, those partitions cannot find the correct 
dictionary entries, so the encoded flat table column is written as 0 and the 
count distinct result is wrong.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
