[ https://issues.apache.org/jira/browse/HUDI-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-7267:
---------------------------------
    Labels: pull-request-available  (was: )

> csi will cause data loss during sql query
> -----------------------------------------
>
>                 Key: HUDI-7267
>                 URL: https://issues.apache.org/jira/browse/HUDI-7267
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: index
>            Reporter: KnightChess
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2023-12-28-13-29-15-943.png
>
>
> As the picture shows, the column stats index (CSI) uses the Parquet column
> chunk metadata to calculate min/max values and saves them to the metadata
> table (MDT) column stats. For complex columns, such as **info
> array<struct<name: string, age: int>>** , the Parquet metadata contains only
> `info.array.name` and `info.array.age`, but Hudi looks up stats only for the
> `info` column, so the stats for this column in the MDT are null.
> If the SQL expression contains `IsNotNull(info)`, all of the files are
> skipped, and the query silently loses data.
> Ordinary columns added in the future can hit a similar problem, because old
> files will not contain the new column.
> So, to keep the code logic clean, check for null before evaluating the
> values: min/max/nullValue.
> !image-2023-12-28-13-29-15-943.png|width=1458,height=798!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
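
The null check proposed in the issue can be illustrated with a minimal, language-neutral sketch. This is not Hudi's actual code: `ColStats`, `may_contain_non_null`, and `prune_files` are hypothetical names, and the file names and stats are made-up inputs. The only idea it shows is that missing stats should be treated as "unknown" (keep the file) rather than as "all null" (skip the file) when evaluating an `IsNotNull(col)` filter.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColStats:
    """Hypothetical per-file stats row for one column.

    Any field may be None when no stats were recorded for the column,
    e.g. a complex column or a column added after the file was written.
    """
    min_value: Optional[int]
    max_value: Optional[int]
    null_count: Optional[int]
    value_count: Optional[int]


def may_contain_non_null(stats: Optional[ColStats]) -> bool:
    """Return True if the file must be read for an IsNotNull(col) filter."""
    # Missing stats mean "unknown", not "all null": keep the file.
    if stats is None or stats.null_count is None or stats.value_count is None:
        return True
    # Stats are present: skip the file only if every value is null.
    return stats.null_count < stats.value_count


def prune_files(file_stats: Dict[str, Optional[ColStats]]) -> List[str]:
    """Return the sorted names of files that cannot be safely skipped."""
    return sorted(
        name for name, stats in file_stats.items()
        if may_contain_non_null(stats)
    )


files = {
    "f1.parquet": ColStats(None, None, None, None),  # complex column, no stats
    "f2.parquet": ColStats(1, 9, 0, 100),            # no nulls at all
    "f3.parquet": ColStats(None, None, 100, 100),    # every value is null
    "f4.parquet": None,                              # no stats row at all
}
print(prune_files(files))  # ['f1.parquet', 'f2.parquet', 'f4.parquet']
```

Only `f3.parquet` is skipped, because it is the only file whose stats prove that every value is null. Keeping files with missing stats is the conservative choice: it may read a file unnecessarily, but it cannot drop rows from the query result.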