[ https://issues.apache.org/jira/browse/HUDI-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7267:
---------------------------------
    Labels: pull-request-available  (was: )

> csi will cause data loss during sql query
> -----------------------------------------
>
>                 Key: HUDI-7267
>                 URL: https://issues.apache.org/jira/browse/HUDI-7267
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: index
>            Reporter: KnightChess
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2023-12-28-13-29-15-943.png
>
>
> As the picture shows, CSI (the column stats index) uses the Parquet
> column chunk metadata to calculate min/max values and saves them to the
> column stats partition of the MDT (metadata table). For complex columns,
> such as `info array<struct<name: string, age: int>>`, the Parquet metadata
> contains only the leaf columns `info.array.name` and `info.array.age`, but
> Hudi computes stats only for the top-level `info` column, which has no
> entry in the Parquet metadata, so the stats stored in the MDT for `info`
> are null.
> If the SQL filter contains `IsNotNull(info)`, all the files are skipped,
> and the query silently returns no rows from them.
> Regular columns can hit a similar problem: a column added later is absent
> from older files, so those files have no stats for it either, which may
> cause other issues. To keep the code logic clean, check for null before
> evaluating the min value, max value, and null count.
> !image-2023-12-28-13-29-15-943.png|width=1458,height=798!
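A minimal Java sketch of the null check proposed above, assuming a simplified model of file pruning in which an `IsNotNull` filter keeps a file only if its null count is below its value count, and a comparison against a missing stat evaluates to "unknown" (as in SQL three-valued logic), which a filter treats as false. The names `NullSafeSkipping`, `ColumnStats`, `keepFileUnsafe`, `keepFileNullSafe`, and `candidates` are hypothetical and are not Hudi's API; the sketch only illustrates why missing stats must mean "cannot prune".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of stats-based file pruning; not Hudi's actual implementation.
public class NullSafeSkipping {

    // Simplified per-file stats for one column. A null field means the stat is
    // missing, e.g. for a nested column or a column added after the file was written.
    public record ColumnStats(String fileName, Long nullCount, Long valueCount) {}

    // SQL-style comparison: a null operand yields "unknown", represented as null.
    static Boolean lessThan(Long a, Long b) {
        if (a == null || b == null) {
            return null;
        }
        return a < b;
    }

    // Unsafe IsNotNull check: "unknown" is treated as false, so a file whose
    // stats are missing is pruned even though it may contain matching rows.
    public static boolean keepFileUnsafe(ColumnStats s) {
        return Boolean.TRUE.equals(lessThan(s.nullCount(), s.valueCount()));
    }

    // Null-safe IsNotNull check: if a required stat is missing, keep the file.
    public static boolean keepFileNullSafe(ColumnStats s) {
        if (s.nullCount() == null || s.valueCount() == null) {
            return true;
        }
        return s.nullCount() < s.valueCount();
    }

    // Returns the files that survive pruning for the filter IsNotNull(col).
    public static List<String> candidates(List<ColumnStats> stats, boolean nullSafe) {
        List<String> kept = new ArrayList<>();
        for (ColumnStats s : stats) {
            if (nullSafe ? keepFileNullSafe(s) : keepFileUnsafe(s)) {
                kept.add(s.fileName());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ColumnStats> stats = List.of(
            new ColumnStats("file-1.parquet", null, null), // stats missing for the column
            new ColumnStats("file-2.parquet", 0L, 10L),    // contains non-null values
            new ColumnStats("file-3.parquet", 10L, 10L));  // every value is null
        System.out.println("unsafe:    " + candidates(stats, false)); // [file-2.parquet]
        System.out.println("null-safe: " + candidates(stats, true));  // [file-1.parquet, file-2.parquet]
    }
}
```

In this model, `file-1.parquet` is dropped by the unsafe check although it may hold non-null `info` values; checking for null first keeps it as a candidate, trading some pruning for correct results.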



--
This message was sent by Atlassian Jira
(v8.20.10#820010)