[ 
https://issues.apache.org/jira/browse/HIVE-27356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-27356.
-----------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed

> Hive should write name of blob type instead of table name in Puffin
> -------------------------------------------------------------------
>
>                 Key: HIVE-27356
>                 URL: https://issues.apache.org/jira/browse/HIVE-27356
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Simhadri Govindappa
>            Priority: Major
>             Fix For: 4.1.0
>
>
> Currently Hive writes the name of the table plus snapshot id as blob type:
> [https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java#L422]
> Instead, it should write the name of the blob type it writes. Table name and 
> snapshot id are redundant information anyway, as they can be inferred from 
> the location and filename of the puffin file.
> Currently it writes a non-standard blob (Standard blob types are listed 
> [here|https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/puffin/StandardBlobTypes.java]).
>  I think it would be better to write standard blobs for interoperability. But 
> if Hive wants to write non-standard blobs anyway, it should still come up 
> with a descriptive name for them, e.g. 'hive-column-statistics-v1'.
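
The naming change described above can be sketched in plain Java with no Iceberg dependency. This is an illustration under stated assumptions, not the code from the actual fix: the class and method names are hypothetical, the "table name plus snapshot id" format is an assumed concatenation, and 'hive-column-statistics-v1' is only the example name suggested in the description. The one standard name used, 'apache-datasketches-theta-v1', is the value of StandardBlobTypes.APACHE_DATASKETCHES_THETA_V1 in Iceberg.

```java
import java.util.Set;

// Hypothetical helper contrasting the two blob type naming schemes.
final class BlobTypeNaming {

    // Value of StandardBlobTypes.APACHE_DATASKETCHES_THETA_V1 in Iceberg.
    static final String APACHE_DATASKETCHES_THETA_V1 = "apache-datasketches-theta-v1";

    // Example name taken from the issue description; not a standard Puffin blob type.
    static final String HIVE_COLUMN_STATISTICS_V1 = "hive-column-statistics-v1";

    // Only the standard type this sketch knows about; Iceberg's list may contain more.
    static final Set<String> KNOWN_STANDARD_TYPES = Set.of(APACHE_DATASKETCHES_THETA_V1);

    // Assumed shape of the old behaviour: the blob type changes with every table and snapshot.
    static String legacyBlobType(String tableName, long snapshotId) {
        return tableName + "-" + snapshotId;
    }

    // Proposed behaviour: a fixed name that describes the blob contents.
    static String descriptiveBlobType() {
        return HIVE_COLUMN_STATISTICS_V1;
    }

    static boolean isKnownStandardType(String blobType) {
        return KNOWN_STANDARD_TYPES.contains(blobType);
    }

    public static void main(String[] args) {
        // Legacy names differ per snapshot, so a reader cannot dispatch on them.
        System.out.println(legacyBlobType("db.tbl", 1L));
        System.out.println(legacyBlobType("db.tbl", 2L));
        // The descriptive name is stable across tables and snapshots.
        System.out.println(descriptiveBlobType());
        System.out.println(isKnownStandardType(descriptiveBlobType()));
    }
}
```

A reader that selects a decoder by blob type can match on a fixed name, whereas a name built from the table and snapshot id is different in every Puffin file.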



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
