Hello,
I am currently testing Hive 4.2 with Iceberg on the TPC-DS benchmark.
Loading all the TPC-DS tables in Iceberg format works, and I can run
some of the TPC-DS queries successfully. Here is an example of creating
and loading a partitioned table ('partitioned by' can also be replaced
with 'PARTITIONED BY SPEC ...'):

create table catalog_returns
(cr_returned_time_sk bigint, ... , cr_returned_date_sk bigint)
partitioned by (cr_returned_date_sk bigint)
STORED BY ICEBERG
stored as orc tblproperties ("orc.compress"="SNAPPY");

insert overwrite table catalog_returns
select * from tpcds_bin_partitioned_orc_10000.catalog_returns;

My problem is that 'analyze table ... compute statistics for columns'
fails, so Hive cannot exploit the additional statistics that 'analyze
table' would compute. Has anyone tried a similar scenario: 1) create
partitioned Iceberg tables and load data; 2) run 'analyze table ...'; 3)
observe improved performance thanks to the additional statistics computed
by 'analyze table'?
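
For concreteness, steps 2) and 3) for the table above would look roughly
like the sketch below. The column picked in 'describe formatted' and the
query under 'explain' are arbitrary choices for illustration, not
necessarily the exact statements used in the test.

```sql
-- 2) compute column statistics (the step reported as failing)
analyze table catalog_returns compute statistics for columns;

-- check whether column statistics were recorded for one column
describe formatted catalog_returns cr_returned_time_sk;

-- 3) compare the plan of a query before and after step 2
explain
select cr_returned_date_sk, count(*)
from catalog_returns
group by cr_returned_date_sk;
```
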
Thanks,
--- Sungwoo