Hi lisoda,

            Thank you for trying the Hive4-beta and reporting this issue. Based 
on the current information you provided, i can not reproduce this issue. Could 
you please give more clues? e.g.
            1) Which Tez version are you using? Hive4-beta uses Tez 0.10.2 by 
default.
            2) Can we reproduce this issue with small data or just insert 
several rows? Does the iceberg data have delete files?
            3) Does this problem only happen with parquet data? What about orc?
            4) If you turn off the vectorized execution set 
hive.vectorized.execution.enabled=false; will the query succeed?


           BTW, it is better to create a ticket in 
https://issues.apache.org/jira/projects/HIVE/issues, and describe your problem 
as well as a reproducible steps.





Thanks,

Butao Zhang

---- Replied Message ----
| From | lisoda<lis...@yeah.net> |
| Date | 11/22/2023 10:48 |
| To | user@hive.apache.org<user@hive.apache.org> |
| Subject | hive can not read iceberg-parquet table |
Hi team.

I am currently testing HIVE-4.0.0-BETA.
For better read performance, we use the Iceberg-Parquet table.
However, we have found that HIVE is currently unable to handle iceberg-parquet 
tables correctly.


Example:


CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 
'hdfs://xxxxxxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');


set hive.default.fileformat=orc;
set hive.default.fileformat.managed=orc;
create table test_parquet_as_orc as select * from b_qqd_shop_rfm_parquet_snappy 
limit 100;






, TaskAttempt 2 failed, info=[Error: Node: xxxx/xxx.xxxx.xx.xx: Error while 
running task ( failure ) : 
attempt_1696729618575_69586_1_00_000000_2:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
        at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
        at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
        at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
        at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
        at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
        at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
        at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
        ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
        at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
        ... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
        at 
org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:137)
        at 
org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
        at 
org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator.process(VectorLimitOperator.java:108)
        at 
org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
        at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878)
        ... 20 more
Caused by: java.lang.NullPointerException
        at 
org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream.write(NonSyncByteArrayOutputStream.java:110)
        at 
org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite.writeString(LazyBinarySerializeWrite.java:280)
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow$VectorSerializeStringWriter.serialize(VectorSerializeRow.java:532)
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:316)
        at 
org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:297)
        at 
org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:113)
        ... 28 more

Reply via email to