Hi lisoda,
Thank you for trying Hive4-beta and reporting this issue. Based
on the information you have provided so far, I cannot reproduce it. Could
you please share more details? For example:
1) Which Tez version are you using? Hive4-beta uses Tez 0.10.2 by
default.
2) Can the issue be reproduced with a small data set, e.g. a table with
only a few inserted rows? Does the Iceberg table have delete files?
3) Does this problem only happen with Parquet data? What about ORC?
4) If you turn off vectorized execution with
set hive.vectorized.execution.enabled=false; does the query succeed?
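A minimal sketch of such a test is below. It assumes a scratch database where you can create tables and a Hive 4 build that accepts STORED BY ICEBERG. The table names (ice_parquet_repro, repro_orc_vec, repro_orc_novec) are hypothetical, and the NULL string value is only a guess, prompted by the trace ending in LazyBinarySerializeWrite.writeString:

```sql
-- Hypothetical table names; adjust the database and names to your environment.
-- Small Iceberg table backed by Parquet files.
CREATE TABLE ice_parquet_repro (id INT, name STRING)
STORED BY ICEBERG STORED AS PARQUET;

-- A few rows, including a NULL string (a guess at what may trigger the NPE).
INSERT INTO ice_parquet_repro VALUES (1, 'a'), (2, 'b'), (3, NULL);

-- Run 1: vectorized execution on.
SET hive.vectorized.execution.enabled=true;
CREATE TABLE repro_orc_vec STORED AS ORC AS
SELECT * FROM ice_parquet_repro LIMIT 100;

-- Run 2: vectorized execution off, same query.
SET hive.vectorized.execution.enabled=false;
CREATE TABLE repro_orc_novec STORED AS ORC AS
SELECT * FROM ice_parquet_repro LIMIT 100;
```

If only the first run fails, that points at the vectorized path. Repeating the test with STORED AS ORC on the Iceberg table would answer question 3.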
BTW, it would be better to create a ticket at
https://issues.apache.org/jira/projects/HIVE/issues and describe your problem
there, along with reproducible steps.
Thanks,
Butao Zhang
---- Replied Message ----
| From | lisoda<[email protected]> |
| Date | 11/22/2023 10:48 |
| To | [email protected]<[email protected]> |
| Subject | hive can not read iceberg-parquet table |
Hi team.
I am currently testing HIVE-4.0.0-BETA.
For better read performance, we use the Iceberg-Parquet table.
However, we have found that HIVE is currently unable to handle iceberg-parquet
tables correctly.
Example:
CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION
'hdfs://xxxxxxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
TBLPROPERTIES
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
set hive.default.fileformat=orc;
set hive.default.fileformat.managed=orc;
create table test_parquet_as_orc as select * from b_qqd_shop_rfm_parquet_snappy limit 100;
, TaskAttempt 2 failed, info=[Error: Node: xxxx/xxx.xxxx.xx.xx: Error while running task ( failure ) : attempt_1696729618575_69586_1_00_000000_2:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
    ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
    ... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:137)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
    at org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator.process(VectorLimitOperator.java:108)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878)
    ... 20 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream.write(NonSyncByteArrayOutputStream.java:110)
    at org.apache.hadoop.hive.serde2.lazybinary.fast.LazyBinarySerializeWrite.writeString(LazyBinarySerializeWrite.java:280)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow$VectorSerializeStringWriter.serialize(VectorSerializeRow.java:532)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:316)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSerializeRow.serializeWrite(VectorSerializeRow.java:297)
    at org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:113)
    ... 28 more