sassai opened a new issue #2145:
URL: https://github.com/apache/hudi/issues/2145

**Describe the problem you faced**

Running a query in Hive on Hudi data using a LIMIT clause results in an IOException.

```console
java.io.IOException: Input path does not exist: abfs://x...@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=5/.hoodie_partition_metadata
```

The `.hoodie_partition_metadata` file does exist and can be listed with `hdfs dfs -ls` using the path above.

Example query used:

`select * from nyc_taxi.address limit 100;`

Running the same query without the LIMIT clause works fine.

The HIVE_AUX_JAR variable holds `hudi-utilities-bundle_2.11-0.6.0.jar` and `hudi-hadoop-mr-bundle-0.6.0.jar`.

**To Reproduce**

Steps to reproduce the behavior:

1. Create a COPY_ON_WRITE table
2. Insert records into the table (the table has 11 million records)
3. `set hive.fetch.task.conversion=none;`
4. Query the table using the statement above
5. An IOException is thrown

**Expected behavior**

A result set containing 100 records is returned.

**Environment Description**

* Hudi version : 0.6.0
* Spark version : 2.4.0
* Hive version : 3.1
* Hadoop version : 3
* Storage (HDFS/S3/GCS..) : ADLS Gen2
* Running on Docker? (yes/no) : no

**Stacktrace**

```console
Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1, vertexId=vertex_1601881880788_0031_6_00, diagnostics=[Vertex vertex_1601881880788_0031_6_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: address initializer failed, vertex=vertex_1601881880788_0031_6_00 [Map 1], org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: abfs://x...@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=5/.hoodie_partition_metadata
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:300)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:240)
    at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:105)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:328)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:541)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:830)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:249)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:280)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:271)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:271)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:255)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Input path does not exist: abfs://x...@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=5/.hoodie_partition_metadata
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:274)
    ... 19 more
]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
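
Regarding the `Input path does not exist` error for a file that can be listed: one possible reading, which this report does not confirm, is that the path is being rejected by a hidden-file filter rather than being truly absent. Hadoop's `FileInputFormat`, which appears in the stack trace, skips input files whose names start with `_` or `.`. The Python sketch below only imitates that naming convention to show which of two paths would be treated as hidden. The helper `is_hidden` and the Parquet file name are hypothetical and are not taken from Hudi, Hive, or the report.

```python
import posixpath


def is_hidden(path: str) -> bool:
    """Return True if the last path component starts with '_' or '.'.

    This mirrors the naming convention Hadoop uses for hidden files;
    it is an illustration, not Hadoop's or Hudi's actual code.
    """
    name = posixpath.basename(path.rstrip("/"))
    return name.startswith("_") or name.startswith(".")


partition = "/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=5"

paths = [
    # Path component taken from the error message in the report.
    partition + "/.hoodie_partition_metadata",
    # Hypothetical data file name, used only for contrast.
    partition + "/example-data-file.parquet",
]

# Keep only the paths that such a filter would let through.
visible = [p for p in paths if not is_hidden(p)]

for p in paths:
    print(("hidden " if is_hidden(p) else "visible") + " " + posixpath.basename(p))
```

If this reading applies, the question for the maintainers would be why the metadata file ends up among the input paths when LIMIT is used, since the file itself is present on storage.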