qingyuan18 opened a new issue #1854: URL: https://github.com/apache/hudi/issues/1854
**Environment**
* JDK: 1.8.0_242
* Scala: 2.11.12
* Spark: 2.4.0
* Hudi Spark bundle: 0.5.2-incubating

**Steps to reproduce the behavior**

1. Create a managed Hive table.
2. Use the Spark datasource to upsert records into it:

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.{DataFrame, SaveMode}

def upsert(albumDf: DataFrame, tableName: String, key: String, combineKey: String, tablePath: String): Unit = {
  albumDf.write
    .format("hudi")
    .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
    .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, key)
    .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, combineKey)
    .option(HoodieWriteConfig.TABLE_NAME, tableName)
    .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
    .option("hoodie.upsert.shuffle.parallelism", "2")
    .mode(SaveMode.Append)
    .save(tablePath)
}
```

3. Use Spark SQL to read the result:

```scala
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder()
  .appName("hudi-test")
  .master("yarn")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.sql.hive.convertMetastoreParquet", "false") // uses the Hive SerDe; mandatory for MoR tables
  .getOrCreate()

spark.sql("select * from xxxx.xxxx_acidtest2").show()
```

Submit command:

```
spark-submit --master yarn --conf spark.sql.hive.convertMetastoreParquet=false HudiTechSpike-jar-with-dependencies.jar
```

**Errors**

```
java.io.IOException: Not a file: hdfs://nameservice1/data/operations/racoe/epi/hive/raw/xxxx_acidtest2/default
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	... (the five frames above repeat identically for each nested MapPartitionsRDD)
```

It seems Spark does not recognize the Hudi data format/path structure (see the Hive-sync sketch below).

* Running on Docker?: No
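One possible cause to rule out: if the Hive table was created by hand rather than registered through Hudi's Hive sync, the metastore entry points Spark at the raw base path without Hudi's input format, and plain `FileInputFormat` then trips over Hudi's directory layout. Below is a minimal sketch of enabling the built-in Hive sync on the write path so Hudi registers the table itself; the HiveServer2 JDBC URL and database name are assumptions, not values from this report:

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode

// Same upsert as above, plus Hive-sync options so the metastore entry is
// created/updated by Hudi (with Hudi's input format) instead of manually.
albumDf.write
  .format("hudi")
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, key)
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, combineKey)
  .option(HoodieWriteConfig.TABLE_NAME, tableName)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
  .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY, "xxxx")                     // assumed database name
  .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, tableName)
  .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "jdbc:hive2://hiveserver:10000") // assumed HiveServer2 URL
  .option("hoodie.upsert.shuffle.parallelism", "2")
  .mode(SaveMode.Append)
  .save(tablePath)
```

For a MoR table, Hive sync registers a separate realtime table (suffixed `_rt`); querying that synced table with `spark.sql` and `spark.sql.hive.convertMetastoreParquet=false` is the documented MoR read path.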
**Additional context**: Running the same query from spark-shell produces the same error (a metastore-free cross-check is sketched below):

```
spark-shell --master yarn --conf spark.sql.hive.convertMetastoreParquet=false --jars hudi-spark-bundle_2.11-0.5.3.jar
```
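As a cross-check (not part of the original report), the table can also be read straight from its base path through the Spark datasource, bypassing the Hive metastore entirely. A sketch assuming the 0.5.x datasource read path, which returns the read-optimized view for a MoR table and expects a glob over the partition directories:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hudi-read-test")
  .master("yarn")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

// Base path taken from the error message above; the single "/*" glob assumes
// one partition level (use more "/*" segments for deeper partitioning).
val tablePath = "hdfs://nameservice1/data/operations/racoe/epi/hive/raw/xxxx_acidtest2"
val df = spark.read.format("hudi").load(tablePath + "/*")
df.show()
```

If this read succeeds while the `spark.sql` query fails, that would point at the Hive table registration rather than the data itself.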