satish created HUDI-1839:
----------------------------

             Summary: FSUtils.getAllPartitionPaths broken by NotSerializableException: org.apache.hadoop.fs.Path
                 Key: HUDI-1839
                 URL: https://issues.apache.org/jira/browse/HUDI-1839
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: satish


FSUtils.getAllPartitionPaths is expected to work whether the metadata table is enabled or not. Multiple callers (clustering, cleaner) rely on that assumption; a sketch of the caller-side contract follows.
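For context, the caller-side contract looks roughly like this (a hedged sketch only; the exact FSUtils.getAllPartitionPaths overload and flag names vary across Hudi versions, so treat the parameters below as illustrative assumptions, not the committed API):

import java.util.List;
import org.apache.hudi.common.engine.HoodieEngineContext;
import org.apache.hudi.common.fs.FSUtils;

// Illustrative sketch of what clustering/cleaner planners do: they expect a
// plain List<String> of partition paths back, and they expect the call to
// succeed whether the listing is served by the metadata table or by direct
// file-system listing. Parameter names here are assumptions.
static List<String> listPartitions(HoodieEngineContext engineContext,
                                   String basePathStr,
                                   boolean useFileListingFromMetadata,
                                   boolean assumeDatePartitioning) {
  return FSUtils.getAllPartitionPaths(
      engineContext, basePathStr, useFileListingFromMetadata, assumeDatePartitioning);
}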

See the stack trace below:

21/04/20 17:28:44 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.hudi.exception.HoodieException: Error fetching partition paths from metadata table
        at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:321)
        at org.apache.hudi.table.action.cluster.strategy.PartitionAwareClusteringPlanStrategy.generateClusteringPlan(PartitionAwareClusteringPlanStrategy.java:67)
        at org.apache.hudi.table.action.cluster.SparkClusteringPlanActionExecutor.createClusteringPlan(SparkClusteringPlanActionExecutor.java:71)
        at org.apache.hudi.table.action.cluster.BaseClusteringPlanActionExecutor.execute(BaseClusteringPlanActionExecutor.java:56)
        at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleClustering(HoodieSparkCopyOnWriteTable.java:160)
        at org.apache.hudi.client.AbstractHoodieWriteClient.scheduleClusteringAtInstant(AbstractHoodieWriteClient.java:873)
        at org.apache.hudi.client.AbstractHoodieWriteClient.scheduleClustering(AbstractHoodieWriteClient.java:861)
        at com.uber.data.efficiency.hudi.HudiRewriter.rewriteDataUsingHudi(HudiRewriter.java:111)
        at com.uber.data.efficiency.hudi.HudiRewriter.main(HudiRewriter.java:50)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:690)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Failed to serialize task 53, not attempting to retry it. Exception during serialization: java.io.NotSerializableException: org.apache.hadoop.fs.Path
Serialization stack:
        - object not serializable (class: org.apache.hadoop.fs.Path, value: hdfs://ns-router-prod-phx/app/supermodel/dwhdev.db/kafka_hp_demand_job_expired_nodedup)
        - element of array (index: 0)
        - array (class [Ljava.lang.Object;, size 1)
        - field (class: scala.collection.mutable.WrappedArray$ofRef, name: array, type: class [Ljava.lang.Object;)
        - object (class scala.collection.mutable.WrappedArray$ofRef, WrappedArray(hdfs://ns-router-prod-phx/app/supermodel/dwhdev.db/kafka_hp_demand_job_expired_nodedup))
        - writeObject data (class: org.apache.spark.rdd.ParallelCollectionPartition)
        - object (class org.apache.spark.rdd.ParallelCollectionPartition, org.apache.spark.rdd.ParallelCollectionPartition@735)
        - field (class: org.apache.spark.scheduler.ResultTask, name: partition, type: interface org.apache.spark.Partition)
        - object (class org.apache.spark.scheduler.ResultTask, ResultTask(1, 0))
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1904)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1892)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1891)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1891)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:935)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:935)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:935)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2125)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2074)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2063)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:746)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2070)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2091)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2110)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2135)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:968)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:967)
        at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:361)
        at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
        at org.apache.hudi.client.common.HoodieSparkEngineContext.map(HoodieSparkEngineContext.java:79)
        at org.apache.hudi.metadata.FileSystemBackedTableMetadata.getAllPartitionPaths(FileSystemBackedTableMetadata.java:79)
        at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:319)
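
The Caused-by pinpoints the problem: FileSystemBackedTableMetadata.getAllPartitionPaths hands the engine context's map() a collection containing org.apache.hadoop.fs.Path objects, and Path does not implement java.io.Serializable, so Spark cannot serialize the ParallelCollectionPartition backing the task. The underlying limitation reproduces outside Hudi; the standalone sketch below (illustrative names and paths, not Hudi code) shows the failure and the usual workaround of shipping Strings and rebuilding the Path on the executor:

import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class PathSerializationRepro {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("path-serialization-repro")
        .setMaster("local[1]");
    try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
      // Fails at task serialization: org.apache.hadoop.fs.Path is not
      // java.io.Serializable, so the driver cannot ship it inside a
      // ParallelCollectionPartition -- the same failure as the trace above.
      List<Path> badInput = Arrays.asList(new Path("hdfs://ns/db/table"));
      try {
        jsc.parallelize(badInput, 1).map(Path::toString).collect();
      } catch (Exception e) {
        System.out.println("expected failure: " + e);
      }

      // Workaround: distribute plain Strings and rebuild the Path on the
      // executor side, where no serialization of Path is needed.
      List<String> goodInput = Arrays.asList("hdfs://ns/db/table");
      List<String> names = jsc.parallelize(goodInput, 1)
          .map(p -> new Path(p).getName())
          .collect();
      System.out.println(names); // prints [table]
    }
  }
}

A fix along these lines inside FileSystemBackedTableMetadata (keeping only Strings in the distributed collection) should unblock the clustering and cleaning callers.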
