Hello Spark Users,

I am working on historical data processing for a telecom provider. I ran a single ingestion job and wrote its output to Parquet in append mode. Reading the Parquet table as a view works fine, because that is just lazy evaluation.
But when I apply an action on top of it, joining my other two sources (a MongoDB collection and JSON from an S3 bucket) with the Parquet source, it throws the error below, which means a single gz Parquet part file could not be opened. Yet when I check with the hadoop fs command, the file is actually there. After validating this, I deleted that specific file, thinking the job would then succeed and only the rows from that file would be missing from the transformation, but afterwards the Spark transformation still failed. I had to re-run the data ingestion Spark job, which was a waste of time. So what is the exact solution to this error?

2016-10-21 06:54:03,2918 ERROR Client fs/client/fileclient/cc/client.cc:1802 Thread: 20779 Open failed for file /user/ubuntu/UserMaster/part-r-00113-41456b8f-4c6b-46e6-b70f-ca, LookupFid error No such file or directory(2)
2016-10-21 06:54:03,2918 ERROR JniCommon fs/client/fileclient/cc/jni_MapRClient.cc:2488 Thread: 20779 getBlockInfo failed, Could not open file /user/ubuntu/UserMaster/part-r-00113-41456b8f-4c6b-46e6-b70f-ca

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenExchange hashpartitioning(trim(userId#24),1024), None
+- Scan ParquetRelation[userId#24,screenSize#27,platformVersion#21,productName#35,isPixelHit#19L,browser#23,operator#33,circle#26,platform#30,browserVersion#29,brandName#36] InputPaths: maprfs:/user/ubuntu/UserMaster
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
    at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.Sort.doExecute(Sort.scala:64)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.joins.SortMergeOuterJoin.doExecute(SortMergeOuterJoin.scala:107)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.Project.doExecute(basicOperators.scala:46)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)

--
Yours Aye,
Chetan Khatri.
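P.S. To make the failing step concrete, here is a simplified sketch of what the job does (spark-shell, Spark 1.6 style). Only the Parquet path is real, taken from the log above; the S3 bucket, the MongoDB connector format/options, and the join keys are placeholders, not my actual code:

```scala
import org.apache.spark.sql.functions.trim

// Parquet output of the ingestion job (real path, from the error log)
val userMaster = sqlContext.read.parquet("maprfs:/user/ubuntu/UserMaster")

// Placeholder reads for the other two sources
val s3Json  = sqlContext.read.json("s3n://my-bucket/events/")       // JSON from S3 (placeholder bucket)
val mongoDF = sqlContext.read
  .format("com.mongodb.spark.sql")                                  // MongoDB Spark connector (placeholder)
  .load()

// The action on the three-way join is what fails with the TreeNodeException;
// note the trim(userId) on the join key, matching hashpartitioning(trim(userId#24),...)
// in the physical plan above.
val joined = userMaster
  .join(s3Json, trim(userMaster("userId")) === s3Json("userId"), "left_outer")
  .join(mongoDF, Seq("userId"), "left_outer")
joined.count()
```

The count() is only a stand-in for the real action; any action on the joined DataFrame hits the same missing-file error.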