Hello Spark Users,

I am working on historical data processing for a telecom provider. I ran a single ingestion job and wrote its output to Parquet in append mode. Reading the Parquet table as a view works fine, because that is just lazy evaluation.
But when I apply an action on top of it, joining my other two sources (a MongoDB collection and JSON from an S3 bucket) with the Parquet source, it throws the error below, which means a single gz Parquet part file could not be opened. Yet when I check with the hadoop fs command, the file is actually there. After validating this, I deleted that specific file, thinking the job would then succeed and only the rows from that file would be missing from the transformation, but afterwards the Spark transformation still failed. I had to re-run the data ingestion Spark job, which was a waste of time. So what is the exact solution to this error?

2016-10-21 06:54:03,2918 ERROR Client fs/client/fileclient/cc/client.cc:1802 Thread: 20779 Open failed for file /user/ubuntu/UserMaster/part-r-00113-41456b8f-4c6b-46e6-b70f-ca, LookupFid error No such file or directory(2)
2016-10-21 06:54:03,2918 ERROR JniCommon fs/client/fileclient/cc/jni_MapRClient.cc:2488 Thread: 20779 getBlockInfo failed, Could not open file /user/ubuntu/UserMaster/part-r-00113-41456b8f-4c6b-46e6-b70f-ca

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenExchange hashpartitioning(trim(userId#24),1024), None
+- Scan ParquetRelation[userId#24,screenSize#27,platformVersion#21,productName#35,isPixelHit#19L,browser#23,operator#33,circle#26,platform#30,browserVersion#29,brandName#36] InputPaths: maprfs:/user/ubuntu/UserMaster
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
    at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.Sort.doExecute(Sort.scala:64)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.joins.SortMergeOuterJoin.doExecute(SortMergeOuterJoin.scala:107)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.Project.doExecute(basicOperators.scala:46)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)

--
Yours Aye,
Chetan Khatri.
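P.S. To make the failing step concrete, here is a simplified sketch of what the job does (spark-shell, Spark 1.6 style). Only the Parquet path is real, taken from the log above; the S3 bucket, the MongoDB connector format/options, and the join keys are placeholders, not my actual code:

```scala
import org.apache.spark.sql.functions.trim

// Parquet output of the ingestion job (real path, from the error log)
val userMaster = sqlContext.read.parquet("maprfs:/user/ubuntu/UserMaster")

// Placeholder reads for the other two sources
val s3Json  = sqlContext.read.json("s3n://my-bucket/events/")       // JSON from S3 (placeholder bucket)
val mongoDF = sqlContext.read
  .format("com.mongodb.spark.sql")                                  // MongoDB Spark connector (placeholder)
  .load()

// The action on the three-way join is what fails with the TreeNodeException;
// note the trim(userId) on the join key, matching hashpartitioning(trim(userId#24),...)
// in the physical plan above.
val joined = userMaster
  .join(s3Json, trim(userMaster("userId")) === s3Json("userId"), "left_outer")
  .join(mongoDF, Seq("userId"), "left_outer")
joined.count()
```

The count() is only a stand-in for the real action; any action on the joined DataFrame hits the same missing-file error.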