Gera Shegalov created SPARK-2577: ------------------------------------ Summary: File upload to viewfs is broken due to mount point resolution Key: SPARK-2577 URL: https://issues.apache.org/jira/browse/SPARK-2577 Project: Spark Issue Type: Bug Components: YARN Reporter: Gera Shegalov Priority: Blocker
YARN client resolves paths of uploaded artifacts. When a viewfs path is resolved, the filesystem changes to the target file system. However, the original fs is passed to {{ClientDistributedCacheManager#addResource}}. {code} 14/07/18 01:30:31 INFO yarn.Client: Uploading file:/Users/gshegalov/workspace/spark-tw/assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop3.0.0-SNAPSHOT.jar to viewfs:/user/gshegalov/.sparkStaging/application_1405479201490_0049/spark-assembly-1.1.0-SNAPSHOT-hadoop3.0.0-SNAPSHOT.jar Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://ns1:8020/user/gshegalov/.sparkStaging/application_1405479201490_0049/spark-assembly-1.1.0-SNAPSHOT-hadoop3.0.0-SNAPSHOT.jar, expected: viewfs:/ at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643) at org.apache.hadoop.fs.viewfs.ViewFileSystem.getUriPath(ViewFileSystem.java:116) at org.apache.hadoop.fs.viewfs.ViewFileSystem.getFileStatus(ViewFileSystem.java:345) at org.apache.spark.deploy.yarn.ClientDistributedCacheManager.addResource(ClientDistributedCacheManager.scala:72) at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$5.apply(ClientBase.scala:236) at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$5.apply(ClientBase.scala:229) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.deploy.yarn.ClientBase$class.prepareLocalResources(ClientBase.scala:229) at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:37) at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:74) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:81) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:136) at org.apache.spark.SparkContext.<init>(SparkContext.scala:320) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} There are two options: # do not resolve path because symlinks are currently disabled in Hadoop # pass the correct filesystem object -- This message was sent by Atlassian JIRA (v6.2#6252)