Hi Shay,

Yeah, that seems to be a bug; it doesn't seem to be related to the default FS or to compareFs either - I can reproduce this with HDFS when copying files from the local fs too. In yarn-client mode things seem to work.
Could you file a bug to track this? If you don't have a JIRA account, I can do that for you.

On Mon, May 18, 2015 at 9:38 AM, Shay Rojansky <r...@roji.org> wrote:
> I'm having issues with submitting a Spark YARN job in cluster mode when
> the cluster filesystem is file:///. It seems that additional resources
> (--py-files) are simply being skipped and not being added to the
> PYTHONPATH. The same issue may also exist for --jars, --files, etc.
>
> We use a simple NFS mount on all our nodes instead of HDFS. The problem
> is that when I submit a job that has files (via --py-files), these don't
> get copied across to the application's staging directory, nor do they
> get added to the PYTHONPATH. On startup, I can clearly see the message
> "Source and destination file systems are the same. Not copying", which
> is a result of the check here:
> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L221
>
> The compareFs function simply checks whether the scheme, host and port
> are the same, and if so (my case), simply skips the copy. While that in
> itself isn't a problem, the PYTHONPATH isn't updated either.

--
Marcelo
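For anyone following along, the comparison Shay describes can be sketched roughly as below. This is a simplified Python stand-in for illustration only - the real compareFs lives in Spark's Scala Client and goes through Hadoop's FileSystem API - but it shows why two file:/// URIs always compare equal (same scheme, and host/port are both empty), which is what triggers the "Not copying" path:

```python
from urllib.parse import urlparse

def same_fs(src_uri: str, dst_uri: str) -> bool:
    """Illustrative sketch: treat two filesystems as "the same" when
    their URIs share scheme, host and port - the comparison described
    in the thread, not Spark's actual compareFs implementation."""
    src, dst = urlparse(src_uri), urlparse(dst_uri)
    if src.scheme != dst.scheme:
        return False
    # file:/// URIs have no host or port, so any two local paths
    # (e.g. an NFS mount used as the cluster FS) compare equal,
    # and the copy to the staging directory is skipped.
    return src.hostname == dst.hostname and src.port == dst.port

print(same_fs("file:///nfs/jobs/app.py", "file:///nfs/staging/"))   # True
print(same_fs("hdfs://namenode:8020/a", "file:///nfs/staging/"))    # False
```

The bug isn't the skip itself - skipping a redundant copy is fine - it's that the "same filesystem" branch also skips adding the resources to the PYTHONPATH.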