Thanks for the quick response and confirmation, Marcelo. I've just opened
https://issues.apache.org/jira/browse/SPARK-7725.

On Mon, May 18, 2015 at 9:02 PM, Marcelo Vanzin <[email protected]> wrote:

> Hi Shay,
>
> Yeah, that seems to be a bug; it doesn't seem to be related to the default
> FS or to compareFs either - I can reproduce this with HDFS when copying files
> from the local fs too. In yarn-client mode things seem to work.
>
> Could you file a bug to track this? If you don't have a JIRA account I can
> do that for you.
>
>
> On Mon, May 18, 2015 at 9:38 AM, Shay Rojansky <[email protected]> wrote:
>
>> I'm having issues submitting a Spark YARN job in cluster mode when
>> the cluster filesystem is file:///. It seems that additional resources
>> (--py-files) are simply being skipped and not added to the
>> PYTHONPATH. The same issue may also exist for --jars, --files, etc.
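>>
>> For illustration, the submission is of roughly this shape (paths and file
>> names here are made up, not my exact command):
>>
>>   spark-submit --master yarn --deploy-mode cluster \
>>     --py-files /nfs/shared/libs/deps.py \
>>     /nfs/shared/jobs/app.py
>>
>> with fs.defaultFS set to file:/// in core-site.xml.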
>>
>> We use a simple NFS mount on all our nodes instead of HDFS. The problem
>> is that when I submit a job that has files (via --py-files), these don't
>> get copied across to the application's staging directory, nor do they get
>> added to the PYTHONPATH. On startup, I can clearly see the message "Source
>> and destination file systems are the same. Not copying", which is a result
>> of the check here:
>> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L221
>>
>> The compareFs function simply checks whether the scheme, host and port are
>> the same, and if so (as in my case) skips the copy. While that in itself
>> isn't a problem, the PYTHONPATH isn't updated either.
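>>
>> To make the check concrete, it's roughly of this shape (a minimal Scala
>> sketch, not the actual Spark source; sameFs is just my name for it):
>>
>>   import java.net.URI
>>   import org.apache.hadoop.fs.FileSystem
>>
>>   // Compare only scheme, host and port of the two filesystems. With
>>   // file:/// on both sides all three match, so the copy is skipped -
>>   // but nothing adds the skipped file to the PYTHONPATH either.
>>   def sameFs(srcFs: FileSystem, dstFs: FileSystem): Boolean = {
>>     val src: URI = srcFs.getUri
>>     val dst: URI = dstFs.getUri
>>     src.getScheme == dst.getScheme &&
>>       src.getHost == dst.getHost &&
>>       src.getPort == dst.getPort
>>   }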
>>
>
>
>
> --
> Marcelo
>
