spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread Koert Kuipers
i noticed that when i submit a job to yarn it mistakenly tries to upload files to the local filesystem instead of hdfs. what could cause this? in spark-env.sh i have HADOOP_CONF_DIR set correctly (and spark-submit does find yarn), and my core-site.xml has a fs.defaultFS that is hdfs, not the local filesystem
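What "set correctly" usually amounts to here is a single export in spark-env.sh; a minimal sketch, assuming the /etc/hadoop/conf layout that comes up later in the thread:

    # spark-env.sh (path is the CDH default; adjust to your install)
    export HADOOP_CONF_DIR=/etc/hadoop/conf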

Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread Marcelo Vanzin
Hi Koert, Could you provide more details? Job arguments, log messages, errors, etc.

Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread Koert Kuipers
yeah sure see below. i strongly suspect it's something i misconfigured causing yarn to try to use the local filesystem mistakenly.
[koert@cdh5-yarn ~]$ /usr/local/lib/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --execu
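For reference, a complete SparkPi submission on yarn-cluster generally takes the shape sketched below; the memory/core settings and the examples-jar path are placeholders, not the values cut off above:

    /usr/local/lib/spark/bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn-cluster \
      --num-executors 3 \
      --executor-memory 1g \
      --executor-cores 1 \
      /usr/local/lib/spark/lib/spark-examples-*.jar 10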

Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread bc Wong
Koert, is there any chance that your fs.defaultFS isn't set up right?
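A direct way to answer that from the client machine, assuming the hdfs CLI reads the same /etc/hadoop/conf:

    hdfs getconf -confKey fs.defaultFS
    # an hdfs:// URI here means the Hadoop-side config is fine; file:/// would
    # point at a Hadoop misconfiguration rather than a Spark one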

Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread Koert Kuipers
in /etc/hadoop/conf/core-site.xml, fs.defaultFS is set to hdfs://cdh5-yarn.tresata.com:8020. also hdfs seems to be the default:
[koert@cdh5-yarn ~]$ hadoop fs -ls /
Found 5 items
drwxr-xr-x - hdfs supergroup 0 2014-06-19 12:31 /data
drwxrwxrwt - hdfs supergroup 0 2014-06-20 12:

Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread Koert Kuipers
i put some logging statements in yarn.Client and that confirms it's using the local filesystem:
14/06/20 15:20:33 INFO Client: fs.defaultFS is file:///
so somehow fs.defaultFS is not being picked up from /etc/hadoop/conf/core-site.xml, but spark does correctly pick up yarn.resourcemanager.hostname from
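One way to see which configuration directories end up on the driver's classpath is the launcher's debug switch; a sketch, assuming this Spark build's spark-class honors SPARK_PRINT_LAUNCH_COMMAND:

    SPARK_PRINT_LAUNCH_COMMAND=1 /usr/local/lib/spark/bin/spark-submit 2>&1 | head -n 2
    # prints the full "Spark Command: java -cp ..." line; look for a directory such as
    # spark/conf that precedes /etc/hadoop/conf and carries its own core-site.xml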

Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread Koert Kuipers
ok solved it. as it happened, in spark/conf i also had a file called core-site.xml (with some tachyon-related stuff in it), so that's why it ignored /etc/hadoop/conf/core-site.xml
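A quick way to catch this kind of shadowing (the install path is the one from the spark-submit command earlier in the thread): since spark's conf directory typically precedes HADOOP_CONF_DIR on the classpath, any Hadoop *-site.xml sitting there wins.

    find /usr/local/lib/spark/conf -maxdepth 1 -name '*-site.xml'
    # a core-site.xml or hdfs-site.xml listed here silently overrides the files in
    # /etc/hadoop/conf; keep only the properties you really mean to override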