Yes, you need to store the file at a location where it is equally retrievable ("same path") for the master and all nodes in the cluster. A simple solution (apart from HDFS) that does not scale too well, but might be OK with only 3 nodes as in your configuration, is network-accessible storage (a NAS or a shared folder, for example).
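For example, a minimal sketch of the shared-folder approach (/mnt/shared is a hypothetical mount point; it must be mounted at the same path on the master and both workers):

    // Copy the CSV onto the shared mount first, e.g.:
    //   cp /home/username/sampleflightdata /mnt/shared/sampleflightdata
    // The file:// scheme makes Spark read from each node's local
    // filesystem, which here resolves to the same shared mount everywhere.
    val flightDF = spark.read
      .option("header", true)
      .csv("file:///mnt/shared/sampleflightdata")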
Hope this helps,
Alexander

On Fri, Sep 29, 2017 at 12:05 PM, Sathishkumar Manimoorthy <mrsathishkuma...@gmail.com> wrote:

> Place it in HDFS and give the reference path in your code.
>
> Thanks,
> Sathish
>
> On Fri, Sep 29, 2017 at 3:31 PM, Gaurav1809 <gauravhpan...@gmail.com> wrote:
>
>> Hi All,
>>
>> I have a multi-node Spark cluster (1 master, 2 workers). The job reads
>> CSV file data, and it works fine when run in local mode (local[*]).
>> However, when the same job is run in cluster mode (spark://HOST:PORT),
>> it is not able to read the file. I want to know how to reference the
>> files, or where to store them. Currently the CSV data file is on the
>> master (from where the job is submitted).
>>
>> The following code works fine in local mode but not in cluster mode:
>>
>> val spark = SparkSession
>>   .builder()
>>   .appName("SampleFlightsApp")
>>   .master("spark://masterIP:7077") // change to .master("local[*]") for local mode
>>   .getOrCreate()
>>
>> val flightDF = spark.read.option("header", true).csv("/home/username/sampleflightdata")
>> flightDF.printSchema()
>>
>> Error: FileNotFoundException: File file:/home/gaurav/sampleflightdata does not exist
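For the HDFS route Sathish suggests above, a minimal sketch (hdfs://namenode:8020 is a placeholder for your actual NameNode address, and the target directory /data/sampleflightdata is hypothetical):

    // First upload the file to HDFS from a shell on any node:
    //   hdfs dfs -put /home/username/sampleflightdata /data/sampleflightdata
    // Then reference it by its HDFS URI, which every worker can resolve:
    val flightDF = spark.read
      .option("header", true)
      .csv("hdfs://namenode:8020/data/sampleflightdata")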