Place the file in HDFS and reference the HDFS path in your code, so every worker node can read it.

Thanks,
Sathish
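A minimal sketch of that fix, assuming the CSV has already been copied into HDFS and that the namenode is reachable at `namenode:9000` — the host, port, and `/data/sampleflightdata` path are placeholders, not values from the original thread:

```scala
import org.apache.spark.sql.SparkSession

object SampleFlightsApp {
  def main(args: Array[String]): Unit = {
    // Build the session against the standalone master, as in the original code.
    val spark = SparkSession
      .builder()
      .appName("SampleFlightsApp")
      .master("spark://masterIP:7077")
      .getOrCreate()

    // Read from HDFS rather than the master's local filesystem, so that every
    // worker resolves the same path. "hdfs://namenode:9000" stands in for
    // whatever fs.defaultFS is in your Hadoop configuration.
    val flightDF = spark.read
      .option("header", true)
      .csv("hdfs://namenode:9000/data/sampleflightdata")

    flightDF.printSchema()
    spark.stop()
  }
}
```

The file would first be uploaded with something like `hdfs dfs -mkdir -p /data && hdfs dfs -put /home/username/sampleflightdata /data/`. If HDFS is not available, an alternative is to place an identical copy of the file at the same path on every node (or on a shared NFS mount) and keep a `file://` path.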
On Fri, Sep 29, 2017 at 3:31 PM, Gaurav1809 <gauravhpan...@gmail.com> wrote:
> Hi All,
>
> I have a multi-node Spark cluster (1 master, 2 workers). The job reads
> CSV file data and works fine when run in local mode (local[*]). However,
> when the same job is run in cluster mode (spark://HOST:PORT), it is not
> able to read the file. How should the files be referenced, or where
> should they be stored? Currently the CSV data file is on the master
> (from where the job is submitted).
>
> The following code works fine in local mode but not in cluster mode.
>
> val spark = SparkSession
>   .builder()
>   .appName("SampleFlightsApp")
>   .master("spark://masterIP:7077") // change to .master("local[*]") for local mode
>   .getOrCreate()
>
> val flightDF =
>   spark.read.option("header", true).csv("/home/username/sampleflightdata")
> flightDF.printSchema()
>
> Error: FileNotFoundException: File file:/home/gaurav/sampleflightdata does
> not exist
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org