Or you can try mounting that drive on all nodes.

On Fri, Sep 29, 2017 at 6:14 AM Jörn Franke <jornfra...@gmail.com> wrote:
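If a shared mount isn't an option, the copy-to-every-node route means the file must sit at the same absolute path on each worker, since every executor resolves `file://` paths locally. A minimal sketch, assuming hypothetical worker hostnames `worker1` and `worker2` (replace with your own) and passwordless SSH:

```shell
#!/bin/sh
# Hypothetical hostnames -- substitute your actual workers.
for host in worker1 worker2; do
  # Ensure the target directory exists, then copy the CSV data
  # to the SAME path the Spark job will read from.
  ssh "$host" mkdir -p /home/username
  scp -r /home/username/sampleflightdata "$host":/home/username/
done
```

This keeps the original `file:` path in the job working, at the cost of re-copying whenever the data changes, which is why a distributed filesystem is usually preferred.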
> You should use a distributed filesystem such as HDFS. If you want to use
> the local filesystem then you have to copy each file to each node.
>
> > On 29. Sep 2017, at 12:05, Gaurav1809 <gauravhpan...@gmail.com> wrote:
> >
> > Hi All,
> >
> > I have a multi-node (1 master, 2 workers) Spark cluster. The job reads
> > CSV file data, and it works fine when run in local mode (local[*]).
> > However, when the same job is run in cluster mode (spark://HOST:PORT),
> > it is not able to read the file.
> > I want to know how to reference the files, or where to store them.
> > Currently the CSV data file is on the master (from where the job is
> > submitted).
> >
> > The following code works fine in local mode but not in cluster mode:
> >
> > val spark = SparkSession
> >   .builder()
> >   .appName("SampleFlightsApp")
> >   .master("spark://masterIP:7077") // change to .master("local[*]")
> >                                    // for local mode
> >   .getOrCreate()
> >
> > val flightDF =
> >   spark.read.option("header", true).csv("/home/username/sampleflightdata")
> > flightDF.printSchema()
> >
> > Error: FileNotFoundException: File file:/home/username/sampleflightdata
> > does not exist
> >
> > --
> > Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
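For reference, the HDFS route suggested above usually amounts to uploading the data once (e.g. `hdfs dfs -put /home/username/sampleflightdata /data/`) and reading it via an `hdfs://` URI. A minimal sketch of the poster's job adapted this way, where `namenode:8020` and the `/data` path are placeholders for your actual NameNode address and upload location:

```scala
import org.apache.spark.sql.SparkSession

object SampleFlightsApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("SampleFlightsApp")
      .master("spark://masterIP:7077")
      .getOrCreate()

    // An hdfs:// URI is visible to every executor in the cluster,
    // unlike a file:// path that exists only on the master node.
    val flightDF = spark.read
      .option("header", true)
      .csv("hdfs://namenode:8020/data/sampleflightdata")

    flightDF.printSchema()
    spark.stop()
  }
}
```

Any shared store with a Hadoop-compatible connector (S3, NFS mounted identically on all nodes, etc.) works the same way: the point is that the URI must resolve to the same data from every worker.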