We use S3. There are caveats and issues with that, but it can be made to
work.

If you're interested, let me know and I'll show you our workarounds.  I
wouldn't do it naively, though; there are lots of potential problems.  If
you already have HDFS, use that; otherwise, all things told, it's probably
less effort to use S3.
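
As a rough sketch of the S3 route (the bucket name and credential values
below are made up, and this assumes the hadoop-aws / s3a connector is on the
classpath), reading the CSV straight from S3 looks like:

    // Hypothetical bucket and credentials -- adjust for your environment
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

    val flightDF = spark.read
      .option("header", true)
      .csv("s3a://your-bucket/sampleflightdata")

Every executor fetches the data over the network, so nothing has to be
copied to the workers.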

Gary

On 29 September 2017 at 05:03, Arun Rai <arunkumar...@gmail.com> wrote:

> Or you can try mounting that drive on all nodes.
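>
> For example, if every node mounts the same share at /mnt/shared (a
> hypothetical path), a local-file URI then resolves on every worker:
>
>    val flightDF = spark.read
>      .option("header", true)
>      .csv("file:///mnt/shared/sampleflightdata")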
>
> On Fri, Sep 29, 2017 at 6:14 AM Jörn Franke <jornfra...@gmail.com> wrote:
>
>> You should use a distributed filesystem such as HDFS. If you want to use
>> the local filesystem then you have to copy each file to each node.
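>>
>> A minimal sketch of the HDFS route (the namenode host and port below are
>> placeholders for your cluster's values):
>>
>>    val flightDF = spark.read
>>      .option("header", true)
>>      .csv("hdfs://namenode:8020/user/username/sampleflightdata")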
>>
>> > On 29. Sep 2017, at 12:05, Gaurav1809 <gauravhpan...@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > I have a multi-node Spark cluster (1 master, 2 workers). The job reads
>> > CSV file data, and it works fine when run in local mode (local[*]).
>> > However, when the same job is run in cluster mode (spark://HOST:PORT),
>> > it is not able to read the file.
>> > I want to know how to reference the files, or where to store them.
>> > Currently the CSV data file is on the master (from where the job is
>> > submitted).
>> >
>> > The following code works fine in local mode but not in cluster mode.
>> >
>> >    val spark = SparkSession
>> >      .builder()
>> >      .appName("SampleFlightsApp")
>> >      .master("spark://masterIP:7077") // change to .master("local[*]") for local mode
>> >      .getOrCreate()
>> >
>> >    val flightDF = spark.read
>> >      .option("header", true)
>> >      .csv("/home/username/sampleflightdata")
>> >    flightDF.printSchema()
>> >
>> > Error: FileNotFoundException: File file:/home/username/sampleflightdata
>> > does not exist
>> >
