Re: [Spark-Submit] Where to store data files while running job in cluster mode?

2017-09-29 Thread vaquar khan
> ...nized. Would not do it for TB of data ;)

RE: [Spark-Submit] Where to store data files while running job in cluster mode?

2017-09-29 Thread JG Perrin
> You should use a distributed filesystem such as HDFS. If you want to use the local filesystem then you have to copy each file to each node.

Re: [Spark-Submit] Where to store data files while running job in cluster mode?

2017-09-29 Thread Imran Rajjad
Try Tachyon; it's less fuss. On Fri, 29 Sep 2017 at 8:32 PM lucas.g...@gmail.com wrote: > We use S3, there are caveats and issues with that but it can be made to > work. > > If interested let me know and I'll show you our workarounds. I wouldn't > do it naively though,

Re: [Spark-Submit] Where to store data files while running job in cluster mode?

2017-09-29 Thread lucas.g...@gmail.com
We use S3; there are caveats and issues with that, but it can be made to work. If interested, let me know and I'll show you our workarounds. I wouldn't do it naively though; there are lots of potential problems. If you already have HDFS, use that; otherwise, all things told, it's probably less effort

Re: [Spark-Submit] Where to store data files while running job in cluster mode?

2017-09-29 Thread Alexander Czech
Yes, you need to store the file at a location where it is equally retrievable (the same path) for the master and all nodes in the cluster. A simple solution (apart from HDFS) that does not scale too well, but might be OK with only 3 nodes like in your configuration, is a network-accessible storage
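The "equally retrievable, same path" requirement above can be sketched with a small helper. This is a hypothetical illustration, not part of Spark's API: it flags paths whose scheme every executor can resolve on its own (HDFS, S3, Alluxio), versus a plain local path that would only exist on the driver unless you copy it to, or mount it at, the same location on every node.

```python
from urllib.parse import urlparse

# Schemes that executors can resolve independently of the driver's local
# disk. A bare local path (empty scheme or "file") only works if the same
# path exists on every node, e.g. via an NFS mount at the same mount point.
DISTRIBUTED_SCHEMES = {"hdfs", "s3a", "s3n", "wasb", "alluxio"}

def is_cluster_safe(path: str) -> bool:
    """Return True if the path is likely readable from every node."""
    return urlparse(path).scheme in DISTRIBUTED_SCHEMES

print(is_cluster_safe("hdfs://namenode:8020/data/input.csv"))  # True
print(is_cluster_safe("/home/user/input.csv"))                 # False
```

The hostname `namenode:8020` and the file paths are placeholders; the point is only that cluster-mode jobs should reference a scheme-qualified location, not a driver-local file.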

Re: [Spark-Submit] Where to store data files while running job in cluster mode?

2017-09-29 Thread Arun Rai
Or you can try mounting that drive to all nodes. On Fri, Sep 29, 2017 at 6:14 AM Jörn Franke wrote: > You should use a distributed filesystem such as HDFS. If you want to use > the local filesystem then you have to copy each file to each node. > > > On 29. Sep 2017, at

Re: [Spark-Submit] Where to store data files while running job in cluster mode?

2017-09-29 Thread Jörn Franke
You should use a distributed filesystem such as HDFS. If you want to use the local filesystem then you have to copy each file to each node. > On 29. Sep 2017, at 12:05, Gaurav1809 wrote: > > Hi All, > > I have multi node architecture of (1 master,2 workers) Spark
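The "copy each file to each node" fallback mentioned above can be sketched as a small script. This is a hedged sketch, not a recommended production setup: the worker hostnames and directories are hypothetical, SSH/scp access is assumed, and the runner is injectable so the generated commands can be inspected without a cluster.

```python
import subprocess

WORKERS = ["worker1", "worker2"]  # hypothetical worker hostnames

def push_to_workers(local_file, remote_dir, run=subprocess.run):
    """Copy local_file to the same remote_dir on every worker via scp,
    so the identical local path exists on all nodes."""
    issued = []
    for host in WORKERS:
        cmd = ["scp", local_file, f"{host}:{remote_dir}/"]
        issued.append(cmd)
        run(cmd, check=True)  # raises if any copy fails
    return issued

# Dry run: record the commands instead of executing them.
issued = push_to_workers("input.csv", "/data", run=lambda cmd, check: None)
print(issued[0])  # ['scp', 'input.csv', 'worker1:/data/']
```

This only keeps files in sync at submit time; as the thread notes, a distributed filesystem such as HDFS avoids the copy step entirely.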

Re: [Spark-Submit] Where to store data files while running job in cluster mode?

2017-09-29 Thread Sathishkumar Manimoorthy
Place it in HDFS and give the reference path in your code. Thanks, Sathish On Fri, Sep 29, 2017 at 3:31 PM, Gaurav1809 wrote: > Hi All, > > I have multi node architecture of (1 master,2 workers) Spark cluster, the > job runs to read CSV file data and it works fine when