> ...nized. Would not do it for TB of data ;) ...
>
> -Original Message-
> From: Jörn Franke [mailto:jornfra...@gmail.com]
> Sent: Friday, September 29, 2017 5:14 AM
> To: Gaurav1809 <gauravhpan...@gmail.com>
> Cc: user@spark.apache.org
> Subject: Re: [Spark-Submit] Where to store data files while running job in
> cluster mode?
Try Tachyon... it's less fuss.
On Fri, 29 Sep 2017 at 8:32 PM lucas.g...@gmail.com wrote:
> We use S3, there are caveats and issues with that but it can be made to
> work.
>
> If interested let me know and I'll show you our workarounds. I wouldn't
> do it naively though; there are lots of potential problems.
We use S3, there are caveats and issues with that but it can be made to
work.
If interested let me know and I'll show you our workarounds. I wouldn't do
it naively though; there are lots of potential problems. If you already have
HDFS, use that; otherwise, all things told, it's probably less effort
Yes, you need to store the file at a location where it is equally
retrievable ("same path") for the master and all nodes in the cluster. A
simple solution (apart from HDFS) that does not scale too well, but might
be OK with only 3 nodes as in your configuration, is network-accessible
storage.
Or you can try mounting that drive on all nodes.
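The "same path" requirement above hinges on the URI scheme of the input path. A minimal sketch (plain Python; `needs_copy_to_every_node` is a hypothetical helper for illustration, not a Spark API) of how the scheme determines whether one shared copy suffices:

```python
from urllib.parse import urlparse

def needs_copy_to_every_node(path: str) -> bool:
    """True if the path resolves against each node's own local disk,
    so the file must exist at the same location on every node."""
    scheme = urlparse(path).scheme
    # No scheme or file:// means "local filesystem" to each executor;
    # hdfs://, s3a://, etc. point at shared storage instead.
    return scheme in ("", "file")

print(needs_copy_to_every_node("file:///data/input.csv"))              # True
print(needs_copy_to_every_node("/data/input.csv"))                     # True
print(needs_copy_to_every_node("hdfs://namenode:8020/data/input.csv")) # False
```

With HDFS, S3, or a mount that appears at the same path everywhere, the check above is moot: every executor resolves the same location.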
On Fri, Sep 29, 2017 at 6:14 AM Jörn Franke wrote:
> You should use a distributed filesystem such as HDFS. If you want to use
> the local filesystem then you have to copy each file to each node.
>
You should use a distributed filesystem such as HDFS. If you want to use the
local filesystem then you have to copy each file to each node.
> On 29. Sep 2017, at 12:05, Gaurav1809 wrote:
>
> Hi All,
>
> I have a multi-node architecture of (1 master, 2 workers) Spark
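The "copy each file to each node" option can be sketched with a small local simulation (plain Python; temporary directories stand in for the driver's and workers' local filesystems — this only illustrates the file layout, it does not touch Spark):

```python
import shutil
import tempfile
from pathlib import Path

# Temp directories stand in for the local disk of the driver and of
# two worker nodes (hypothetical stand-ins, not real hosts).
driver = Path(tempfile.mkdtemp(prefix="driver_"))
workers = [Path(tempfile.mkdtemp(prefix="worker%d_" % i)) for i in (1, 2)]

# The CSV exists on the driver...
src = driver / "input.csv"
src.write_text("id,value\n1,foo\n2,bar\n")

# ...and must be copied to the same file name on every worker, because
# each executor resolves a local path against its own filesystem.
for node in workers:
    shutil.copy(src, node / "input.csv")

# Every node can now read the file under an identical relative name.
print(all((n / "input.csv").read_text() == src.read_text() for n in workers))
```

In practice the copy would be done with scp/rsync or a configuration tool; putting the file in HDFS once avoids the per-node copy entirely.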
Place it in HDFS and give the reference path in your code.
Thanks,
Sathish
On Fri, Sep 29, 2017 at 3:31 PM, Gaurav1809 wrote:
> Hi All,
>
> I have a multi-node Spark cluster (1 master, 2 workers); the
> job reads CSV file data and it works fine when