Hi Xiaobo,

I would recommend putting the files into an HDFS cluster running on the same
machines instead, if possible.  If you're concerned about duplicating the
data, you can set the replication factor to 1 so you don't use any more
space than before.
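
If it helps, the replication factor can be set cluster-wide via
dfs.replication in hdfs-site.xml, or per file with "hadoop fs -setrep 1
<path>".  Here is a minimal sketch of the HDFS route in Scala, assuming
Spark 0.8+ (the org.apache.spark namespace); the master URL, namenode
host/port, and path below are all placeholders:

  import org.apache.spark.SparkContext

  val sc = new SparkContext("spark://master:7077", "ReadFromHdfs")
  // Each worker reads its own splits directly from HDFS, so the file only
  // needs to be uploaded once (e.g. with "hadoop fs -put").
  val lines = sc.textFile("hdfs://namenode:9000/data/input.txt")
  println(lines.count())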

In my experience with Spark around 0.7.0 or so, when reading from a local
file with sc.textFile("file:///...") you had to have the file at that
exact path on every Spark worker machine.
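
For comparison, a sketch of the local-file route, reusing the same
placeholder sc as above (/data/input.txt is again a placeholder path):

  // file:// URIs are resolved locally on each machine, so tasks scheduled
  // on a worker that lacks the file at this exact path will fail.
  val localLines = sc.textFile("file:///data/input.txt")
  println(localLines.count())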

Cheers,
Andrew


On Tue, Dec 31, 2013 at 5:34 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

> Hi,
>
> We are going to deploy a standalone-mode cluster. We know Spark can read
> local data files into RDDs, but the question is where we should put the
> data files: on the server where we submit our application, or on the
> server where the master service runs?
>
> Regards,
>
> Xiaobo Gu
>
