Hi Xiaobo,

I would recommend putting the files into an HDFS cluster on the same machines instead, if possible. If you're concerned about duplicating the data, you can set the replication factor to 1 so you don't use more space than before.
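(For reference, a minimal sketch of that setting: the standard HDFS property `dfs.replication` in hdfs-site.xml controls how many copies of each block are kept.)

```xml
<!-- hdfs-site.xml: keep only one copy of each block,
     so HDFS uses no more disk than the raw data -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```

For files already in HDFS, the replication factor can be changed afterwards with the `setrep` shell command, e.g. `hadoop fs -setrep -w 1 /path/to/data` (the path here is just an example).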
In my experience with Spark (around 0.7.0 or so), when reading from a local file with sc.textFile("file:///...") you had to have that file at that exact path on every Spark worker machine.

Cheers,
Andrew

On Tue, Dec 31, 2013 at 5:34 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:
> Hi,
>
> We are going to deploy a standalone-mode cluster. We know Spark can read
> local data files into RDDs, but the question is where we should put the
> data file: on the server from which we submit our application, or the
> server where the master service runs?
>
> Regards,
>
> Xiaobo Gu
>
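To make the difference concrete, here is a small sketch (assuming a SparkContext `sc` as in the Spark shell; the paths and the namenode address are hypothetical):

```scala
// A file:// path is resolved locally on each worker, so the file must
// exist at the same path on every worker machine in the cluster:
val localRdd = sc.textFile("file:///data/input.txt")

// An hdfs:// path is served by the HDFS cluster, so any worker can read
// it regardless of which machine physically stores the blocks:
val hdfsRdd = sc.textFile("hdfs://namenode:9000/data/input.txt")

println(hdfsRdd.count())
```

With HDFS on the same machines as the Spark workers, you also get data locality: tasks are preferentially scheduled on the nodes that hold the blocks they read.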