Why does it need to be a local file? Why not do the filter ops on the HDFS file and save the result back to HDFS, from where you can create the RDD?
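A minimal sketch of that approach (the namenode URI and the filter predicate below are made-up placeholders, not from this thread):

    // filter an HDFS-backed RDD and save the result back to HDFS;
    // "hdfs://namenode:9000/..." and the contains("ERROR") predicate
    // are hypothetical examples
    val raw = sc.textFile("hdfs://namenode:9000/user/sparkcluster/input.txt")
    val filtered = raw.filter(line => line.contains("ERROR"))
    filtered.saveAsTextFile("hdfs://namenode:9000/user/sparkcluster/filtered")

    // create a fresh RDD from the saved output and keep operating on it
    val rdd = sc.textFile("hdfs://namenode:9000/user/sparkcluster/filtered")
    rdd.top(1)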
You can read a small file in on the driver program and use sc.parallelize to turn it into an RDD (see the sketch below the quoted thread).

On May 16, 2014 7:01 PM, "Sai Prasanna" <ansaiprasa...@gmail.com> wrote:

> I found that if a file is present on all the nodes at the given path in localFS, then reading is possible.
>
> But is there a way to read if the file is present only on certain nodes ?? [There should be a way !!]
>
> *NEED: Wanted to do some filter ops on an HDFS file, create a local file of the result, create an RDD out of it, and operate on it.*
>
> Is there any way out ??
>
> Thanks in advance !
>
> On Fri, May 9, 2014 at 12:18 AM, Sai Prasanna <ansaiprasa...@gmail.com> wrote:
>
>> Hi Everyone,
>>
>> I think all are pretty busy; the response time in this group has slightly increased.
>>
>> Anyway, this is a pretty silly problem, but I could not get past it.
>>
>> I have a file in my localFS, but when I try to create an RDD out of it, tasks fail with a file-not-found exception in the log files.
>>
>> *var file = sc.textFile("file:///home/sparkcluster/spark/input.txt");*
>> *file.top(1);*
>>
>> input.txt exists in the above folder, but Spark still couldn't find it. Do some parameters need to be set ??
>>
>> Any help is really appreciated. Thanks !!
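A minimal sketch of the read-on-the-driver suggestion, using the path from the original mail. This only works for files small enough to fit in driver memory:

    import scala.io.Source

    // read the whole file on the driver (it only needs to exist on
    // the driver machine, not on the workers)
    val src = Source.fromFile("/home/sparkcluster/spark/input.txt")
    val lines = try src.getLines().toList finally src.close()

    // distribute the lines across the cluster as an RDD
    val rdd = sc.parallelize(lines)
    rdd.top(1)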