Why does it need to be a local file? Why not do some filter ops on the HDFS file
and save the result back to HDFS, from where you can create the RDD?
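A minimal sketch of that approach (the HDFS paths and the filter predicate below are just placeholders):

// read from HDFS, filter, and write the result back to HDFS
val lines = sc.textFile("hdfs:///user/sparkcluster/input.txt")          // placeholder path
val filtered = lines.filter(_.contains("ERROR"))                        // placeholder predicate
filtered.saveAsTextFile("hdfs:///user/sparkcluster/input-filtered")     // placeholder path

// the filtered output already lives on HDFS, so it can be read straight back as an RDD
val resultRdd = sc.textFile("hdfs:///user/sparkcluster/input-filtered")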

You can read a small file in the driver program and use sc.parallelize to
turn it into an RDD.
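Roughly like this (untested sketch, assuming the file fits comfortably in driver memory):

import scala.io.Source

// read the local file on the driver only; the workers never need to see this path
val localLines = Source.fromFile("/home/sparkcluster/spark/input.txt").getLines().toSeq

// distribute the lines across the cluster as an RDD
val rdd = sc.parallelize(localLines)
rdd.top(1)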
On May 16, 2014 7:01 PM, "Sai Prasanna" <ansaiprasa...@gmail.com> wrote:

> I found that if the file is present at the given path on the local FS of
> all the nodes, then reading is possible.
>
> But is there a way to read it if the file is present only on certain nodes??
> [There should be a way!!]
>
> *NEED: I wanted to do some filter ops on an HDFS file, create a local file of
> the result, create an RDD out of it, and operate on that.*
>
> Is there any way out ??
>
> Thanks in advance !
>
>
>
>
> On Fri, May 9, 2014 at 12:18 AM, Sai Prasanna <ansaiprasa...@gmail.com> wrote:
>
>> Hi Everyone,
>>
>> I think everyone is pretty busy; the response time in this group has
>> increased slightly.
>>
>> Anyway, this is a pretty silly problem, but I could not get past it.
>>
>> I have a file in my local FS, but when I try to create an RDD out of it,
>> the tasks fail and a file-not-found exception is thrown in the log files.
>>
>> *var file = sc.textFile("file:///home/sparkcluster/spark/input.txt");*
>> *file.top(1);*
>>
>> input.txt exists in the above folder, but Spark still couldn't find it. Do
>> some parameters need to be set?
>>
>> Any help is really appreciated. Thanks !!
>>
>
>
