Re: map reduce and sync

Hemanth Yamijala Sat, 23 Feb 2013 06:54:45 -0800

Hi Lucas,

I tried something like this but got different results.


I wrote code that opened a file on HDFS, wrote a line and called sync.
Without closing the file, I ran a wordcount with that file as input. It
worked fine and counted the words that had been sync'ed, even though the
file length shows up as 0, as you noted with fs -ls.
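
The write side of this experiment (and of the servlet-style writer in the quoted messages) can be sketched as follows. This is a minimal sketch under stated assumptions. SyncedLineWriter, SyncableSink and MemorySink are hypothetical names. SyncableSink stands in for org.apache.hadoop.fs.FSDataOutputStream, and a real adapter would forward its calls to that stream's Hadoop 1.x write(byte[]) and sync() methods. The in-memory fake lets the pattern run without a cluster.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class SyncedLineWriter {

    /**
     * Hypothetical stand-in for org.apache.hadoop.fs.FSDataOutputStream.
     * A real adapter would forward these to the stream's write(byte[]) and sync().
     */
    interface SyncableSink {
        void write(byte[] bytes) throws IOException;
        void sync() throws IOException;
    }

    private final SyncableSink sink;

    SyncedLineWriter(SyncableSink sink) {
        this.sink = sink;
    }

    /** Appends one newline-terminated record and syncs it; the stream is deliberately left open. */
    void appendLine(String line) throws IOException {
        sink.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        sink.sync();
    }

    /** In-memory fake for the demo: remembers what had been written at the last sync. */
    static class MemorySink implements SyncableSink {
        final ByteArrayOutputStream written = new ByteArrayOutputStream();
        String synced = "";
        int syncCalls = 0;

        @Override
        public void write(byte[] bytes) {
            written.write(bytes, 0, bytes.length);
        }

        @Override
        public void sync() {
            synced = new String(written.toByteArray(), StandardCharsets.UTF_8);
            syncCalls++;
        }
    }

    public static void main(String[] args) throws IOException {
        MemorySink sink = new MemorySink();
        SyncedLineWriter writer = new SyncedLineWriter(sink);
        writer.appendLine("hello world");   // e.g. one incoming request
        writer.appendLine("hello hadoop");  // another request; the stream is never closed
        System.out.print(sink.synced);
        System.out.println(sink.syncCalls + " sync calls");
    }
}
```

Running main prints the two synced lines followed by "2 sync calls": every record is synced as soon as it is written, and nothing ever closes the stream.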

So I'm not sure what's happening in your case. In the MR job, do the job
counters indicate that no bytes were read?
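
If the counters do show zero input bytes, one cross-check is to count the same words by reading the file directly, which the quoted messages report works. A minimal stdlib-only sketch follows. DirectWordCount is a hypothetical name, and the sketch assumes you pass it the FSDataInputStream returned by FileSystem.open(path), which is a java.io.InputStream. The demo uses a ByteArrayInputStream so it runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class DirectWordCount {

    /**
     * Counts whitespace-separated words in everything readable from the stream,
     * then closes it. Any InputStream works, including an FSDataInputStream.
     */
    static Map<String, Integer> count(InputStream in) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();  // sorted keys for stable output
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {           // skip blank lines
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Demo bytes standing in for the synced contents of the still-open file.
        InputStream in = new ByteArrayInputStream(
            "hello world\nhello hadoop\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(count(in));  // {hadoop=1, hello=2, world=1}
    }
}
```

If this direct count is non-zero while the job's counters show nothing read, the problem lies in how the job picks up its input rather than in the data written by sync.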

On a different note, if you can describe a little more of what you are
trying to accomplish, we could probably work out a better solution.

Thanks
hemanth


On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi <luc...@gmail.com> wrote:

> Hello Hemanth, thanks for answering.
> The file is opened by a separate process that is not related to map reduce
> at all. You can think of it as a servlet that receives requests and writes
> them to this file: every time a request is received, it is written and
> org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
>
> At the same time, I want to run a map reduce job over this file. Simply
> running the word count example doesn't seem to work; it is as if the file
> were empty.
>
> hadoop fs -tail works just fine, and reading the file using
> org.apache.hadoop.fs.FSDataInputStream also works.
>
> One last thing: the web interface doesn't show the contents, and
> hadoop fs -ls says the file is empty.
>
> What am I doing wrong?
>
> Thanks!
>
> Lucas
>
>
>
> On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala <
> yhema...@thoughtworks.com> wrote:
>
>> Could you please clarify, are you opening the file in your mapper code
>> and reading from there ?
>>
>> Thanks
>> Hemanth
>>
>> On Friday, February 22, 2013, Lucas Bernardi wrote:
>>
>>> Hello there, I'm trying to use Hadoop map reduce to process an open
>>> file. The writing process writes a line to the file and syncs it to
>>> readers (org.apache.hadoop.fs.FSDataOutputStream.sync()).
>>>
>>> If I try to read the file from another process, it works fine, at least
>>> using
>>> org.apache.hadoop.fs.FSDataInputStream.
>>>
>>> hadoop fs -tail also works just fine.
>>>
>>> But it looks like map reduce doesn't read any data. I tried the word
>>> count example and got the same result: it is as if the file were empty
>>> to the map reduce framework.
>>>
>>> I'm using Hadoop 1.0.3 and Pig 0.10.0.
>>>
>>> I need some help with this.
>>>
>>> Thanks!
>>>
>>> Lucas
>>>
>>
>
