Hi Lucas,

I tried something like this but got different results.

I wrote code that opened a file on HDFS, wrote a line, and called sync. Without closing the file, I ran a wordcount job with that file as input. It worked fine and counted the words that had been synced (even though the file length shows up as 0 in fs -ls, as you noted). So I'm not sure what's happening in your case. In the MR job, do the job counters indicate that no bytes were read?

On a different note, if you can describe in a little more detail what you are trying to accomplish, we could probably work out a better solution.

Thanks
Hemanth

On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi <luc...@gmail.com> wrote:

> Hello Hemanth, thanks for answering.
>
> The file is opened by a separate process that is not MapReduce related at
> all. You can think of it as a servlet receiving requests and writing them
> to this file. Every time a request is received, it is written and
> org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
>
> At the same time, I want to run a MapReduce job over this file. Simply
> running the word count example doesn't seem to work; it is as if the file
> were empty.
>
> hadoop fs -tail works just fine, and reading the file using
> org.apache.hadoop.fs.FSDataInputStream also works OK.
>
> One last thing: the web interface doesn't show the contents, and the
> command hadoop fs -ls says the file is empty.
>
> What am I doing wrong?
>
> Thanks!
>
> Lucas
>
> On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>
>> Could you please clarify: are you opening the file in your mapper code
>> and reading from there?
>>
>> Thanks
>> Hemanth
>>
>> On Friday, February 22, 2013, Lucas Bernardi wrote:
>>
>>> Hello there, I'm trying to use Hadoop MapReduce to process an open
>>> file. The writing process writes a line to the file and syncs the file
>>> to readers (org.apache.hadoop.fs.FSDataOutputStream.sync()).
>>>
>>> If I try to read the file from another process, it works fine, at least
>>> using org.apache.hadoop.fs.FSDataInputStream.
>>>
>>> hadoop fs -tail also works just fine.
>>>
>>> But it looks like MapReduce doesn't read any data. I tried using the
>>> word count example and got the same result: it is as if the file were
>>> empty for the MapReduce framework.
>>>
>>> I'm using Hadoop 1.0.3 and Pig 0.10.0.
>>>
>>> I need some help with this.
>>>
>>> Thanks!
>>>
>>> Lucas
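
One possible reading of the symptoms in this thread, offered only as a hypothesis: a reader that streams until end of file sees the synced bytes, while a reader that trusts the reported file length (0 in fs -ls) stops before reading anything. Nothing in the thread confirms that this is what the MapReduce input path does, and Hemanth's test did not reproduce the problem. The toy model below uses only the Java standard library and no Hadoop classes. The class VisibleLengthDemo and the methods readToEof and readUpTo are hypothetical names made up for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class VisibleLengthDemo {

    /** Reads until end of stream, ignoring any externally reported length. */
    public static String readToEof(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            sb.append((char) b); // ASCII-only data in this sketch
        }
        return sb.toString();
    }

    /** Reads at most reportedLength bytes, like a reader bounded by a listed file size. */
    public static String readUpTo(InputStream in, long reportedLength) throws IOException {
        StringBuilder sb = new StringBuilder();
        long pos = 0;
        int b;
        while (pos < reportedLength && (b = in.read()) != -1) {
            sb.append((char) b);
            pos++;
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for bytes that were written and synced but not yet closed.
        byte[] synced = "hello hadoop\n".getBytes(StandardCharsets.US_ASCII);
        // Assumption: the listing reports length 0 for the still-open file.
        long reportedLength = 0L;

        String direct = readToEof(new ByteArrayInputStream(synced));
        String bounded = readUpTo(new ByteArrayInputStream(synced), reportedLength);

        System.out.println("read to EOF: " + direct.length() + " bytes");
        System.out.println("bounded by reported length: " + bounded.length() + " bytes");
    }
}
```

If the job counters show that zero bytes were read, as Hemanth asks above, that would be consistent with the length-bounded case in this model. It would not prove it.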