Re: map reduce and sync

2013-03-04 Thread Lucas Bernardi
Ok, so I found a workaround for this issue, I'm sharing it here for others: the key problem is that hadoop won't update the file size until the file is closed, so FileInputFormat sees never-closed files as empty and generates no splits for the map reduce process. To fix this problem
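The snippet above is cut off before the actual fix, so the workaround itself is unknown here; but the problem it describes can be illustrated with a small dependency-free sketch (plain Java, hypothetical names, no Hadoop required). Split planning trusts the length recorded in file metadata, and that length is only published on close, so a file that has been synced but never closed plans zero bytes of work:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behavior described in the thread: the file
// length is only published to metadata on close, so split planning that
// trusts the reported length sees a never-closed file as empty.
public class StaleLengthDemo {

    /** Minimal stand-in for a FileStatus: reported vs. actual length. */
    static final class FileMeta {
        long reportedLength = 0;   // what length-based planning would see
        long actualBytes = 0;      // what a direct reader could consume
    }

    /** Writer appends and syncs; metadata is untouched until close. */
    static void writeAndSync(FileMeta f, int nBytes) {
        f.actualBytes += nBytes;   // data is visible to direct readers...
        // ...but reportedLength is NOT updated here: that is the root issue.
    }

    static void close(FileMeta f) {
        f.reportedLength = f.actualBytes;  // length published only on close
    }

    /** Length-trusting split planner, loosely like getSplits(). */
    static List<long[]> planSplits(FileMeta f, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        for (long off = 0; off < f.reportedLength; off += splitSize) {
            long len = Math.min(splitSize, f.reportedLength - off);
            splits.add(new long[] {off, len});
        }
        return splits;  // empty while reportedLength == 0
    }

    public static void main(String[] args) {
        FileMeta f = new FileMeta();
        writeAndSync(f, 1024);
        System.out.println("splits before close: " + planSplits(f, 512).size());
        close(f);
        System.out.println("splits after close:  " + planSplits(f, 512).size());
    }
}
```

Run before close, the planner produces no splits at all, which matches the "empty file, no splits" symptom in the thread; after close, the same planner produces the expected splits.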

Re: map reduce and sync

2013-02-25 Thread Lucas Bernardi
I didn't notice, thanks for the heads up. On Mon, Feb 25, 2013 at 4:31 AM, Harsh J ha...@cloudera.com wrote: Just an aside (I've not tried to look at the original issue yet), but Scribe has not been maintained (nor has seen a release) in over a year now -- looking at the commit history. Same

Re: map reduce and sync

2013-02-25 Thread Lucas Bernardi
It looks like getSplits in FileInputFormat is ignoring 0 length files. That would also explain the weird behavior of tail, which seems to always jump to the start, since the file length is 0. So, basically, sync doesn't update the file length, and any code based on file size is unreliable. Am I right?

Re: map reduce and sync

2013-02-24 Thread Hemanth Yamijala
I am using the same version of Hadoop as you. Can you look at something like Scribe, which AFAIK fits the use case you describe. Thanks Hemanth On Sun, Feb 24, 2013 at 3:33 AM, Lucas Bernardi luc...@gmail.com wrote: That is exactly what I did, but in my case, it is like if the file were

Re: map reduce and sync

2013-02-24 Thread Lucas Bernardi
Yeah I looked at scribe, looks good but sounds like too much for my problem. I'd rather make it work the simple way. Could you please post your code, maybe I'm doing something wrong on the sync side. Maybe a buffer size, block size or some other parameter is different... Thanks! Lucas On Sun,

Re: map reduce and sync

2013-02-24 Thread Harsh J
Just an aside (I've not tried to look at the original issue yet), but Scribe has not been maintained (nor has seen a release) in over a year now -- looking at the commit history. Same case with both Facebook and Twitter's fork. On Mon, Feb 25, 2013 at 7:16 AM, Lucas Bernardi luc...@gmail.com

Re: map reduce and sync

2013-02-23 Thread Lucas Bernardi
Hello Hemanth, thanks for answering. The file is opened by a separate process, not map reduce related at all. You can think of it as a servlet, receiving requests, and writing them to this file; every time a request is received it is written and org.apache.hadoop.fs.FSDataOutputStream.sync() is

Re: map reduce and sync

2013-02-23 Thread Hemanth Yamijala
Hi Lucas, I tried something like this but got different results. I wrote code that opened a file on HDFS, wrote a line and called sync. Without closing the file, I ran a wordcount with that file as input. It did work fine and was able to count the words that were sync'ed (even though the file

Re: map reduce and sync

2013-02-23 Thread Lucas Bernardi
That is exactly what I did, but in my case, it is like if the file were empty, the job counters say no bytes read. I'm using hadoop 1.0.3. Which version did you try? What I'm trying to do is just some basic analytics on a product search system. There is a search service, every time a user
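Lucas's "no bytes read" counters fit the same picture painted earlier in the thread: a job whose input size comes from file metadata plans zero bytes of input, even though a direct reader of the same open file sees the synced data. A dependency-free sketch of that asymmetry (plain Java, hypothetical names, not the Hadoop API):

```java
// Models the asymmetry reported in the thread: sync() makes bytes visible
// to a reader holding the stream open, but the metadata length (which a
// length-based job uses to size its input) stays stale until close.
public class SyncVisibilityDemo {
    static long metadataLength = 0;  // what length-based planning reports
    static long visibleBytes = 0;    // what an open reader can consume

    /** Stand-in for writing a line and syncing it to readers. */
    static void writeAndSync(int nBytes) {
        visibleBytes += nBytes;      // readers see the data immediately...
        // ...while metadataLength stays 0 until the file is closed.
    }

    static long directReadSees() { return visibleBytes; }
    static long jobPlansToRead() { return metadataLength; }

    public static void main(String[] args) {
        writeAndSync(100);
        System.out.println("direct read sees " + directReadSees() + " bytes");
        System.out.println("job plans to read " + jobPlansToRead() + " bytes");
    }
}
```

This matches the two observations in the thread: reading the open file from another process "works fine" (the direct reader), while the map reduce job's counters report no bytes read (the length-based planner).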

Re: map reduce and sync

2013-02-22 Thread Hemanth Yamijala
Could you please clarify, are you opening the file in your mapper code and reading from there ? Thanks Hemanth On Friday, February 22, 2013, Lucas Bernardi wrote: Hello there, I'm trying to use hadoop map reduce to process an open file. The writing process, writes a line to the file and syncs

map reduce and sync

2013-02-21 Thread Lucas Bernardi
Hello there, I'm trying to use hadoop map reduce to process an open file. The writing process, writes a line to the file and syncs the file to readers. (org.apache.hadoop.fs.FSDataOutputStream.sync()). If I try to read the file from another process, it works fine, at least using