Ok, so I found a workaround for this issue; I'm sharing it here for others:
So the key problem is that Hadoop won't update the file size until the file
is closed, so FileInputFormat sees never-closed files as empty and
generates no splits for the MapReduce job.
To fix this problem
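For anyone trying to reproduce the symptom, the split computation boils down to something like the following. This is a simplified, self-contained sketch, not the actual Hadoop source; the class and method names are illustrative only:

```java
// Simplified sketch of FileInputFormat-style split generation.
// NOT the actual Hadoop source; names and constants are illustrative.
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Returns one (offset, length) pair per map task.
    public static List<long[]> getSplits(long reportedLength, long blockSize) {
        List<long[]> splits = new ArrayList<>();
        long offset = 0;
        long remaining = reportedLength;
        while (remaining > 0) {
            long len = Math.min(blockSize, remaining);
            splits.add(new long[] { offset, len });
            offset += len;
            remaining -= len;
        }
        // If the NameNode still reports length 0 (file never closed),
        // this loop never runs: no splits, so the job reads no bytes.
        return splits;
    }

    public static void main(String[] args) {
        // A 130 MiB file with 64 MiB blocks -> 3 splits.
        System.out.println(getSplits(130L << 20, 64L << 20).size());
        // An unclosed file reports length 0 -> 0 splits.
        System.out.println(getSplits(0, 64L << 20).size());
    }
}
```

The point is that everything downstream keys off the length the NameNode reports, which sync() does not advance.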
I didn't notice, thanks for the heads up.
On Mon, Feb 25, 2013 at 4:31 AM, Harsh J ha...@cloudera.com wrote:
Just an aside (I've not tried to look at the original issue yet), but
Scribe has not been maintained (nor has seen a release) in over a year
now -- looking at the commit history. Same case with both Facebook and
Twitter's fork.
It looks like getSplits in FileInputFormat is ignoring 0-length files.
That would also explain the weird behavior of tail, which always seems to
jump to the start, since the file length is 0.
So, basically, sync() doesn't update the file length, so any code based on
file size is unreliable.
Am I right?
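The tail observation follows directly from the reported length. A tail-style reader seeks to (length - window) before reading; a minimal sketch (illustrative only, not the actual tail implementation):

```java
// Why tail "jumps to the start": it seeks to max(0, length - window)
// before reading. With a reported length of 0 (unclosed HDFS file),
// that clamps to offset 0 every time.
public class TailSketch {
    public static long tailStart(long reportedLength, long window) {
        return Math.max(0, reportedLength - window);
    }

    public static void main(String[] args) {
        System.out.println(tailStart(0, 1024));         // unclosed file: offset 0
        System.out.println(tailStart(1_000_000, 1024)); // normal file: near the end
    }
}
```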
I am using the same version of Hadoop as you.
Can you look at something like Scribe, which AFAIK fits the use case you
describe?
Thanks
Hemanth
On Sun, Feb 24, 2013 at 3:33 AM, Lucas Bernardi luc...@gmail.com wrote:
Yeah, I looked at Scribe; it looks good but sounds like too much for my
problem. I'd rather make it work the simple way. Could you please post your
code? Maybe I'm doing something wrong on the sync side. Maybe a buffer
size, block size or some other parameter is different...
Thanks!
Lucas
On Mon, Feb 25, 2013 at 7:16 AM, Lucas Bernardi luc...@gmail.com wrote:
Hello Hemanth, thanks for answering.
The file is open by a separate process, not MapReduce related at all. You
can think of it as a servlet receiving requests; every time a request is
received it is written to this file and
org.apache.hadoop.fs.FSDataOutputStream.sync() is called.
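The writer side described above is essentially the following loop, sketched against the Hadoop 1.x API. Treat it as pseudocode: the path and variable names are made up, and the details are untested:

```
// sketch: one long-lived HDFS output stream, sync() after each request
FileSystem fs = FileSystem.get(conf);
FSDataOutputStream out = fs.create(new Path("/logs/requests.log"));
// on each incoming request:
out.writeBytes(requestLine + "\n");
out.sync();   // makes the bytes visible to readers, but does NOT
              // update the file length recorded by the NameNode
// out.close() is never called while the service is running
```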
Hi Lucas,
I tried something like this but got different results.
I wrote code that opened a file on HDFS, wrote a line and called sync.
Without closing the file, I ran a wordcount with that file as input. It did
work fine and was able to count the words that were sync'ed (even though
the file was not closed).
That is exactly what I did, but in my case it is as if the file were
empty; the job counters say no bytes read.
I'm using Hadoop 1.0.3. Which version did you try?
What I'm trying to do is just some basic analytics on a product search
system. There is a search service; every time a user
Could you please clarify, are you opening the file in your mapper code and
reading from there?
Thanks
Hemanth
On Friday, February 22, 2013, Lucas Bernardi wrote:
Hello there, I'm trying to use hadoop map reduce to process an open file. The
writing process, writes a line to the file and syncs the file to readers.
(org.apache.hadoop.fs.FSDataOutputStream.sync()).
If I try to read the file from another process, it works fine, at least
using