Re: How does mapper process partial records?

2013-01-25 Thread Harsh J
I don't quite get what you mean - we don't have such a flaw. The first split task makes sure to read one extra record, even if its last byte is a newline. The subsequent splits (that is, those with non-zero offsets) always ignore the first record, even if it is complete in their given range. You may
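
The rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Hadoop's actual record reader code: the function names `read_split` and `read_all_splits`, the in-memory `bytes` input, and the fixed split size are all assumptions made for the example.

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the newline-delimited records owned by the split [start, start + length).

    Hypothetical sketch of the rule from the thread:
    - a split with a non-zero offset always discards its first line,
      even when that line is complete within the split;
    - every split keeps reading while the next line starts at or before
      its end offset, so it picks up the line the next split discards.
    """
    end = start + length
    pos = start

    if start != 0:
        # Discard everything up to and including the first newline.
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []
        pos = newline + 1

    records = []
    # "pos <= end" (not "<") is what makes a split read one extra record
    # when its last byte happens to be a newline.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            records.append(data[pos:])
            pos = len(data)
        else:
            records.append(data[pos:newline])
            pos = newline + 1
    return records


def read_all_splits(data: bytes, split_size: int) -> list[list[bytes]]:
    """Cut data into fixed-size splits and read each one independently."""
    return [
        read_split(data, start, min(split_size, len(data) - start))
        for start in range(0, len(data), split_size)
    ]


if __name__ == "__main__":
    sample = b"aaa\nbbb\nccc\nddd\n"
    # Split size 4 puts every split boundary right after a newline.
    print(read_all_splits(sample, 4))
    # Split size 5 puts boundaries in the middle of records.
    print(read_all_splits(sample, 5))
```

In both cases each record is returned by exactly one split, whether the boundary falls mid-record or exactly on a newline.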

Re: How does mapper process partial records?

2013-01-24 Thread Praveen Sripati
Harsh, Thanks for the response. From http://wiki.apache.org/hadoop/HadoopMapReduce > For example TextInputFormat will read the last line of the FileSplit past the split boundary and when reading other than the first FileSplit, TextInputFormat ignores the content up to the first newline. When th

Re: How does mapper process partial records?

2013-01-24 Thread Harsh J
Hi Praveen, This is explained at http://wiki.apache.org/hadoop/HadoopMapReduce [Map section]. On Thu, Jan 24, 2013 at 10:20 PM, Praveen Sripati wrote: > Hi, > > HDFS splits the file across record boundaries. So, how does the mapper > processing the second block (b2) determine that the first reco

Re: How does mapper process partial records?

2013-01-24 Thread Mohammad Tariq
Hello Praveen, Do you mean the InputFormat splits the file across record boundaries? I actually didn't get your question. What do you mean by 'record' with respect to HDFS? Did you mean HDFS block? Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Jan 24, 2013 a