I don't quite follow what you mean; we don't have such a flaw. The first
split task makes sure to read one extra record, even if its last byte
is a newline. The subsequent splits (that is, those with a non-zero
offset) always ignore the first record, even if it is complete within
their given range.
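
The rule above can be illustrated with a small simulation. This is only a sketch of the behaviour as described in this message, not Hadoop's actual LineRecordReader code; the names read_split and split_ranges are hypothetical helpers made up for the illustration, and records are assumed to be newline-terminated lines.

```python
def split_ranges(length, split_size):
    """Cut [0, length) into consecutive (start, end) byte ranges."""
    return [(s, min(s + split_size, length)) for s in range(0, length, split_size)]


def read_split(data, start, end):
    """Return the records a reader for the split [start, end) would emit.

    Assumed rule (taken from the description above):
      * a split with a non-zero offset always discards its first line,
        even if that line is complete within the split;
      * every split keeps reading as long as the next line starts at or
        before 'end', so it reads one record past its boundary.
    """
    pos = start
    if start != 0:
        # Skip everything up to and including the first newline.
        idx = data.find(b"\n", start)
        pos = idx + 1 if idx != -1 else len(data)

    records = []
    while pos <= end and pos < len(data):
        idx = data.find(b"\n", pos)
        nxt = idx + 1 if idx != -1 else len(data)
        records.append(data[pos:nxt].rstrip(b"\n"))
        pos = nxt
    return records


def read_all(data, split_size):
    """Concatenate the records emitted by every split, in split order."""
    out = []
    for start, end in split_ranges(len(data), split_size):
        out.extend(read_split(data, start, end))
    return out
```

With data = b"aaa\nbbb\nccc\nddd\n" and a split size of 4, every split ends exactly on a newline. The first split still reads "bbb", and the second split discards "bbb" even though it is complete within its range, so each record is emitted exactly once.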
You may
Harsh,
Thanks for the response.
From http://wiki.apache.org/hadoop/HadoopMapReduce:
> For example, TextInputFormat will read the last line of the FileSplit past
> the split boundary and, when reading other than the first FileSplit,
> TextInputFormat ignores the content up to the first newline.
When th
Hi Praveen,
This is explained at http://wiki.apache.org/hadoop/HadoopMapReduce
[Map section].
On Thu, Jan 24, 2013 at 10:20 PM, Praveen Sripati
wrote:
> Hi,
>
> HDFS splits the file across record boundaries. So, how does the mapper
> processing the second block (b2) determine that the first reco
Hello Praveen,
Do you mean the InputFormat splits the file across record boundaries? I
actually didn't get your question. What do you mean by 'record' with
respect to HDFS? Did you mean an HDFS block?
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Thu, Jan 24, 2013 a