Hi Harsh,

Thank you for your reply. Do you mean I need to change the FileSplit to avoid the errors I mentioned?
Regards!
Chen

On Wed, Aug 29, 2012 at 2:46 AM, Harsh J <ha...@cloudera.com> wrote:
> Hi Chen,
>
> Does your record reader and mapper handle the case where one map split
> may not exactly get the whole record? Your case is not very different
> from the newlines logic presented here:
> http://wiki.apache.org/hadoop/HadoopMapReduce
>
> On Wed, Aug 29, 2012 at 11:13 AM, Chen He <airb...@gmail.com> wrote:
> > Hi guys,
> >
> > I met an interesting problem when implementing my own custom
> > InputFormat, which extends FileInputFormat. (I rewrote the
> > RecordReader class but not the InputSplit class.)
> >
> > My RecordReader treats the following format as a basic record. (It
> > extends LineRecordReader and returns a record when it meets #Trailer#
> > and contains #Header#. I have a single input file composed of many
> > such basic records.)
> >
> > #Header#
> > ..... (many lines; may be 0 lines or 1000 lines, it varies)
> > #Trailer#
> >
> > Everything works fine when the number of basic records in the file is
> > an integer multiple of the number of mappers. For example, 2 mappers
> > with two basic records in the input file, or 3 mappers with 6 basic
> > records.
> >
> > However, if I use 4 mappers with only 3 basic records in the input
> > file (not an integer multiple), the final output is incorrect. The
> > "Map Input Bytes" counter for the job is also less than the input
> > file size. How can I fix this? Do I need to rewrite the InputSplit?
> >
> > Any reply will be appreciated!
> >
> > Regards!
> >
> > Chen
>
> --
> Harsh J
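The convention Harsh is pointing at (the same one LineRecordReader uses for newlines) can be illustrated with a plain-Java sketch, independent of the Hadoop API: a split that does not start at offset 0 skips any partial record at its start (the previous split's reader owns it), and every split reads *past* its end offset to finish the last record whose #Header# it owns. The class name and `readRecords` helper below are hypothetical, for illustration only; a real fix would apply the same rule inside the custom RecordReader's nextKeyValue()/initialize() logic.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitAwareRecordDemo {

    // Hypothetical helper: simulates how one map split's reader should
    // pick records out of [splitStart, splitEnd) so that each record is
    // read exactly once across all splits.
    static List<String> readRecords(String data, int splitStart, int splitEnd) {
        List<String> records = new ArrayList<>();
        int pos = splitStart;
        // A split that does not begin at offset 0 skips forward to the
        // next #Header#: any partial record at the front belongs to the
        // reader of the previous split, which reads past its own end.
        if (splitStart != 0) {
            int h = data.indexOf("#Header#", splitStart);
            if (h < 0) return records; // no record starts in or after this split
            pos = h;
        }
        // Emit every record whose #Header# starts before splitEnd,
        // deliberately reading PAST the split boundary to its #Trailer#.
        while (pos < splitEnd) {
            int h = data.indexOf("#Header#", pos);
            if (h < 0 || h >= splitEnd) break; // next record belongs to a later split
            int t = data.indexOf("#Trailer#", h);
            if (t < 0) break;                  // malformed tail: no trailer found
            int end = t + "#Trailer#".length();
            records.add(data.substring(h, end).trim());
            pos = end;
        }
        return records;
    }

    public static void main(String[] args) {
        // Three records, as in the failing case from the thread.
        String data = "#Header#\na\n#Trailer#\n"
                    + "#Header#\nb\nc\n#Trailer#\n"
                    + "#Header#\nd\n#Trailer#\n";
        // Carve the bytes into FOUR splits (not an integer multiple of
        // the record count) and check every record is seen exactly once.
        int quarter = data.length() / 4;
        int total = 0;
        for (int i = 0; i < 4; i++) {
            int s = i * quarter;
            int e = (i == 3) ? data.length() : (i + 1) * quarter;
            total += readRecords(data, s, e).size();
        }
        System.out.println(total); // prints 3
    }
}
```

With this rule, rewriting InputSplit is unnecessary: arbitrary byte boundaries are fine as long as the RecordReader skips a leading partial record and finishes a trailing one, exactly as TextInputFormat does for lines.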