Also, to add: my test input file has fewer records than what I see going to the mappers (i.e. the Map Input Records counter), and the input file is more than double the block size.
On Fri, Feb 24, 2017 at 4:25 PM Ravindra <ravindra.baj...@gmail.com> wrote:
> Hi All,
>
> I have implemented CombineInputFormat for my job and it works well for
> small files, i.e. it combines them up to the block boundary. But there are a
> few very large files that it gets from the input source along with the small
> files, so the mapper that gets to work on such a large file becomes a laggard.
>
> I had overridden isSplitable to return false. I guess that was the reason,
> and hence I removed this override (i.e. let Hadoop keep its default
> behaviour). Hadoop splits the big files now, fine, but then I see
> inconsistencies in the output records.
>
> Is there anything related to my CustomRecordReader that I need to take
> care of? Not sure.
>
> Please advise!
>
> Thanks.
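
For reference, a minimal sketch of the pattern being discussed (class and
reader names below are illustrative, not taken from the actual job): a
CombineFileInputFormat subclass with the isSplitable override in question,
plus a per-file reader that delegates to the stock TextInputFormat line
reader via CombineFileRecordReaderWrapper, so mid-file splits are handled by
the inner reader.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReaderWrapper;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical example class, not the poster's actual CombineInputFormat.
public class MyCombineInputFormat extends CombineFileInputFormat<LongWritable, Text> {

    // Returning false here forces each file into a single split, which is why
    // one huge file ends up on a single, slow mapper. Removing this override
    // (or returning true) lets Hadoop split large files at block boundaries.
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException {
        // Delegate each file chunk of the combined split to a per-chunk reader.
        return new CombineFileRecordReader<>(
                (CombineFileSplit) split, context, TextReaderWrapper.class);
    }

    // Wrapper so the stock line reader is used for each chunk in the combine
    // split; it starts reading at the chunk's offset and skips the partial
    // first line, so records at split boundaries are neither dropped nor
    // read twice.
    public static class TextReaderWrapper
            extends CombineFileRecordReaderWrapper<LongWritable, Text> {
        public TextReaderWrapper(CombineFileSplit split,
                TaskAttemptContext context, Integer idx)
                throws IOException, InterruptedException {
            super(new TextInputFormat(), split, context, idx);
        }
    }
}

One thing worth checking in a custom record reader once files are allowed to
split: if the reader assumes its chunk always starts at byte 0 of the file and
ignores the per-chunk offset/length, records near the split boundaries can be
duplicated or lost, which would show up exactly as inconsistent output record
counts.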