Also, to add: my test input file has fewer records than what I see going to the mappers (i.e. the Map Input Records counter), and the input file is more than double the block size.
On Fri, Feb 24, 2017 at 4:25 PM Ravindra <ravindra.baj...@gmail.com> wrote:
> Hi All,
>
> I have implemented CombineInputFormat for my job and it works well for
> small files, i.e. it combines them up to the block boundary. But there are a
> few very large files that it gets from the input source along with the small
> files, so the mapper that gets to work on such a large file becomes a laggard.
>
> I had overridden isSplitable to return false. I guess that was the reason,
> and hence I removed this override (i.e. let Hadoop keep its default
> behaviour). Hadoop splits the big files now, fine, but then I see
> inconsistencies in the output records.
>
> Is there anything related to my CustomRecordReader that I need to take
> care of? Not sure.
>
> Please advise!
>
> Thanks.
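
For reference, a minimal sketch of the pattern being discussed (class and
reader names below are illustrative, not taken from the actual job): a
CombineFileInputFormat subclass with the isSplitable override in question,
plus a per-file reader that delegates to the stock TextInputFormat line
reader via CombineFileRecordReaderWrapper, so mid-file splits are handled by
the inner reader.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReaderWrapper;
import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical example class, not the poster's actual CombineInputFormat.
public class MyCombineInputFormat extends CombineFileInputFormat<LongWritable, Text> {

    // Returning false here forces each file into a single split, which is why
    // one huge file ends up on a single, slow mapper. Removing this override
    // (or returning true) lets Hadoop split large files at block boundaries.
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) throws IOException {
        // Delegate each file chunk of the combined split to a per-chunk reader.
        return new CombineFileRecordReader<>(
                (CombineFileSplit) split, context, TextReaderWrapper.class);
    }

    // Wrapper so the stock line reader is used for each chunk in the combine
    // split; it starts reading at the chunk's offset and skips the partial
    // first line, so records at split boundaries are neither dropped nor
    // read twice.
    public static class TextReaderWrapper
            extends CombineFileRecordReaderWrapper<LongWritable, Text> {
        public TextReaderWrapper(CombineFileSplit split,
                TaskAttemptContext context, Integer idx)
                throws IOException, InterruptedException {
            super(new TextInputFormat(), split, context, idx);
        }
    }
}

One thing worth checking in a custom record reader once files are allowed to
split: if the reader assumes its chunk always starts at byte 0 of the file and
ignores the per-chunk offset/length, records near the split boundaries can be
duplicated or lost, which would show up exactly as inconsistent output record
counts.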