Enis,
I was trying to understand how MultiFileInputFormat works but I could not.
My use case is:
* several small (a few megs) SequenceFiles as input files.
I need to make sure I don't end up with a Map task per input file.
Ideally I would like to get sets of input files of size X (the size of
I'm not really sure if it helps, but there are MultiFileSplit and
MultiFileInputFormat, which are optimized for cases where numFiles >
numMapTasks. Let me know if you have any further questions.
Alejandro Abdelnur wrote:
The input for a M/R job consists of multiple files that are less than a
block size and the number of maps is the number of files.
I would like to be able to control the number of maps in a way that I have
one map task for multiple files (for example, gluing together files up to a
block size).
I d
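To make the "gluing together files up to a block size" idea from the thread concrete, here is a minimal plain-Java sketch of greedy grouping by size. It is only an illustration and not how MultiFileInputFormat itself is implemented: the class name FileGrouper and the method group are hypothetical, and the sketch works on file sizes only. In a real job, logic like this would presumably sit in a custom InputFormat's getSplits().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: packs files into groups by size.
final class FileGrouper {

    // Walks the file sizes in input order and starts a new group whenever
    // adding the next file would push the current group past maxGroupBytes.
    // A single file larger than maxGroupBytes ends up alone in its own group.
    static List<List<Long>> group(List<Long> fileSizes, long maxGroupBytes) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;

        for (long size : fileSizes) {
            if (!current.isEmpty() && currentBytes + size > maxGroupBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

Each resulting group would then correspond to one map task, so the number of maps follows the total input size rather than the number of files. The grouping is order-dependent and not an optimal bin packing; it just keeps every group at or under the limit where possible.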