In this case, the code in question is the non-distributed code rather than Hadoop. But yes, I agree it will perhaps make a bigger difference on Hadoop. All of the Hadoop stuff uses integer keys.
On Fri, Mar 9, 2012 at 2:10 AM, Paritosh Ranjan <pran...@xebia.com> wrote:
> Are these identifiers used as keys for mappers somewhere?
> If yes, then the sorting phase of MapReduce will be much faster with long keys,
> because key comparison takes less time (comparing longs is cheaper than
> comparing Strings, since fewer bytes are involved), and more records can be
> kept in memory while sorting (because each key is smaller).
> I was once processing 1 billion records, and just changing the keys from
> String to Long improved performance by 20%.
>
> Ignore this if it is not the case.
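To make the "fewer bytes" point concrete, here is a minimal sketch that serializes the same numeric id once as a fixed-width long and once as a String, using plain `java.io.DataOutputStream` rather than Hadoop's own `Writable` classes (an assumption for self-containment; Hadoop's `Text` and `LongWritable` use their own encodings, so exact byte counts will differ somewhat). The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class: compares serialized key sizes for one id.
public class KeySizeDemo {

    // Bytes needed to write the id as a fixed-width long (always 8).
    static int longKeyBytes(long id) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeLong(id);
        }
        return buf.size();
    }

    // Bytes needed to write the id as a String via writeUTF:
    // a 2-byte length prefix plus one byte per ASCII digit.
    static int stringKeyBytes(long id) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(Long.toString(id));
        }
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        long id = 1234567890123L; // 13 decimal digits
        System.out.println("long:   " + longKeyBytes(id) + " bytes");   // 8
        System.out.println("String: " + stringKeyBytes(id) + " bytes"); // 15
    }
}
```

Note that the saving depends on the id range: a short id such as 42 takes only 4 bytes with `writeUTF`, fewer than the 8 bytes of a long, so the size argument applies mainly to large identifiers.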