RE: Limitation of key-value pairs for a particular key.

2013-01-18 Thread Sven Groot
Hi, I think I know what's going on here. It has to do with how many spills the map task performs. You are emitting the numbers in order, so if there is only one spill, they stay in order. For a larger number of records, the map task will create more than one spill, and these spills must be merged. Durin

RE: Spill Failed when io.sort.mb is increased

2012-08-06 Thread Sven Groot
Hi Arpit, I'm uncertain as to the exact cause of the exception (maybe an integer overflow somewhere?) but I'd just like to point out that in general, increasing io.sort.mb to such a high value is not necessarily a good thing. Sorting is an expensive operation with non-linear time complexity.
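
To put a rough number on the non-linear cost mentioned above, here is a minimal sketch. It assumes a comparison sort that needs about n * log2(n) comparisons for n records; the class name, method name, and record counts are hypothetical and are not taken from Hadoop. It estimates only the in-memory sort of a single buffer and says nothing about the cost of merging spills.

```java
public class SortCostEstimate {
    // Rough comparison count for sorting n records,
    // assuming an n * log2(n) comparison sort (an assumption, not a measurement).
    static double comparisons(long n) {
        return n * (Math.log(n) / Math.log(2));
    }

    public static void main(String[] args) {
        long small = 1_000_000L; // hypothetical number of records in a smaller sort buffer
        long large = 2_000_000L; // hypothetical number of records after doubling the buffer
        double ratio = comparisons(large) / comparisons(small);
        // Under this model the ratio is slightly above 2:
        // doubling the records more than doubles the sorting work.
        System.out.println("cost ratio for doubled buffer: " + ratio);
    }
}
```

Under this model, each doubling of the buffer adds a little more than a doubling of the sort work for that buffer.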

RE: Reduce shuffle data transfer takes excessively long

2012-01-27 Thread Sven Groot
ough hoops in various sandboxes to read the OS default locale. If that is the case, getting the system locale and charset once and specifying it explicitly in the call to getBytes() or whatever should make a big difference. Let me know if it works for you. -Nick From: Sven Groot [mai
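
The suggestion in the reply above can be sketched as follows. This is a minimal illustration, assuming the hot path calls String.getBytes() with no arguments; the class name CharsetOnce and the method encode are hypothetical and do not come from the thread or from Hadoop. Whether it makes a measurable difference depends on the JVM and has to be profiled.

```java
import java.nio.charset.Charset;

public class CharsetOnce {
    // Look up the platform default charset a single time and reuse it.
    private static final Charset CHARSET = Charset.defaultCharset();

    // Hypothetical helper: encode a string with the cached charset.
    static byte[] encode(String s) {
        // Passing the Charset explicitly means getBytes() does not have to
        // resolve the default charset on every call.
        return s.getBytes(CHARSET);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("key\tvalue");
        System.out.println(bytes.length + " bytes encoded with " + CHARSET.name());
    }
}
```

The output is the same as calling getBytes() without arguments, since both use the platform default charset; only the lookup is moved out of the per-call path.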

Reduce shuffle data transfer takes excessively long

2012-01-26 Thread Sven Groot
Hello, I have been working on profiling the performance of certain parts of Hadoop 0.20.203.0. For this reason, I have set up a simple cluster that uses one node as the Namenode/Jobtracker, and one node as the sole Datanode/Tasktracker. In this experiment, I run a job consisting of a single