Hi,
I think I know what's going on here. It has to do with how many spills the
map task performs.
You are emitting the numbers in order, so if there is only one spill, they
stay in order. With a larger number of records, the map task creates more
than one spill, and those spills must then be merged.
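If you want to see the effect yourself, one rough sketch (the job class name
is made up, and the values are only illustrative, not recommendations) is to
shrink the sort buffer so even a small input spills more than once:

    import org.apache.hadoop.mapred.JobConf;

    // Sketch: force multiple spills on 0.20.x by shrinking the in-memory
    // sort buffer. io.sort.mb and io.sort.spill.percent are the relevant
    // knobs in this version; the values below are only for experimenting.
    JobConf conf = new JobConf(WordCount.class);   // job class is hypothetical
    conf.setInt("io.sort.mb", 1);                  // 1 MB sort buffer
    conf.setFloat("io.sort.spill.percent", 0.50f); // spill once half full

With settings like these, even a few thousand records should produce several
spill files and trigger the merge.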
Hi Arpit,
I'm uncertain as to the exact cause of the exception (maybe an integer
overflow somewhere?), but I'd just like to point out that, in general,
increasing io.sort.mb to such a high value is not necessarily a good thing.
Sorting is an expensive operation whose cost grows faster than linearly
(O(n log n)) with the amount of data buffered, so a bigger buffer means each
in-memory sort does proportionally more work.
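As a back-of-the-envelope illustration (the cost model and buffer sizes here
are my own assumptions, not measurements), treating one in-memory sort as
n*log2(n) work, a 20x larger buffer costs roughly 23x the work per sort:

    // Back-of-the-envelope only: model one in-memory sort as n*log2(n)
    // comparisons and compare two illustrative buffer sizes.
    public class SortCost {
        static double cost(double n) { return n * (Math.log(n) / Math.log(2)); }
        public static void main(String[] args) {
            double small = 100e6, big = 2000e6; // "records" per buffer
            System.out.printf("%.1fx the work for %.0fx the data%n",
                    cost(big) / cost(small), big / small);
        }
    }

This prints "23.3x the work for 20x the data" under that model.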
[...] through hoops in various sandboxes to read the OS default locale.
If that is the case, getting the system locale and charset once and
specifying it explicitly in the call to getBytes() (or wherever the
conversion happens) should make a big difference.
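Something like this is what I have in mind (a minimal sketch; the class and
method names are made up):

    import java.nio.charset.Charset;

    public class EncodeOnce {
        // Resolve the platform default charset a single time.
        private static final Charset PLATFORM = Charset.defaultCharset();

        static byte[] encode(String s) {
            // Explicit charset: skips the per-call default-encoding lookup.
            return s.getBytes(PLATFORM);
        }
    }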
Let me know if it works for you.
-Nick
From: Sven Groot
Hello,
I have been working on profiling the performance of certain parts of Hadoop
0.20.203.0. For this purpose, I have set up a simple cluster that uses one
node as the NameNode/JobTracker and one node as the sole
DataNode/TaskTracker.
In this experiment, I run a job consisting of a single