Thanks for your reply Vinod. I've been thinking about partitioning
the data so that multiple reducers each work on a contiguous
part of the sort space. The problem is that the keys are a combination of
URLs and RDF BNodes. I can't see a way, without previously analysing the
data, of choosing partition boundaries that split the key space evenly.
Hi,
I'm getting the error below while trying to sort a large amount of data with Hadoop.
I strongly suspect that the node the merge runs on is running out of local disk space.
Assuming this is the case, is there any way
to get around this limitation, considering I can't increase the local disk space?
That's a lot of data to process for a single reducer. You should try
increasing the number of reducers to achieve more parallelism, and also try
modifying your logic to avoid significant skew across the reducers.
Unfortunately this means rethinking your app, but that's the only way
around it.
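As a concrete starting point, the reducer count can be raised per job from the command line (the property name depends on the Hadoop version: `mapred.reduce.tasks` in older releases, `mapreduce.job.reduces` in newer ones; the jar and paths here are placeholders):

```shell
# Run the job with 32 reducers instead of the default.
hadoop jar myjob.jar MyJob -D mapred.reduce.tasks=32 input/ output/
```

More reducers only help if the partitioner spreads keys evenly across them, which is where avoiding skew comes in.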