That's a lot of data to process for a single reducer. You should try
increasing the number of reducers to achieve more parallelism and also try
modifying your logic to avoid significant skew in the reducers.
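One common way to rebalance a skewed reduce (a sketch of mine, not something from this thread) is to "salt" hot keys so their records fan out across several reducers, then merge the per-salt partial results in a cheap second pass. The partition arithmetic below mirrors Hadoop's default HashPartitioner; the key name "clickstream", the salt count, and the reducer count are illustrative assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: spread a hot key across reducers by appending a salt suffix.
// "clickstream", NUM_SALTS, and NUM_REDUCERS are illustrative assumptions.
public class SaltedPartitionDemo {
    static final int NUM_REDUCERS = 50; // e.g. after raising the job's reducer count
    static final int NUM_SALTS = 8;     // the hot key fans out across up to 8 reducers

    // Same arithmetic as Hadoop's default HashPartitioner:
    // non-negative hash modulo the number of reduce tasks.
    static int partition(String key) {
        return (key.hashCode() & Integer.MAX_VALUE) % NUM_REDUCERS;
    }

    public static void main(String[] args) {
        String hotKey = "clickstream"; // a key holding a disproportionate share of records
        Set<Integer> unsalted = new HashSet<>();
        Set<Integer> salted = new HashSet<>();

        for (int salt = 0; salt < NUM_SALTS; salt++) {
            unsalted.add(partition(hotKey));            // always lands on one reducer
            salted.add(partition(hotKey + "#" + salt)); // spreads across reducers
        }
        System.out.println("unsalted partitions: " + unsalted.size());
        System.out.println("salted partitions: " + salted.size());
        // A second, lightweight job (or an in-reducer merge keyed on the
        // original key) then combines the per-salt partial results.
    }
}
```

The trade-off is an extra aggregation step, but each reducer now sees a bounded slice of the hot key's data instead of all of it.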

Unfortunately this means rethinking your app, but that's the only way
around it. It will also help you scale smoothly in the future, since you'll
have adjustable parallelism and more balanced data processing.
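For reference (my addition, hedged): the reducer count is adjustable per run without code changes via the standard `mapreduce.job.reduces` property, or in the driver with `job.setNumReduceTasks(n)`. The jar name, driver class, and the value 200 below are placeholders.

```shell
# Raise reducer parallelism for a single run; 200 is a placeholder value.
hadoop jar my-job.jar MyDriver -D mapreduce.job.reduces=200 <input> <output>
```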

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Fri, Jan 24, 2014 at 6:47 AM, Tim Potter <t...@yahoo-inc.com> wrote:

>  Hi,
>   I'm getting the below error while trying to sort a lot of data with Hadoop.
>
> I strongly suspect the node the merge is running on is running out of local
> disk space. Assuming this is the case, is there any way to work around this,
> given that I can't increase the local disk space available on the nodes?
> For example, by specifying sort/merge parameters or similar.
>
> Thanks,
>   Tim.
>
> 2014-01-24 10:02:36,267 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
> Got brand-new decompressor [.lzo_deflate]
> 2014-01-24 10:02:36,280 INFO [main] org.apache.hadoop.mapred.Merger: Down to 
> the last merge-pass, with 100 segments left of total size: 642610678884 bytes
> 2014-01-24 10:02:36,281 ERROR [main] 
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
> as:XXXXXX (auth:XXXXXX) 
> cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
> 2014-01-24 10:02:36,282 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
>       at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:167)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1284)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
> Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left 
> on device
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:213)
>       at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
>       at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
>       at java.io.DataOutputStream.write(DataOutputStream.java:107)
>       at 
> org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
>       at 
> org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:150)
>       at 
> org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:140)
>       at 
> org.apache.hadoop.io.compress.BlockCompressorStream.write(BlockCompressorStream.java:99)
>       at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
>       at java.io.DataOutputStream.write(DataOutputStream.java:107)
>       at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:249)
>       at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:200)
>       at 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager$OnDiskMerger.merge(MergeManager.java:572)
>       at 
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
> Caused by: java.io.IOException: No space left on device
>       at java.io.FileOutputStream.writeBytes(Native Method)
>       at java.io.FileOutputStream.write(FileOutputStream.java:318)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:211)
>       ... 14 more
>
> 2014-01-24 10:02:36,284 INFO [main] org.apache.hadoop.mapred.Task: Runnning 
> cleanup for the task
>
>
