That's a lot of data to process for a single reducer: your log shows about 642 GB in the final merge pass. Try increasing the number of reducers to get more parallelism, and modify your logic to avoid significant skew across the reducers.
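To make the sizing concrete, here is a rough back-of-the-envelope sketch in plain Java (no Hadoop dependency). The class `ReducerSizing`, its two methods, and the 50 GB per-reducer disk budget are all made up for illustration; only the 642610678884-byte figure comes from the log. It assumes the data splits evenly across reducers, which is exactly what skew breaks, so the second method reports how far the heaviest reducer is from the average.

```java
import java.util.Arrays;

// Rough sizing sketch; hypothetical helper, not part of Hadoop.
final class ReducerSizing {

    /**
     * Smallest reducer count that keeps each reducer's share within the
     * given budget, assuming the data is split evenly (no skew).
     */
    static long minReducers(long totalBytes, long perReducerBudgetBytes) {
        if (totalBytes < 0 || perReducerBudgetBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and budget positive");
        }
        // Ceiling division written so that it cannot overflow.
        long full = totalBytes / perReducerBudgetBytes;
        return totalBytes % perReducerBudgetBytes == 0 ? full : full + 1;
    }

    /**
     * Ratio of the heaviest reducer's load to the mean load.
     * 1.0 means perfectly balanced; larger values mean more skew.
     */
    static double skewFactor(long[] bytesPerReducer) {
        long max = Arrays.stream(bytesPerReducer).max().orElse(0L);
        double mean = Arrays.stream(bytesPerReducer).average().orElse(0.0);
        return mean == 0.0 ? 1.0 : max / mean;
    }

    public static void main(String[] args) {
        long observed = 642_610_678_884L; // bytes in the final merge pass, from the log
        long budget = 50_000_000_000L;    // ASSUMPTION: usable local disk per reducer
        System.out.println("reducers needed: " + minReducers(observed, budget));

        // ASSUMPTION: made-up per-reducer loads to illustrate skew.
        long[] loads = {10L, 10L, 10L, 70L};
        System.out.println("skew factor: " + skewFactor(loads));
    }
}
```

With those assumed numbers the first call gives 13 reducers. In practice the budget should leave headroom, since an on-disk merge writes its output while the input segments are still on disk. The resulting count is what you would pass to `Job.setNumReduceTasks(int)` or set via `mapreduce.job.reduces`.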
Unfortunately this means rethinking your app, but that's the only way around it. It will also help you scale smoothly in the future if you have adjustable parallelism and more balanced data processing.

+Vinod
Hortonworks Inc.
http://hortonworks.com/

On Fri, Jan 24, 2014 at 6:47 AM, Tim Potter <t...@yahoo-inc.com> wrote:
> Hi,
> I'm getting the below error while trying to sort a lot of data with Hadoop.
>
> I strongly suspect the node the merge is on is running out of local disk
> space. Assuming this is the case, is there any way
> to get around this limitation considering I can't increase the local disk
> space available on the nodes? Like specify sort/merge parameters or similar.
>
> Thanks,
> Tim.
>
> 2014-01-24 10:02:36,267 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.lzo_deflate]
> 2014-01-24 10:02:36,280 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 100 segments left of total size: 642610678884 bytes
> 2014-01-24 10:02:36,281 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:XXXXXX (auth:XXXXXX) cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
> 2014-01-24 10:02:36,282 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:167)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1284)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
> Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:213)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>     at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
>     at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:150)
>     at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:140)
>     at org.apache.hadoop.io.compress.BlockCompressorStream.write(BlockCompressorStream.java:99)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>     at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:249)
>     at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:200)
>     at org.apache.hadoop.mapreduce.task.reduce.MergeManager$OnDiskMerger.merge(MergeManager.java:572)
>     at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
> Caused by: java.io.IOException: No space left on device
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:318)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:211)
>     ... 14 more
>
> 2014-01-24 10:02:36,284 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task