Usually Hadoop MapReduce deals with row-based data; see ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT>.
If you need to write a lot of data to an HDFS file, you can get an OutputStream to the HDFS file and write bytes directly (a minimal sketch follows at the end of this thread).

On Fri, Aug 22, 2014 at 3:30 PM, Yuriy <yuriythe...@gmail.com> wrote:

> Thank you, Alexander. That, at least, explains the problem. And what
> should be the workaround if the combined set of data is larger than 2 GB?
>
> On Fri, Aug 22, 2014 at 1:50 PM, Alexander Pivovarov <apivova...@gmail.com> wrote:
>
>> Max array size is max integer, so a byte array cannot be bigger than 2 GB.
>>
>> On Aug 22, 2014 1:41 PM, "Yuriy" <yuriythe...@gmail.com> wrote:
>>
>>> The Hadoop Writable interface relies on the "public void write(DataOutput out)"
>>> method. It looks like, behind the DataOutput interface, Hadoop uses a
>>> DataOutputStream, which uses a simple byte array under the cover.
>>>
>>> When I try to write a lot of data to the DataOutput in my reducer, I get:
>>>
>>> Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>>     at java.util.Arrays.copyOf(Arrays.java:3230)
>>>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>>>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>     at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>>>
>>> It looks like the system is unable to allocate a contiguous array of the
>>> requested size. Apparently, increasing the heap size available to the
>>> reducer does not help - it is already at 84 GB (-Xmx84G).
>>>
>>> If I cannot reduce the size of the object that I need to serialize (as
>>> the reducer constructs this object by combining the object data), what
>>> should I try to work around this problem?
>>>
>>> Thanks,
>>>
>>> Yuriy
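A minimal sketch of the approach suggested at the top of this thread: the reducer bypasses the Writable value entirely and streams its combined output to an HDFS file, so no single byte array near the 2 GB limit is ever allocated. The class name, the output path, and the Text-based record layout are hypothetical; only the FileSystem / FSDataOutputStream calls are standard Hadoop API.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Hypothetical reducer that writes its large combined output straight to
    // an HDFS stream instead of serializing one huge object through DataOutput.
    public class StreamingReducer extends Reducer<Text, Text, NullWritable, NullWritable> {

        private FSDataOutputStream out;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            FileSystem fs = FileSystem.get(conf);
            // Hypothetical output location; one file per reduce task to avoid collisions.
            Path path = new Path("/tmp/large-output/part-"
                    + context.getTaskAttemptID().getTaskID().getId());
            out = fs.create(path);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Write each piece as it arrives instead of combining everything into
            // one object and pushing it through a single in-memory byte array.
            for (Text value : values) {
                out.write(value.getBytes(), 0, value.getLength());
                out.write('\n');
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            if (out != null) {
                out.close();
            }
        }
    }

The key difference from the Writable route is that the data never has to fit in one contiguous buffer: each chunk goes to the HDFS client's stream as it is produced, so the per-array 2 GB JVM limit does not apply.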