I would try to compress this bit set.
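A bit set that is mostly zeros compresses extremely well. As a minimal sketch (assuming a java.util.BitSet and plain GZIP; the class and method names here are illustrative, not from your code), something like this could shrink the value before it goes into the Put:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.BitSet;
import java.util.zip.GZIPOutputStream;

public class BitSetCompressor {
    // Gzip the BitSet's backing bytes; a sparse ~120 MB bit set
    // typically shrinks to a small fraction of its raw size.
    public static byte[] compress(BitSet bits) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(bits.toByteArray());
        }
        return bos.toByteArray();
    }
}
```

You would then store compress(bitvector) in the Put and gunzip on read. A compressed bitmap library would go further and keep the set compressed in memory as well, which would also take pressure off the region server's heap.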
On Nov 2, 2013, at 2:43 PM, John <[email protected]> wrote:
> Hi,
>
> thanks for your answer! I increased the "Map Task Maximum Heap Size" to 2 GB
> and it seems to work: the OutOfMemoryError is gone. But the HBase region
> servers are now crashing all the time :-/ I try to store the bitvector
> (120 MB in size) for some rows. This seems to be very memory intensive; the
> usedHeapMB increases very fast (up to 2 GB). I'm not sure whether it is the
> reading or the writing task that causes this, but I think it's the writing
> task. Any idea how to minimize the memory usage? My mapper looks like this:
>
> public class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
>
>     // signature was cut off in the paste; parameters reconstructed here
>     private void storeBitvectorToHBase(byte[] name, byte[] cf,
>             BitSet bitvector, Context context)
>             throws IOException, InterruptedException {
>         Put row = new Put(name);
>         row.setWriteToWAL(false);
>         row.add(cf, Bytes.toBytes("columname"), toByteArray(bitvector));
>         ImmutableBytesWritable key = new ImmutableBytesWritable(name);
>         context.write(key, row);
>     }
> }
>
>
> kind regards
>
>
> 2013/11/1 Jean-Marc Spaggiari <[email protected]>
>
>> Hi John,
>>
>> You might be better off asking this on the CDH mailing list, since it's
>> more related to Cloudera Manager than to HBase.
>>
>> In the meantime, can you try to update the "Map Task Maximum Heap Size"
>> parameter too?
>>
>> JM
>>
>>
>> 2013/11/1 John <[email protected]>
>>
>>> Hi,
>>>
>>> I have a problem with the memory. My use case is the following: I've
>>> created a MapReduce job that iterates over every row. If a row has more
>>> than, for example, 10k columns, I create a Bloom filter (a BitSet) for
>>> that row and store it in the HBase structure. This worked fine so far.
>>>
>>> BUT, now I try to store a BitSet with 1,000,000,000 elements = ~120 MB
>>> in size. In every map() call there exist two BitSets. If I try to
>>> execute the MR job I get this error: http://pastebin.com/DxFYNuBG
>>>
>>> Obviously, the tasktracker does not have enough memory. I tried to
>>> adjust the memory configuration, but I'm not sure which parameter is
>>> the right one. I tried changing the "MapReduce Child Java Maximum Heap
>>> Size" value from 1 GB to 2 GB, but still got the same error.
>>>
>>> Which parameters do I have to adjust? BTW, I'm using CDH 4.4.0 with
>>> Cloudera Manager.
>>>
>>> kind regards
>>>
>>