Okay, thank you for your help. Snappy works just fine for me.
2013/11/3 Asaf Mesika <[email protected]>

> HBase will compress the entire KeyValue, that's one thing.
> Second thing: if you use TableOutputFormat, I believe the Put will be
> inserted into HBase on the reducer side.
> Third: the compression only takes place in the flush, which means your
> data travels uncompressed between the mapper, the reducer and the HBase
> WAL / Memstore.
>
> Compressing in Java can be done through
> http://commons.apache.org/proper/commons-compress/zip.html.
> For speed, go for Snappy, but for a POC zip should do the trick.
>
> On Sat, Nov 2, 2013 at 6:46 PM, John <[email protected]> wrote:
>
> > You mean I should use the BitSet, transform it into bytes and then
> > compress it on my own in the map function? Hmmm ... I could try it.
> > What is the best way to compress it in Java?
> >
> > BTW, I'm not sure how exactly the HBase compression works. As I
> > mentioned, I have already enabled LZO compression for the column
> > family. The question is, where are the bytes compressed? Directly in
> > the map function (if not, is it possible to compress them there with
> > LZO?!) or in the region server?
> >
> > kind regards
> >
> > 2013/11/2 Asaf Mesika <[email protected]>
> >
> > > I mean, if you take all those bytes of the bit set and zip them,
> > > wouldn't you reduce it significantly? Less traffic on the wire,
> > > less memory in HBase, etc.
> > >
> > > On Saturday, November 2, 2013, John wrote:
> > >
> > > > I already use LZO compression in HBase. Or do you mean a
> > > > compressed Java object? Do you know an implementation?
> > > >
> > > > kind regards
> > > >
> > > > 2013/11/2 Asaf Mesika <[email protected]>
> > > >
> > > > > I would try to compress this bit set.
> > > > >
> > > > > On Nov 2, 2013, at 2:43 PM, John <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > thanks for your answer! I increased the "Map Task Maximum
> > > > > > Heap Size" to 2 GB and it seems to work. The OutOfMemoryError
> > > > > > is gone. But the HBase region servers are now crashing all
> > > > > > the time :-/ I try to store the bit vector (120 MB in size)
> > > > > > for some rows. This seems to be very memory intensive; the
> > > > > > usedHeapMB increases very fast (up to 2 GB). I'm not sure
> > > > > > whether it is the reading or the writing task that causes
> > > > > > this, but I think it's the writing task. Any idea how to
> > > > > > minimize the memory usage? My mapper looks like this:
> > > > > >
> > > > > > public class MyMapper extends
> > > > > >         TableMapper<ImmutableBytesWritable, Put> {
> > > > > >
> > > > > >     private void storeBitvectorToHBase(...) {
> > > > > >         Put row = new Put(name);
> > > > > >         row.setWriteToWAL(false);
> > > > > >         row.add(cf, Bytes.toBytes("columname"),
> > > > > >                 toByteArray(bitvector));
> > > > > >         ImmutableBytesWritable key =
> > > > > >                 new ImmutableBytesWritable(name);
> > > > > >         context.write(key, row);
> > > > > >     }
> > > > > > }
> > > > > >
> > > > > > kind regards
> > > > > >
> > > > > > 2013/11/1 Jean-Marc Spaggiari <[email protected]>
> > > > > >
> > > > > > > Hi John,
> > > > > > >
> > > > > > > You might be better off asking this on the CDH mailing
> > > > > > > list, since it's more related to Cloudera Manager than
> > > > > > > HBase.
> > > > > > >
> > > > > > > In the meantime, can you try to update the "Map Task
> > > > > > > Maximum Heap Size" parameter too?
> > > > > > >
> > > > > > > JM
> > > > > > >
> > > > > > > 2013/11/1 John <[email protected]>
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I have a problem with the memory. My use case is the
> > > > > > > > following: I've created a MapReduce job and iterate in it
> > > > > > > > over every row. If a row has more than, for example, 10k
> > > > > > > > columns, I create a bloom filter (a BitSet) for this row
> > > > > > > > and store it in the HBase structure. This worked fine so
> > > > > > > > far.
> > > > > > > >
> > > > > > > > BUT, now I try to store a BitSet with 1000000000 elements
> > > > > > > > = ~120 MB in size. In every map() function there exist 2
> > > > > > > > BitSets. If I try to execute the MR job I get this error:
> > > > > > > > http://pastebin.com/DxFYNuBG
> > > > > > > >
> > > > > > > > Obviously, the tasktracker does not have enough memory. I
> > > > > > > > tried to adjust the memory configuration, but I'm not
> > > > > > > > sure which value is the right one. I tried to change the
> > > > > > > > "MapReduce Child Java Maximum Heap Size" value from 1 GB
> > > > > > > > to 2 GB, but still got the same error.
> > > > > > > >
> > > > > > > > Which parameters do I have to adjust? BTW, I'm using CDH
> > > > > > > > 4.4.0 with Cloudera Manager.
> > > > > > > >
> > > > > > > > kind regards
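For anyone finding this thread later: the advice above (turn the BitSet into bytes and compress them in the mapper before building the Put) can be sketched roughly as below. This uses `java.util.zip.Deflater`/`Inflater` from the JDK rather than commons-compress or Snappy, purely so the sketch has no extra dependencies; the class name `BitSetCompression` and the sparse bit pattern in `main` are made up for illustration, not taken from the thread.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: deflate a BitSet's backing bytes before handing
// them to Put#add() on the write path, and inflate them again on reads.
public class BitSetCompression {

    // Compress raw bytes with DEFLATE; BEST_SPEED trades ratio for CPU,
    // in the spirit of the "go for Snappy" advice above.
    public static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inverse of compress(); throws if the stream is not valid DEFLATE data.
    public static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        // A sparse, bloom-filter-like bit set: 10M bits, every 1000th set.
        BitSet bits = new BitSet(10_000_000);
        for (int i = 0; i < 10_000_000; i += 1000) {
            bits.set(i);
        }
        byte[] raw = bits.toByteArray();   // what would go into the Put
        byte[] packed = compress(raw);     // what to store instead
        System.out.println("raw=" + raw.length + " bytes, compressed="
                + packed.length + " bytes");
    }
}
```

A sparse filter like this compresses dramatically, which is the point of the "less traffic on the wire, less memory in HBase" remark: the bytes stay small through the mapper, the shuffle, the WAL and the memstore, instead of only shrinking at flush time via the column family's LZO setting.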
