Hi,

Is there a way to store a custom BitSet for every row and add new bits to it while importing? I can't use the built-in bloom filter because every column name contains two elements.
Here is my scenario. My table looks like this:

rowKey1 -> cf:<data1,data2>, cf:<data3,data4>, ...
rowKey2 -> cf:<data234,data5>, ...

The column name includes both data1 and data2. This setup works for me now, but I am trying to improve it.

I'm using the BulkLoad feature. First I import a CSV file that looks like this:

ROWKEY   COLUMNFAMILY  COLUMNAME      HASH_INDEX_1  HASH_INDEX_2
rowKey1  cf            <data1,data2>  5             12
rowKey1  cf            <data3,data4>  8             5

For every hash in HASH_INDEX_1/2 I create a new column with the index as its name, in the column family "bloomfilter1" or "bloomfilter2". I store the column name as a 4-byte Integer string. For the first line of the example above I would store bloomfilter1:5 and bloomfilter2:12.

This method works fine, but the export and the transformation back to a BitSet become very slow if the bloom filter is too big (> 1 million). So a better solution would be to store only the BitSet itself instead of a 4-byte Integer for every index.

Does anyone know if it is possible to create this filter while importing the data?

Thanks
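
P.S. To make the goal concrete, here is a minimal sketch of the per-row value I have in mind. It uses only java.util.BitSet; the class and method names (RowBitSetSketch, encode, merge, decode) are made up for illustration and are not part of HBase. How to apply the merge step during the bulk load is exactly the part I am unsure about.

```java
import java.util.BitSet;

/** Hypothetical helper for illustration only; not an HBase API. */
final class RowBitSetSketch {

    /** Sets the given hash indexes in a fresh BitSet and returns its compact byte form. */
    static byte[] encode(int... hashIndexes) {
        BitSet bits = new BitSet();
        for (int index : hashIndexes) {
            bits.set(index);
        }
        // One bit per index instead of one 4-byte column name per index.
        return bits.toByteArray();
    }

    /** ORs two serialized BitSets, e.g. an already stored value and the bits from new input lines. */
    static byte[] merge(byte[] existing, byte[] incoming) {
        BitSet merged = BitSet.valueOf(existing);
        merged.or(BitSet.valueOf(incoming));
        return merged.toByteArray();
    }

    /** Restores the BitSet from the stored bytes, with no per-column iteration. */
    static BitSet decode(byte[] value) {
        return BitSet.valueOf(value);
    }
}
```

With the CSV above, the HASH_INDEX_1 values of rowKey1 would be combined as merge(encode(5), encode(8)), and the result would be stored as a single value per row and filter, so the export reads one byte array instead of one column per index.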