Hi,

Actually, for us it would be nice to be able to hook into the compaction, too.

We store records that are basically events that occur at certain times. We 
store the record itself as the column qualifier and a timeline as the column 
value (so multiple record+timeline pairs per row key are possible). When a new 
record comes in, we do a get for the timeline, merge the new timestamp into 
the existing timeline in memory, and do a put to update the column value with 
the new timeline.
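
To make the get/merge/put cycle concrete, here is a minimal sketch in plain 
Python. It does not use any HBase client API: the `table` dict is a 
hypothetical in-memory stand-in for the table, `record_event` is an 
illustrative name, and representing a timeline as a sorted list of unique 
timestamps is an assumption.

```python
import bisect

# Hypothetical stand-in for the table: {row_key: {qualifier: timeline}}.
# The qualifier is the record itself; the value is its timeline.
table = {}

def record_event(row_key, record, timestamp):
    """Read-modify-write: get the timeline, merge one timestamp, put it back."""
    row = table.setdefault(row_key, {})
    timeline = list(row.get(record, []))           # the "get"
    i = bisect.bisect_left(timeline, timestamp)    # keep the timeline sorted
    if i == len(timeline) or timeline[i] != timestamp:
        timeline.insert(i, timestamp)              # merge in memory, no duplicates
    row[record] = timeline                         # the "put"
    return timeline
```

Every insert pays for one read and one write of the whole timeline, which is 
the cost we would like to avoid.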

In our first version, we just wrote the individual timestamps as values and 
used versioning to keep all of them. On each read, we then combined all the 
timelines and individual timestamps into a single timeline in memory. We 
periodically ran an MR job to do the timeline combining in the table and 
delete the obsolete timestamps, in order to keep read performance OK 
(otherwise each read would involve a lot of additional work to create a 
timeline, and lots of versions would accumulate). In the end, the deletes in 
the MR job were a bottleneck (as I understand it; I was not on the project at 
that time).

Now, if we could hook into compactions, we could always just insert 
individual timestamps as new versions and combine the versions into a single 
timeline during compaction (as compaction needs to go through the complete 
table anyway). This would also improve our insertion performance (no more 
gets, just puts as in the first version), which is nice. We collect internet 
routing information, which arrives at 80 million records per day, with 
updates coming in batches every 5 minutes (http://ris.ripe.net). We'd like to 
try to be efficient before just throwing more machines at the problem.
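
What we have in mind could be sketched roughly like this. It is only a model 
in plain Python, not an existing HBase API (per the reply quoted below, 
compaction is currently invisible to the user). The names `write_timestamp`, 
`compact_cell` and `compact` are hypothetical, as is the `store` dict that 
maps a (row key, qualifier) pair to its list of versions.

```python
def write_timestamp(store, row_key, record, timestamp):
    """Insert path: a blind put of one timestamp as a new version, no get."""
    store.setdefault((row_key, record), []).append(timestamp)

def compact_cell(versions):
    """Combine all versions of one cell into a single sorted timeline.

    A version is either a single timestamp or a timeline (list) produced
    by an earlier compaction.
    """
    merged = set()
    for value in versions:
        if isinstance(value, (list, tuple)):
            merged.update(value)
        else:
            merged.add(value)
    return sorted(merged)

def compact(store):
    """Hypothetical hook: rewrite every cell as one combined version."""
    for cell, versions in store.items():
        store[cell] = [compact_cell(versions)]
```

Reads between compactions would still have to combine whatever versions 
exist, but that number stays bounded by the compaction interval.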

Is anything like this on the roadmap?


Cheers,
Friso



On May 27, 2010, at 1:01 AM, Jean-Daniel Cryans wrote:

> Invisible. What's your need?
> 
> J-D
> 
> On Wed, May 26, 2010 at 3:56 PM, Vidhyashankar Venkataraman
> <[email protected]> wrote:
>> Is there a way to customize the compaction function (like a hook provided by 
>> the API) or is it invisible to the user?
>> 
>> Thank you
>> Vidhya
>> 
