Hi,

Actually, for us it would be nice to be able to hook into the compaction, too.
We store records that are basically events that occur at certain times. We store the record itself as the qualifier and a timeline as the column value (so multiple records+timelines per row key are possible). So when a new record comes in, we do a get for the timeline, merge the new timestamp with the existing timeline in memory and do a put to update the column value with the new timeline.

In our first version, we just wrote the individual timestamps as values and used versioning to keep all timestamps in the value. Then we combined all the timelines and individual timestamps into a single timeline in memory on each read. We ran a MR job periodically to do the timeline combining in the table and delete the obsolete timestamps in order to keep read performance OK (otherwise each read would involve a lot of additional work to create a timeline, and lots of versions would accumulate). In the end, the deletes in the MR job were a bottleneck (as I understand it, but I was not on the project at that moment).

Now, if we could hook into the compactions, then we could just always insert individual timestamps as new versions and do the combining of versions into a single timeline during compaction (as compaction needs to go through the complete table anyway). This would also improve our insertion performance (no more gets in there, just puts like in the first version), which is nice.

We collect internet routing information, which arrives at 80 million records per day, with updates coming in batches every 5 minutes (http://ris.ripe.net). We'd like to try to be efficient before just throwing more machines at the problem.

Is anything like this on the roadmap?

Cheers,
Friso

On May 27, 2010, at 1:01 AM, Jean-Daniel Cryans wrote:

> Invisible. What's your need?
>
> J-D
>
> On Wed, May 26, 2010 at 3:56 PM, Vidhyashankar Venkataraman
> <[email protected]> wrote:
>> Is there a way to customize the compaction function (like a hook provided by
>> the API) or is it invisible to the user?
>>
>> Thank you
>> Vidhya
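
For illustration, the combining step described in the message above (collapsing all stored versions of one cell into a single timeline) could look roughly like the Python sketch below. It is not HBase code and calls no HBase API. The representation of a timeline as a sorted list of distinct integer timestamps, and the names merge_timelines and compact_versions, are assumptions made only for this example; the actual encoding used in the table is not stated in the message.

```python
from heapq import merge
from typing import Iterable, List


def merge_timelines(*timelines: Iterable[int]) -> List[int]:
    """Merge already-sorted timestamp sequences into one sorted timeline.

    Duplicate timestamps are kept only once.
    """
    merged: List[int] = []
    for ts in merge(*timelines):
        # heapq.merge yields values in ascending order, so a duplicate
        # can only be equal to the last value appended.
        if not merged or merged[-1] != ts:
            merged.append(ts)
    return merged


def compact_versions(versions: Iterable[List[int]]) -> List[int]:
    """Collapse all versions of one cell into a single timeline.

    Assumption: each version is either a one-element list (an individual
    timestamp written by a plain put) or an already combined timeline.
    Versions are sorted defensively before merging.
    """
    return merge_timelines(*(sorted(v) for v in versions))


if __name__ == "__main__":
    # Three individual timestamps plus one previously combined timeline.
    print(compact_versions([[9], [1, 5], [7], [5]]))
```

With a hook of this kind, writes would stay plain puts of single timestamps, and the cost of merging would be paid once per compaction instead of on every read or every insert.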
