On Tue, Nov 7, 2017 at 10:47 AM, Istvan Soos <[email protected]> wrote:
> On the website [0] I gather that data compaction is mostly about
> cleaning up after we delete a ledger. Is there a feature or plan to
> implement entry-level compaction, e.g. to have an ID that uniquely
> identifies an entity, and if there are two events for that entity,
> only retain the last one?
>
> [0]: https://bookkeeper.apache.org/docs/latest/getting-started/concepts/

Currently we don't have an open item for supporting this "log
compaction" feature. But I would like to learn more about your use case
and see how we can support you.

> Or do you implement it by using different ledgers, migrating from one
> to another?

In the Pulsar community, we are actually discussing a similar "log
compaction" feature. Pulsar is a pub/sub messaging system built on
Apache BookKeeper. The idea is almost the same as what you described:
it would compact the messages/entries based on some key and write the
compacted messages to a separate ledger.

> How does it work out with handovers of what is considered
> the main ledger to write to or read from?

You need some sort of metadata to track the list of ledgers, and you
update that metadata once a compacted ledger is generated.

Hope this answers your questions. Would love to chat more about your
use case.

> Thanks,
> Istvan
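
To make the idea concrete, here is a minimal in-memory sketch of key-based compaction into a separate ledger, followed by the metadata update. It does not use the BookKeeper or Pulsar APIs; `Entry`, `LedgerStore`, `compact` and `read_all` are hypothetical names made up for illustration, and the "metadata" is just an ordered list of ledger ids.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    key: str    # ID that uniquely identifies the entity
    value: str  # payload of the event


@dataclass
class LedgerStore:
    """Hypothetical stand-in for a set of ledgers plus the metadata listing them."""
    ledgers: dict = field(default_factory=dict)  # ledger id -> list of entries
    active: list = field(default_factory=list)   # metadata: ordered ledger ids to read
    next_id: int = 0

    def create_ledger(self, entries):
        """Store entries as a new ledger without registering it in the metadata."""
        ledger_id = self.next_id
        self.next_id += 1
        self.ledgers[ledger_id] = list(entries)
        return ledger_id

    def append_ledger(self, entries):
        """Create a ledger and register it at the end of the metadata list."""
        ledger_id = self.create_ledger(entries)
        self.active.append(ledger_id)
        return ledger_id


def compact(store, source_ids):
    """Keep only the last entry per key from source_ids, write the result to a
    separate ledger, then swap it into the metadata in place of the sources."""
    latest = {}
    for ledger_id in source_ids:
        for entry in store.ledgers[ledger_id]:
            latest.pop(entry.key, None)  # re-insert so order follows the last write
            latest[entry.key] = entry
    compacted_id = store.create_ledger(latest.values())

    # Metadata update: the compacted ledger takes the position of the first
    # source ledger; ledgers that were not compacted keep their order.
    sources = set(source_ids)
    new_active = []
    for ledger_id in store.active:
        if ledger_id in sources:
            if compacted_id not in new_active:
                new_active.append(compacted_id)
        else:
            new_active.append(ledger_id)
    store.active = new_active
    return compacted_id


def read_all(store):
    """Read entries by following the ledger list in the metadata."""
    return [(e.key, e.value) for lid in store.active for e in store.ledgers[lid]]


store = LedgerStore()
first = store.append_ledger([Entry("user-1", "v1"), Entry("user-2", "v1")])
second = store.append_ledger([Entry("user-1", "v2")])
store.append_ledger([Entry("user-3", "v1")])
compact(store, [first, second])
print(read_all(store))
```

In this sketch readers only ever follow the metadata list, so the switch to the compacted ledger happens at the moment that list is replaced. The source ledgers are left in place here; whether and when to delete them is a separate decision that the sketch does not cover.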
