Hi, everyone:
There is a proposal about batched reading (
https://github.com/apache/bookkeeper/pull/4051) that introduces a batch
read protocol to improve read performance.

The objective of this proposal is to enhance the performance of entry
reading by introducing a batch entry reading protocol that takes into
account the expected count and size of entries.
1. Optimize entry reading performance: By reading multiple entries in a
single RPC request, the network communication and RPC call
overhead can be reduced, thereby improving read performance.
2. Minimize CPU resource consumption: The aggregation of multiple entries
into a single RPC request can help in reducing the number of requests and
responses, which in turn can lower the CPU resource consumption.
3. Streamline client code: Without such a protocol, clients must
approximate batched reads themselves. For example, Apache Pulsar
calculates the start and end entry IDs for each read request based on the
average size of past entries, which adds unnecessary complexity to the
implementation and can't guarantee reliable behavioral outcomes. Reading
entries directly by an anticipated count or size removes that burden from
the client.
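To illustrate point 3, here is a minimal sketch of the kind of estimation
logic a client like Pulsar has to perform today (all names and signatures
below are hypothetical, not taken from the PR): derive the last entry ID to
request from the average size of past entries. Because actual entry sizes
vary, the estimate can over- or undershoot the intended payload, which is
the unreliability mentioned above.

```java
// Hypothetical client-side estimator: pick the end entry ID for a read
// request so the batch stays within a count limit and an approximate
// byte limit, given the average size of previously read entries.
public class BatchReadEstimator {

    /**
     * Estimate the last entry ID to request, bounded by maxCount entries,
     * roughly maxBytes of payload (based on avgEntrySizeBytes), and the
     * last confirmed entry ID of the ledger.
     */
    static long estimateEndEntryId(long startEntryId,
                                   long lastConfirmedEntryId,
                                   long avgEntrySizeBytes,
                                   long maxBytes,
                                   int maxCount) {
        // Upper bound from the entry-count limit.
        long byCount = startEntryId + maxCount - 1;
        // Upper bound from the byte limit, using the average entry size
        // as a guess; request at least one entry.
        long bySize = avgEntrySizeBytes <= 0
                ? byCount
                : startEntryId + Math.max(1, maxBytes / avgEntrySizeBytes) - 1;
        // Never read past the last confirmed entry.
        return Math.min(lastConfirmedEntryId, Math.min(byCount, bySize));
    }
}
```

With a server-side batch read protocol that accepts the expected count and
size directly, this guesswork moves out of every client.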

Here is the output of the BookKeeper perf tool with ensemble=1, write=1,
and ack=1:
Batch(100): Read 1000100 entries in 8904ms
Batch(500): Read 1000500 entries in 12182ms
Non-Batch: Read 1000130 entries in 199928ms

If you have any questions or comments, feel free to discuss them here.
Thanks!
