Hi Tom,

Thank you so much for your response. I had a feeling that approach would run into scalability problems, so it's good to have that confirmed.

Another approach would be to have each service request a subscription from the event store. The event store would then create a unique Kafka topic for each service. If multiple instances of a service request a subscription, the event store should only create the topic once and return the topic's name to each instance.
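Roughly what I have in mind for that subscription handshake, as a Java sketch (the AdminClient usage, the naming scheme and the replication factor are just assumptions on my part):

    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.ExecutionException;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.errors.TopicExistsException;

    public class SubscriptionManager {
        private final AdminClient admin;

        public SubscriptionManager(String bootstrapServers) {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            this.admin = AdminClient.create(props);
        }

        // Returns the per-service topic name, creating the topic on the
        // first request. Safe to call from several instances of the same
        // service: a concurrent create fails with TopicExistsException,
        // which we treat as success.
        public String subscribe(String serviceName) throws Exception {
            String topic = "eventstore-" + serviceName; // naming scheme is made up
            try {
                admin.createTopics(List.of(new NewTopic(topic, 1, (short) 3)))
                     .all().get();
            } catch (ExecutionException e) {
                if (!(e.getCause() instanceof TopicExistsException)) {
                    throw e;
                }
                // Topic already exists: another instance created it first.
            }
            return topic;
        }
    }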

A reader/writer would read from HBase and push new messages into each topic.
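The reader/writer itself could be a simple relay loop, something like this sketch (the HBase table and column names are invented; I'm keying each record by aggregate ID so per-aggregate order is preserved):

    import java.util.List;

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Relay: scan events out of HBase and publish them to each
    // subscriber's topic. Table and column names here are invented.
    public class EventRelay {
        public void relay(Connection hbase, KafkaProducer<String, byte[]> producer,
                          List<String> topics) throws Exception {
            try (Table events = hbase.getTable(TableName.valueOf("events"));
                 ResultScanner scanner = events.getScanner(new Scan())) {
                for (Result row : scanner) {
                    String aggregateId = Bytes.toString(
                            row.getValue(Bytes.toBytes("e"), Bytes.toBytes("aggregate_id")));
                    byte[] payload = row.getValue(Bytes.toBytes("e"), Bytes.toBytes("payload"));
                    for (String topic : topics) {
                        // Key by aggregate ID so all events for one aggregate
                        // land on the same partition, preserving their order.
                        producer.send(new ProducerRecord<>(topic, aggregateId, payload));
                    }
                }
            }
            producer.flush();
        }
    }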

In this case, I would set my topics to retain messages for, say, 5 days in case a service goes down and we need to bring it back up.
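That retention would just be a per-topic config that the creation code above could pass along (retention.ms is the actual config key; 5 days = 432,000,000 ms):

    import java.util.Map;

    import org.apache.kafka.clients.admin.NewTopic;

    public class TopicSpecs {
        // 5 days of retention, in milliseconds: 5 * 24 * 60 * 60 * 1000.
        private static final String FIVE_DAYS_MS = "432000000";

        // Same per-service topic spec as before, but with a bounded
        // retention window so a downed service can catch up later.
        public static NewTopic perServiceTopic(String name) {
            return new NewTopic(name, 1, (short) 3)
                    .configs(Map.of("retention.ms", FIVE_DAYS_MS));
        }
    }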

The event store would also query Kafka to see which topics have not been read from for, say, 30 days and delete them. This would cover cases where a service is decommissioned. Does Kafka provide a way to check when a topic was last read from?
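If Kafka doesn't track that, I suppose the event store could record a last-seen timestamp itself whenever a service renews its subscription, and a periodic job could delete anything stale. A rough sketch of what I mean:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.kafka.clients.admin.AdminClient;

    // Sketch: the event store records when each service last renewed its
    // subscription, and a periodic job deletes topics idle for 30+ days.
    public class TopicReaper {
        private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();

        // Called whenever a service (re)requests its subscription.
        public void touch(String topic) {
            lastSeen.put(topic, Instant.now());
        }

        public void reap(AdminClient admin) throws Exception {
            Instant cutoff = Instant.now().minus(Duration.ofDays(30));
            for (Map.Entry<String, Instant> e : lastSeen.entrySet()) {
                if (e.getValue().isBefore(cutoff)) {
                    admin.deleteTopics(List.of(e.getKey())).all().get();
                    lastSeen.remove(e.getKey());
                }
            }
        }
    }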

Does this sound like a saner approach?

Cheers,
Francis

On 5/09/2016 11:00 PM, Tom Crayford wrote:
inline

On Mon, Sep 5, 2016 at 12:00 AM, F21 <f21.gro...@gmail.com> wrote:

Hi all,

I am currently looking at using Kafka as a "message bus" for an event
store. I plan to have all my events written into HBase for permanent
storage and then have a reader/writer that reads from HBase to push them
into Kafka.

In terms of Kafka, I plan to set it to keep all messages indefinitely.
That way, if any consumers need to rebuild their views or if new consumers
are created, they can just replay the stream from the beginning.

Kafka isn't designed at all for permanent message storage, except for
compacted topics. I suggest you rethink this, unless compacted topics work
for you (Kafka is not designed to keep unbounded amounts of data for
unbounded amounts of time, simply to provide messaging and replay over
short, bounded windows).
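For reference, compaction is just a per-topic setting (cleanup.policy=compact); a quick sketch, with a made-up class and topic naming:

    import java.util.Map;

    import org.apache.kafka.clients.admin.NewTopic;

    public class CompactedTopicSpec {
        // Compaction keeps only the latest record per key instead of a
        // time-based retention window, so the topic stays bounded as long
        // as the key space is bounded.
        public static NewTopic compacted(String name) {
            return new NewTopic(name, 1, (short) 3)
                    .configs(Map.of("cleanup.policy", "compact"));
        }
    }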


I plan to use domain-driven design and will use the concept of aggregates
in the system. An example of an aggregate might be a customer. All events
for a given aggregate need to be delivered in order. In the case of Kafka,
I would need to heavily over-partition the system, as any change in the
number of partitions could result in messages that were bound for a given
partition being pushed into a newly created partition. Are there any issues
if I create a new partition every time an aggregate is created? In a system
with a large number of aggregates, this would result in millions or hundreds
of millions of partitions. Will this cause performance issues?

Yes.

Kafka is designed to support hundreds to thousands of partitions per
machine, not millions (and there is an upper bound per cluster which is
well below one million). I suggest you rethink this and likely use a
standard "hash based partitioning" scheme.


Cheers,

Francis


