I'm considering an architecture where Kafka acts as the primary datastore,
with infinite retention of messages. The messages in this case will be
domain events that must not be lost. Different downstream consumers would
ingest the events and build up various views on them, e.g. aggregated
stats, indexes by various properties, full text search, etc.
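To make that concrete, the topic setup I have in mind would be roughly the sketch below, using the Java AdminClient. The topic name, partition count, replication factor and broker address are just placeholders; retention.ms = -1 is, as I understand it, how you disable time-based deletion so the full history stays on the brokers.

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;

    public class CreateEventTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Domain-event topic: retention.ms = -1 disables time-based deletion,
                // so the event history is kept indefinitely; durability would come
                // from the replication factor rather than a separate archive store.
                NewTopic topic = new NewTopic("domain-events", 12, (short) 3)
                        .configs(Map.of(TopicConfig.RETENTION_MS_CONFIG, "-1"));
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }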

The important bit is that I'd like to avoid having a separate datastore for
long-term archival of events, since:

1) I want to make it easy to spin up new materialized views based on past
events, and only having to deal with Kafka is simpler.
2) Instead of having some sort of two-phase import process where I need to
first import historical data and then do a switchover to the Kafka topics,
I'd rather just start from offset 0 in the Kafka topics (see the consumer
sketch after this list).
3) I'd like to be able to use standard tooling where possible, and most
tools for ingesting events into e.g. Spark Streaming would be difficult to
use unless all the data were in Kafka.
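Spinning up a new materialized view would then just mean starting a new
consumer group that reads the topic from the beginning, roughly like the
sketch below. The group id and topic name are placeholders and the actual
view-building logic is elided; the point is only that a fresh group with
auto.offset.reset=earliest replays the whole history from offset 0.

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class ViewBuilder {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // A fresh group id per view; with no committed offsets it starts at offset 0.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "search-index-builder");
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("domain-events"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Apply each historical event to the view (update stats, index it, etc.).
                        System.out.printf("offset=%d key=%s%n", record.offset(), record.key());
                    }
                }
            }
        }
    }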

I'd like to know if anyone here has tried this use case. Based on the
presentations by Jay Kreps and Martin Kleppmann, I would expect that someone
has actually implemented some of the ideas they've been pushing. I'd also
like to know what sort of problems Kafka would pose for long-term storage –
would I need special storage nodes, or would replication be sufficient to
ensure durability?

Daniel Schierbeck
Senior Staff Engineer, Zendesk
