Caching in Kafka Streams to ignore garbage messages

Ali Akhtar Thu, 27 Apr 2017 14:22:39 -0700

I have a Kafka topic which will receive a large amount of data.

This data has an 'id' field. I need to look up the id in an external DB to
see whether we are tracking it. If we are, we process the message; if not,
we ignore it.

About 99% of the data will be for ids that are not being tracked; only 1% or
so will be for tracked ids.

My concern is that we'd make a lot of round trips to the DB just to check
the id. Would it be better to cache the tracked ids somewhere, so that all
other ids can be ignored without a lookup?
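To illustrate the caching idea, here is a minimal sketch in plain Python (not Kafka Streams code). It assumes the tracked ids fit in memory and can be loaded once through a caller-supplied function; `load_tracked_ids` and `TrackedIdFilter` are hypothetical names. Untracked ids are then rejected by a set lookup instead of a DB round trip.

```python
from typing import Callable, Iterable, Iterator, Set


class TrackedIdFilter:
    """Drops messages whose id is not in an in-memory set of tracked ids.

    The set is loaded once via a hypothetical `load_tracked_ids` callable,
    so the 99% of untracked messages never trigger a DB round trip.
    """

    def __init__(self, load_tracked_ids: Callable[[], Iterable[str]]) -> None:
        self.tracked: Set[str] = set(load_tracked_ids())

    def filter(self, messages: Iterable[dict]) -> Iterator[dict]:
        for msg in messages:
            # O(1) membership check replaces a per-message DB query.
            if msg["id"] in self.tracked:
                yield msg


# Example: 1 tracked id out of 100 messages, roughly the 1%/99% split.
db_calls = 0


def load_tracked_ids() -> Iterable[str]:
    # Hypothetical stand-in for the external DB query.
    global db_calls
    db_calls += 1
    return ["id-7"]


messages = [{"id": f"id-{i}", "value": i} for i in range(100)]
id_filter = TrackedIdFilter(load_tracked_ids)
kept = list(id_filter.filter(messages))
print(len(kept), db_calls)  # 1 1
```

The trade-off is that the cache goes stale as soon as the track list changes, which is why it needs some update mechanism.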

I was considering sending a message to another topic (or the same one)
whenever a new id is added to the track list, so that the id is picked up by
the node that will process the messages for it.
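One way to keep such a cache current is to treat the track-list topic as a changelog keyed by id. The sketch below assumes a hypothetical convention in which a non-null value means "tracked" and a null value (a tombstone, as used with Kafka log compaction) means "no longer tracked". It only shows how each record updates the local set; `apply_track_list_updates` is a hypothetical name, and consuming the topic itself is left out.

```python
from typing import Iterable, Optional, Set, Tuple


def apply_track_list_updates(
    tracked: Set[str], updates: Iterable[Tuple[str, Optional[str]]]
) -> Set[str]:
    """Apply (id, value) records from a track-list topic to the local cache.

    Assumed convention (hypothetical): a non-None value adds the id;
    a None value (tombstone) removes it.
    """
    for track_id, value in updates:
        if value is None:
            tracked.discard(track_id)  # tombstone: stop tracking
        else:
            tracked.add(track_id)  # new or re-sent entry: track it
    return tracked


cache = {"id-7"}
apply_track_list_updates(cache, [("id-42", "added"), ("id-7", None)])
print(sorted(cache))  # ['id-42']
```

If the topic is compacted, a restarted node could rebuild its cache by replaying it from the beginning, since compaction keeps the latest record per id.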

Should I just cache all ids on all nodes (which may be a large amount), or
is there a way to cache each id only on the Kafka Streams node that will
receive data for that id?
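In Kafka Streams terms, the two options roughly correspond to a GlobalKTable, where every instance holds the full track list, and a partitioned KTable joined with the data stream, where each instance holds only the keys of the partitions it owns. The partitioned approach works only if both topics are keyed by id and co-partitioned, meaning they have the same number of partitions and use the same partitioning scheme. The plain-Python sketch below shows why this places each id's cache entry on the node that receives its data. The CRC32 hash is a stand-in for Kafka's default partitioner, which uses murmur2. `NUM_PARTITIONS`, the node names, and the assignment are all hypothetical.

```python
import zlib
from typing import Dict, Iterable, Set

NUM_PARTITIONS = 4  # assumption: both topics have this many partitions


def partition_for(track_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash (CRC32, always non-negative). Kafka's default partitioner
    # uses murmur2; what matters is that BOTH topics use the same scheme.
    return zlib.crc32(track_id.encode("utf-8")) % num_partitions


def build_local_caches(
    tracked_ids: Iterable[str], assignment: Dict[str, Set[int]]
) -> Dict[str, Set[str]]:
    """Give each node only the tracked ids whose partition it owns."""
    caches: Dict[str, Set[str]] = {node: set() for node in assignment}
    for track_id in tracked_ids:
        p = partition_for(track_id)
        for node, partitions in assignment.items():
            if p in partitions:
                caches[node].add(track_id)
    return caches


def owner_of(track_id: str, assignment: Dict[str, Set[int]]) -> str:
    """Node that receives data messages for this id (same partitioning)."""
    p = partition_for(track_id)
    return next(node for node, parts in assignment.items() if p in parts)


# Hypothetical assignment of 4 partitions across two nodes.
assignment = {"node-a": {0, 1}, "node-b": {2, 3}}
tracked = [f"id-{i}" for i in range(10)]
caches = build_local_caches(tracked, assignment)

# Every tracked id is cached exactly once, on the node that gets its data.
for tid in tracked:
    assert tid in caches[owner_of(tid, assignment)]
```

If the data topic is not already keyed by id, it would have to be re-keyed (and therefore repartitioned) before a partition-local lookup like this can work.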
