I have a Kafka topic which will receive a large amount of data.

Each message has an 'id' field. I need to look up the id in an external db
to see whether we are tracking it: if we are, we process the message; if
not, we ignore it.

About 99% of the data will be for ids which are not being tracked; only 1%
or so will be for ids which are tracked.

My concern is that this means a lot of round trips to the db just to check
the id. I'm wondering whether it would be better to cache the tracked ids
somewhere, so that messages for other ids can be ignored without a lookup.

I was considering sending a message to another topic (or the same one)
whenever a new id is added to the track list, so that the new id is picked
up by the node which will process the messages for it.

Should I just cache all ids on all nodes (which may be a large amount), or
is there a way to only cache the id on the same kafka streams node which
will receive data for that id?
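
To make the second option concrete, here is a rough sketch of what I have
in mind. It is plain Java with no Kafka dependency, and every name in it
(PartitionedTrackList, Node, partitionFor, ...) is made up for illustration.
It assumes that "track this id" messages and data messages are both keyed by
id and partitioned the same way, so that both land on the same node; the
hash is only a stand-in, since Kafka's default partitioner uses murmur2.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: simulates key-based routing, not real Kafka Streams code.
public class PartitionedTrackList {

    // One "node" owns one partition and caches only the tracked ids routed to it.
    static final class Node {
        final Set<String> trackedIds = new HashSet<>();
        final List<String> processed = new ArrayList<>();

        void onTrackUpdate(String id) {
            trackedIds.add(id);
        }

        void onData(String id, String payload) {
            // Local lookup only; untracked ids are dropped without a db round trip.
            if (trackedIds.contains(id)) {
                processed.add(id + ":" + payload);
            }
        }
    }

    final Node[] nodes;

    PartitionedTrackList(int numPartitions) {
        nodes = new Node[numPartitions];
        for (int i = 0; i < numPartitions; i++) {
            nodes[i] = new Node();
        }
    }

    // Stand-in for the partitioner (assumption: same key -> same partition on both topics).
    int partitionFor(String id) {
        return Math.floorMod(id.hashCode(), nodes.length);
    }

    // A "new id added to the track list" message, routed by id.
    void track(String id) {
        nodes[partitionFor(id)].onTrackUpdate(id);
    }

    // A data message, routed by the same id.
    void data(String id, String payload) {
        nodes[partitionFor(id)].onData(id, payload);
    }

    int cachedIdCount() {
        int total = 0;
        for (Node n : nodes) total += n.trackedIds.size();
        return total;
    }

    int processedCount() {
        int total = 0;
        for (Node n : nodes) total += n.processed.size();
        return total;
    }

    public static void main(String[] args) {
        PartitionedTrackList cluster = new PartitionedTrackList(4);
        cluster.track("id-42");
        cluster.data("id-42", "hello");   // tracked, so processed
        cluster.data("id-7", "ignored");  // not tracked, so dropped
        System.out.println("cached ids: " + cluster.cachedIdCount());
        System.out.println("processed:  " + cluster.processedCount());
    }
}
```

In this sketch each tracked id is cached on exactly one node, rather than on
all of them. Whether Kafka Streams can give me the same behaviour for the
real topics is the part I am unsure about.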
