The recommended solution would be to use Kafka Connect to load your DB
data into a Kafka topic.
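For example, with the Confluent JDBC source connector you could stream the
table into a topic (a sketch; connection URL, table, and column names are
placeholders you'd replace with your own):

```
name=tracked-ids-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:postgresql://localhost:5432/mydb
table.whitelist=tracked_ids
mode=incrementing
incrementing.column.name=id
topic.prefix=db-
```

That would continuously write inserts/updates from `tracked_ids` into a
topic named `db-tracked_ids`.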

With Kafka Streams you read your db-topic as a KTable and do an (inner)
KStream-KTable join to look up the IDs.
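A minimal sketch of that topology (topic names and String types are
assumptions for illustration; both topics must be keyed by the id and
co-partitioned, i.e., have the same number of partitions -- use
selectKey() first if the id is only in the message value):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class TrackedIdFilter {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // DB data loaded via Kafka Connect, keyed by id.
        KTable<String, String> trackedIds = builder.table("db-tracked_ids");

        // High-volume input topic, also keyed by id.
        KStream<String, String> events = builder.stream("events-topic");

        // Inner join: records whose id has no entry in the KTable
        // produce no output, so untracked ids are dropped automatically.
        KStream<String, String> tracked =
            events.join(trackedIds, (event, dbValue) -> event);

        tracked.to("tracked-events-topic");

        // Wrap builder.build() in a KafkaStreams instance with your
        // StreamsConfig and call start() to run the topology.
    }
}
```

This also answers the caching question: the KTable is materialized in a
local state store on each instance, and because the join is
co-partitioned, each instance only stores the ids for the partitions it
is assigned -- no round trips to the DB per record.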


-Matthias

On 4/27/17 2:22 PM, Ali Akhtar wrote:
> I have a Kafka topic which will receive a large amount of data.
> 
> This data has an 'id' field. I need to look up the id in an external db,
> see if we are tracking that id, and if yes, we process that message, if
> not, we ignore it.
> 
> 99% of the data will be for ids which are not being tracked - 1% or so will
> be for ids which are tracked.
> 
> My concern is that there'd be a lot of round trips to the db made just to
> check the id, and whether it'd be better to cache the tracked ids
> somewhere, so other ids are ignored.
> 
> I was considering sending a message to another (or the same) topic whenever
> a new id is added to the track list, and that id should then get processed
> on the node which will process the messages.
> 
> Should I just cache all ids on all nodes (which may be a large amount), or
> is there a way to only cache the id on the same kafka streams node which
> will receive data for that id?
> 
