I have a Kafka topic that will receive a large volume of data. Each message has an 'id' field. I need to look up that id in an external database to see whether we are tracking it: if we are, the message gets processed; if not, it is ignored.
About 99% of the messages will be for ids that are not being tracked; only 1% or so will be for tracked ids. My concern is that this means a lot of round trips to the database just to check the id, so I am wondering whether it would be better to cache the tracked ids somewhere and drop everything else without a lookup. I was considering sending a message to another topic (or the same one) whenever a new id is added to the track list, so that the id gets registered on the node that will process its messages. Should I just cache all tracked ids on every node (which may be a large number), or is there a way to cache each id only on the Kafka Streams node that will receive data for that id?
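
To make the second option concrete, here is a rough sketch in plain Java (no Kafka dependency) of the behaviour I have in mind. It assumes the data topic and the track-list topic are both keyed by id and partitioned the same way, so an id's "start tracking" message and its data messages land on the same node. The names `PartitionLocalCache`, `partitionFor`, `track` and `shouldProcess` are made up for illustration, and the `hashCode`-based routing is only a stand-in for the real Kafka partitioner.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simulation only: models N nodes, one per partition, each with its own local cache.
final class PartitionLocalCache {
    // One set of tracked ids per simulated node/partition.
    private final List<Set<String>> trackedPerNode = new ArrayList<>();

    PartitionLocalCache(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            trackedPerNode.add(new HashSet<>());
        }
    }

    // Hypothetical stand-in for the key partitioner (assumption: not Kafka's actual hash).
    static int partitionFor(String id, int numPartitions) {
        return Math.floorMod(id.hashCode(), numPartitions);
    }

    // Control message "start tracking id": routed by id, so it reaches exactly one node.
    void track(String id) {
        trackedPerNode.get(partitionFor(id, trackedPerNode.size())).add(id);
    }

    // Data message: routed by the same key, so only that node's local cache is consulted.
    boolean shouldProcess(String id) {
        return trackedPerNode.get(partitionFor(id, trackedPerNode.size())).contains(id);
    }

    // Number of tracked ids cached on one simulated node.
    int cachedOnNode(int node) {
        return trackedPerNode.get(node).size();
    }

    public static void main(String[] args) {
        PartitionLocalCache cluster = new PartitionLocalCache(4);
        cluster.track("id-42");
        System.out.println(cluster.shouldProcess("id-42")); // tracked: process it
        System.out.println(cluster.shouldProcess("id-7"));  // never tracked: ignore it
    }
}
```

In this model each node holds only the tracked ids that route to its own partition, rather than the full track list, and no database round trip is needed per message.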