Greetings all. I have a use case where I want to calculate some metrics against sensor data using event-time semantics (the record timestamp is the event timestamp) that I already have. I have years of it, but for this POC I'd like to load just the last few months so that we can start deriving trend lines now instead of waiting a few months for the real-time feeds to accumulate.
So the question is: what steps do I need to take to set up Kafka itself, the topics, and Streams so that I can send it, say, T-90 days of backlog data as well as real-time data and have it all process correctly? I have data loading into a Kafka 'feed' topic, and I am setting the record timestamp to the event timestamp within the data, so event-time semantics are in place from the start.

I was running into data loss when segments were deleted faster than downstream could process them. My knee-jerk reaction was to set the broker configs log.retention.hours=2160 and log.segment.delete.delay.ms=21600000, which made the problem go away, but I don't think that's the right approach.

For example's sake, assume a source topic 'feed' and a stream that calculates min/max/avg to start with, using windows of 1 minute and 5 minutes. I want to use interactive queries against the window stores, and I want to retain 90 days of window data to query. So I need advice on configuration for Kafka, the 'feed' topic, the store (changelog) topics, and the stores themselves. Thanks in advance!
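To make my current setup concrete, here is roughly how I'm creating the source topic for the POC (the topic name, partition counts, and bootstrap address are just my local values; using a per-topic retention.ms override instead of the broker-wide log.retention.hours is exactly the kind of thing I'm unsure about):

```shell
# Source topic with 90 days of retention (90 * 24 * 3600 * 1000 ms = 7776000000),
# so the T-90 backlog survives until the Streams app has caught up.
# This is a per-topic override rather than the broker-wide
# log.retention.hours=2160 I set as a knee-jerk fix.
kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic feed \
  --partitions 6 \
  --replication-factor 3 \
  --config retention.ms=7776000000
```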
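And in case it helps, here is a minimal sketch of the topology I have in mind for the 1-minute window (Kafka Streams DSL; the store name "minmax-1m-store", the serdes, and the 5-minute grace period are my own assumptions, and I'm guessing that Materialized.withRetention is the right knob for keeping 90 days of window data queryable):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

public class FeedMetricsTopology {

    // Running aggregate carrying min/max/sum/count so avg can be derived.
    public static class Stats {
        public double min = Double.MAX_VALUE;
        public double max = -Double.MAX_VALUE;
        public double sum = 0.0;
        public long count = 0;

        public Stats add(double v) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
            count++;
            return this;
        }

        public double avg() { return count == 0 ? 0.0 : sum / count; }
    }

    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Record timestamps already carry event time, so the default
        // timestamp extractor yields event-time windows.
        KStream<String, Double> feed =
            builder.stream("feed", Consumed.with(Serdes.String(), Serdes.Double()));

        feed.groupByKey()
            // 1-minute tumbling windows; the grace period bounds how late
            // a record may arrive and still land in its window.
            .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(1), Duration.ofMinutes(5)))
            .aggregate(
                Stats::new,
                (key, value, agg) -> agg.add(value),
                // Named store so interactive queries can reach it; retention
                // set to 90 days so old windows stay queryable.
                // (A Serde for Stats would be set via withValueSerde;
                // omitted here for brevity.)
                Materialized.<String, Stats, WindowStore<Bytes, byte[]>>as("minmax-1m-store")
                    .withRetention(Duration.ofDays(90)));

        return builder;
    }
}
```

The 5-minute window would be a second windowedBy/aggregate branch with its own store name. Is setting the retention on Materialized enough, or do I also need to configure the changelog topics that back these stores?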