I’ve been reading KIP-33
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index)
and trying to determine whether I can use the time-based index as an efficient
way to sort a stream of messages into timestamp (CreateTime) order.

I am dealing with a number of sources emitting messages that are then processed 
in a distributed fashion and written to a Kafka topic. During this processing, 
the original order of the messages is not strictly maintained. Each message has 
an embedded timestamp. I’d like to be able to sort these messages back into 
timestamp order, allowing for a certain lateness interval, before processing 
them further. For example, supposing the lateness interval is 5 minutes, at 
time T I’d like to consume from the topic all messages with timestamp up to (T 
- 5 minutes), in timestamp order. The assumption is that a message should be no 
more than 5 minutes late; if it is more than 5 minutes late, it can be 
discarded. Is this something that can be done with the time-based index?
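
To make the requirement concrete, here is a rough sketch in Python of the semantics I am after, ignoring Kafka entirely. The ReorderBuffer class and its add/release methods are names I made up for illustration; they are not part of any Kafka API. Timestamps are assumed to be epoch milliseconds. Messages are held in a min-heap keyed on their embedded timestamp, and at time T everything with a timestamp up to (T - lateness) is released in order. Anything arriving with a timestamp at or before the last released cutoff is treated as too late and dropped.

```python
import heapq
from typing import Any, List, Optional, Tuple

LATENESS_MS = 5 * 60 * 1000  # the 5-minute lateness interval from the example


class ReorderBuffer:
    """Hypothetical helper: buffers messages and releases them in timestamp order."""

    def __init__(self, lateness_ms: int = LATENESS_MS) -> None:
        self.lateness_ms = lateness_ms
        self._heap: List[Tuple[int, int, Any]] = []  # (timestamp_ms, seq, payload)
        self._seq = 0  # tie-breaker so payloads are never compared directly
        self._released_up_to: Optional[int] = None  # highest cutoff released so far

    def add(self, timestamp_ms: int, payload: Any) -> bool:
        """Buffer a message; return False if it is too late and was discarded."""
        if self._released_up_to is not None and timestamp_ms <= self._released_up_to:
            return False
        heapq.heappush(self._heap, (timestamp_ms, self._seq, payload))
        self._seq += 1
        return True

    def release(self, now_ms: int) -> List[Tuple[int, Any]]:
        """Return all buffered messages with timestamp <= now - lateness, in order."""
        cutoff = now_ms - self.lateness_ms
        ready = []
        while self._heap and self._heap[0][0] <= cutoff:
            timestamp_ms, _, payload = heapq.heappop(self._heap)
            ready.append((timestamp_ms, payload))
        if self._released_up_to is None or cutoff > self._released_up_to:
            self._released_up_to = cutoff
        return ready
```

This keeps every message from the last 5 minutes in memory on the consumer side, which is what I am hoping the time-based index would let me avoid.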

Thanks,

Ray
