Because we might receive very "old" metrics: the timestamp on the metric can be far in the past even though the metric was only just delivered (for example, during a backfill). If we used event-time for retention, these old metrics would be dropped immediately and never aggregated. With processing-time retention, they at least stay in the state store for some time so they can be aggregated.
On Thu, Jun 21, 2018 at 1:24 PM, Matthias J. Sax <matth...@confluent.io> wrote:
> I don't understand why event-time retention time cannot be used? Can you
> elaborate?
>
> -Matthias
>
> On 6/21/18 10:59 AM, Sicheng Liu wrote:
> > Hi All,
> >
> > We have a use case where we aggregate some metrics by their event-time
> > (the timestamp on the metric itself) using the simplest tumbling window.
> > The window itself can be given a retention, but since we are aggregating
> > by event-time, the retention has to be based on event-time too. However,
> > in our scenario we have some late-arriving metrics (up to one year late),
> > and we hope the window retention can be based on processing-time, so that
> > we can hold the late-arriving metrics for some time and expire them after
> > some hours, even without new metrics for the same aggregation key arriving.
> >
> > We have tried:
> > 1. Setting a TTL on RocksDB, but it is disabled in Kafka Streams.
> > 2. Using the low-level Processor API, but scanning the state store and
> > deleting entries one by one significantly hurts performance.
> >
> > Please let us know if it is possible to aggregate by event-time while
> > setting the window retention based on processing-time.
> >
> > Thanks,
> > Sicheng
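To make the idea concrete, here is a minimal sketch (in Python, purely for illustration; this is not the Kafka Streams API, and all names in it are made up) of what "aggregate by event-time, expire by processing-time" could look like. The store keys windows by event-time but stamps each entry with the wall-clock time of its last update, and a punctuate-style pass drops entries whose last update is older than the retention, no matter how old their event-time is:

```python
import time

class ProcessTimeRetentionStore:
    """Toy model of the approach discussed above: tumbling windows keyed
    by event-time, but retention driven by processing-time (wall clock).
    Hypothetical illustration only, not Kafka Streams code."""

    def __init__(self, window_size_ms, retention_ms, clock=time.time):
        self.window_size_ms = window_size_ms
        self.retention_ms = retention_ms
        self.clock = clock  # injectable for testing
        # (key, window_start_ms) -> [aggregate, last_update_process_time_ms]
        self.store = {}

    def _now_ms(self):
        return int(self.clock() * 1000)

    def add(self, key, event_time_ms, value):
        # Window is derived from the metric's own (event) timestamp.
        window_start = event_time_ms - (event_time_ms % self.window_size_ms)
        slot = self.store.setdefault((key, window_start), [0, 0])
        slot[0] += value
        slot[1] = self._now_ms()  # retention clock is processing-time

    def punctuate(self):
        """Expire windows not updated within retention_ms of wall-clock
        time, regardless of how old their event-time is."""
        now = self._now_ms()
        expired = [k for k, (_, seen) in self.store.items()
                   if now - seen > self.retention_ms]
        for k in expired:
            del self.store[k]
        return expired
```

In a real Streams topology, the equivalent of `punctuate` could presumably be scheduled with a `PunctuationType.WALL_CLOCK_TIME` punctuator in a low-level Processor; the full-scan cost you mention in point 2 might be reduced by maintaining a secondary store ordered by last-update time so expiry becomes a range scan instead of a scan of every key, though I have not benchmarked that.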