Because we might receive very "old" metrics: the timestamp on the metric can be far in the past even though the metric was only just delivered (for example, during a backfill). If we used event-time for retention, these old metrics would be dropped immediately and never aggregated. With processing-time retention, they at least stay in the state store for some time so they can be aggregated.
On Thu, Jun 21, 2018 at 1:24 PM, Matthias J. Sax <matth...@confluent.io> wrote:
> I don't understand why event-time retention time cannot be used? Can you
> elaborate?
>
> -Matthias
>
> On 6/21/18 10:59 AM, Sicheng Liu wrote:
> > Hi All,
> >
> > We have a use case where we aggregate some metrics by their event-time
> > (the timestamp on the metric itself) using the simplest tumbling window.
> > The window itself can be given a retention, but since we are aggregating
> > by event-time, the retention has to be based on event-time too. However,
> > in our scenario we have some late-arriving metrics (up to one year late),
> > and we hope the window retention can be based on processing-time, so that
> > we can hold the late-arriving metrics for some time and expire them after
> > some hours, even without new metrics for the same aggregation key arriving.
> >
> > We have tried:
> > 1. Setting a TTL on RocksDB, but it is disabled in Kafka Streams.
> > 2. Using the low-level Processor API, but scanning the state store and
> > deleting entries one by one significantly hurts performance.
> >
> > Please let us know if it is possible to aggregate by event-time while
> > setting the window retention based on processing-time.
> >
> > Thanks,
> > Sicheng
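To make the idea concrete, here is a minimal sketch (in Python, purely for illustration; this is not the Kafka Streams API, and all names in it are made up) of what "aggregate by event-time, expire by processing-time" could look like. The store keys windows by event-time but stamps each entry with the wall-clock time of its last update, and a punctuate-style pass drops entries whose last update is older than the retention, no matter how old their event-time is:

```python
import time

class ProcessTimeRetentionStore:
    """Toy model of the approach discussed above: tumbling windows keyed
    by event-time, but retention driven by processing-time (wall clock).
    Hypothetical illustration only, not Kafka Streams code."""

    def __init__(self, window_size_ms, retention_ms, clock=time.time):
        self.window_size_ms = window_size_ms
        self.retention_ms = retention_ms
        self.clock = clock  # injectable for testing
        # (key, window_start_ms) -> [aggregate, last_update_process_time_ms]
        self.store = {}

    def _now_ms(self):
        return int(self.clock() * 1000)

    def add(self, key, event_time_ms, value):
        # Window is derived from the metric's own (event) timestamp.
        window_start = event_time_ms - (event_time_ms % self.window_size_ms)
        slot = self.store.setdefault((key, window_start), [0, 0])
        slot[0] += value
        slot[1] = self._now_ms()  # retention clock is processing-time

    def punctuate(self):
        """Expire windows not updated within retention_ms of wall-clock
        time, regardless of how old their event-time is."""
        now = self._now_ms()
        expired = [k for k, (_, seen) in self.store.items()
                   if now - seen > self.retention_ms]
        for k in expired:
            del self.store[k]
        return expired
```

In a real Streams topology, the equivalent of `punctuate` could presumably be scheduled with a `PunctuationType.WALL_CLOCK_TIME` punctuator in a low-level Processor; the full-scan cost you mention in point 2 might be reduced by maintaining a secondary store ordered by last-update time so expiry becomes a range scan instead of a scan of every key, though I have not benchmarked that.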