---------- Forwarded message ----------
> From: Jeff Klukas <jklu...@simple.com>
> To: users@kafka.apache.org
> Cc:
> Date: Wed, 30 Mar 2016 11:14:53 -0400
> Subject: KStream-KTable join with the KTable given a "head start"
> I have a KStream that I want to enrich with some values from a lookup
> table. When a new key enters the KStream, there's likely to be a
> corresponding entry arriving on the KStream at the same time, so we end up
> with a race condition. If the KTable record arrives first, then its value
> is available for joining when the corresponding arrives on the KStream.  If
> the KStream record arrives first, however, we'll get a null join even if
> the KTable gets the corresponding record only milliseconds later.
>
> I'd like to give the KTable a "head start" of ~10 seconds, so that it gets
> a chance to get updated before the corresponding records arrive on the
> KStream. Could I achieve this using one of the existing Windowing
> strategies?
>
>
> ---------- Forwarded message ----------
> From: Guozhang Wang <wangg...@gmail.com>
> To: "users@kafka.apache.org" <users@kafka.apache.org>
> Cc:
> Date: Wed, 30 Mar 2016 13:51:03 -0700
> Subject: Re: KStream-KTable join with the KTable given a "head start"
> Hi Jeff,
>
> This is a common case of stream-table join, in that the joining results
> depending on the arrival ordering from these two sources.
>
> In Kafka Streams you can try to "synchronize" multiple input streams
> through the "TimestampExtractor" interface, which is used to assign a
> timestamp to each record polled from Kafka to start the processing. You
> can, for example, set the timestamps for your KStream within a later time
> interval and the timestamps for your KTable stream with an earlier time
> interval, so that the records from table are likely to be processed first.
> Note that this is an best effort, in that we cannot guarantee global
> ordering across streams while processing, that if you have a much later
> record coming from KTable then it will not block earlier records from
> KStream from being processed first. But we think this mechanism should be
> sufficient in practice.
>
> Let me know if it fits with your scenario, and if not we can talk about how
> it can be possibly improved.
>


What would I pass in for a window in this case? Or would I not pass in a
window?

I don't want to put any lower limit on the KTable timestamps I'd be willing
to join on (the corresponding entry in the KTable could have been from
weeks ago, or it could be fired right at the same time as the KStream
event).

Could I use JoinWindows.before() and pass in an arbitrarily long interval?

Reply via email to