Yes and no.

It does not depend on the number of tuples but on the timestamps of
those tuples.

I would assume that records in the high-volume stream have timestamps
that are only a few milliseconds apart, while for the low-volume
KTable, records have timestamp differences that are much bigger
(maybe seconds).

Thus, even if you schedule a punctuation every 30 seconds, it will get
triggered as expected. Because you get KTable input on a per-second
basis, KTable time advances in larger steps -- so the KTable side
always "catches up".

Only in the (real-time) case where a single partition does not make
progress -- because no new data gets appended for longer than your
punctuation interval -- might some calls to punctuate() not fire.

Let's say the KTable does not get an update for 5 minutes; then you
would miss 9 calls to punctuate() and get only a single call after the
KTable update. (Of course, only if all partitions advance time accordingly.)
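The arithmetic above can be sketched with a small self-contained model
(plain Java, not actual Kafka Streams code). It assumes the coalescing
behavior described here: punctuation deadlines are checked against
stream time as records arrive, and when stream time jumps over several
overdue intervals at once, only a single call fires and the next
deadline is rescheduled from the current stream time. The method name
and scheduling details are illustrative assumptions:

```java
// Hypothetical model of stream-time punctuation with coalescing.
// Not Kafka Streams API code -- just the timing logic described above.
public class PunctuationModel {

    // Counts punctuate() calls, given record timestamps (ms, ascending)
    // and a punctuation interval (ms). Assumes the first deadline is
    // scheduled one interval after the first record's timestamp.
    public static int punctuationsFired(long[] recordTimestamps, long intervalMs) {
        long nextDeadline = recordTimestamps[0] + intervalMs;
        int fired = 0;
        for (long ts : recordTimestamps) {
            if (ts >= nextDeadline) {
                fired++;                        // one coalesced call, even if
                nextDeadline = ts + intervalMs; // several intervals elapsed
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        long interval = 30_000L; // 30-second punctuation interval

        // Records arriving every 30s: one punctuation per interval.
        long[] regular = {0L, 30_000L, 60_000L, 90_000L};
        System.out.println(punctuationsFired(regular, interval)); // 3

        // A 5-minute gap after t=60s: 10 intervals pass, but only a
        // single punctuation fires when the next record arrives.
        long[] withGap = {0L, 30_000L, 60_000L, 360_000L};
        System.out.println(punctuationsFired(withGap, interval)); // 3
    }
}
```

Compared with uninterrupted 30-second updates up to t=360s (which would
yield 12 calls), the gapped case yields 3 -- i.e. 9 calls are missed,
matching the example above.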


Does this make sense?


-Matthias

On 2/1/17 7:37 AM, Elliot Crosby-McCullough wrote:
> Hi there,
> 
> I've been reading through the Kafka Streams documentation and there seems
> to be a tricky limitation that I'd like to make sure I've understood
> correctly.
> 
> The docs[1] talk about the `punctuate` callback being based on stream time
> and that all incoming partitions of all incoming topics must have
> progressed through the minimum time interval for `punctuate` to be called.
> 
> This seems like a problem for situations where you have one very fast
> and one very slow stream being processed together, for example joining a
> fast-moving KStream to a slow-changing KTable.
> 
> Have I misunderstood something or is this relatively common use case not
> supported with the `punctuate` callback?
> 
> Many thanks,
> Elliot
> 
> [1]
> http://docs.confluent.io/3.1.2/streams/developer-guide.html#defining-a-stream-processor
> (see the "Attention" box)
> 
