Thanks Sini! I intended to create a new JIRA, but than changed my mind and just picky-backed it to the existing one, as it's highly related and we might be able to tackle it in one effort.
-Matthias On 5/18/17 12:12 AM, Peter Sinoros Szabo wrote: > Hi Michal, > > yes, I know its beyond the scope of KIP-138, but from previous messages > from Matthias I thought that he will create a new ticket, but it seems > that instead he added it to KAFKA-3514. I will update that ticket with my > thoughts. > > Thanks, > Sini > > > > From: Michal Borowiecki <michal.borowie...@openbet.com> > To: users@kafka.apache.org > Date: 2017/05/17 10:15 > Subject: Re: Order of punctuate() and process() in a stream > processor > > > > Hi Sini, > > This is beyond the score of KIP-138 but > https://issues.apache.org/jira/browse/KAFKA-3514 exists to track such > improvements > > Thanks, > > Michal > > On 17 May 2017 5:10 p.m., Peter Sinoros Szabo > <peter.sinoros-sz...@hu.ibm.com> wrote: > > Hi, > > I see, now its clear why the repeated punctuations use the same time value > > in that case. > > Do you have a JIRA ticket to track improvement ideas for that? > > It would be great to have an option to: > - advance the stream time before calling the process() on a new record - > this would prevent to process a message in the wrong punctuation > "segment". > - use fine grained advance of stream time for the "missed" punctuations - > > this would ease the processing of burst messages after some silence. I do > not see if KIP-138 may solve this or not. > > Regards > > -Sini > > > > From: "Matthias J. Sax" <matth...@confluent.io> > To: users@kafka.apache.org > Date: 2017/05/12 19:19 > Subject: Re: Order of punctuate() and process() in a stream > processor > > > > Thanks for sharing. > > As punctuate is called with "streams time" you see the same time value > multiple times. It's again due to the coarse grained advance of "stream > time". > > @Thomas: I think, the way we handle it just simplifies the > implementation of punctuations. I don't see any other "advantage". > > > I will create a JIRA to track this -- we are currently working on some > improvements of punctuation and time management already, and it seems to > be another valuable improvement. > > > -Matthias > > > On 5/12/17 10:07 AM, Peter Sinoros Szabo wrote: >> Well, this is also a good question, because it is triggered with the > same >> timestamp 3 times, so in order to create my update for both three > seconds, >> I will have to count the number of punctuations and calculate the missed > > >> stream times for myself. It's ok for me to trigger it 3 times, but the >> timestamp should not be the same in each, but should be increased by the > > >> schedule time in each punctuate. >> >> - Sini >> >> >> >> From: Thomas Becker <tobec...@tivo.com> >> To: "users@kafka.apache.org" <users@kafka.apache.org> >> Date: 2017/05/12 18:57 >> Subject: RE: Order of punctuate() and process() in a stream >> processor >> >> >> >> I'm a bit troubled by the fact that it fires 3 times despite the stream >> time being advanced all at once; is there a scenario when this is >> beneficial? >> >> ________________________________________ >> From: Matthias J. Sax [matth...@confluent.io] >> Sent: Friday, May 12, 2017 12:38 PM >> To: users@kafka.apache.org >> Subject: Re: Order of punctuate() and process() in a stream processor >> >> Hi Peter, >> >> It's by design. Streams internally tracks time progress (so-called >> "streams time"). "streams time" get advanced *after* processing a > record. >> >> Thus, in your case, "stream time" is still at its old value before it >> processed the first message of you send "burst". After that, "streams >> time" is advanced by 3 seconds, and thus, punctuate fires 3 time. >> >> I guess, we could change the design and include scheduled punctuations >> when advancing "streams time". But atm, we just don't do this. >> >> Does this make sense? >> >> Is this critical for your use case? Or do you just want to understand >> what's happening? >> >> >> -Matthias >> >> >> On 5/12/17 8:59 AM, Peter Sinoros Szabo wrote: >>> Hi, >>> >>> >>> Let's assume the following case. >>> - a stream processor that uses the Processor API >>> - context.schedule(1000) is called in the init() >>> - the processor reads only one topic that has one partition >>> - using custom timestamp extractor, but that timestamp is just a wall >>> clock time >>> >>> >>> Image the following events: >>> 1., for 10 seconds I send in 5 messages / second >>> 2., does not send any messages for 3 seconds >>> 3., starts the 5 messages / second again >>> >>> I see that punctuate() is not called during the 3 seconds when I do not >>> send any messages. This is ok according to the documentation, because >>> there is not any new messages to trigger the punctuate() call. When the >>> first few messages arrives after a restart the sending (point 3. above) > > >> I >>> see the following sequence of method calls: >>> >>> 1., process() on the 1st message >>> 2., punctuate() is called 3 times >>> 3., process() on the 2nd message >>> 4., process() on each following message >>> >>> What I would expect instead is that punctuate() is called first and > then >>> process() is called on the messages, because the first message's >> timestamp >>> is already 3 seconds older then the last punctuate() was called, so the >>> first message belongs after the 3 punctuate() calls. >>> >>> Please let me know if this is a bug or intentional, in this case what > is >>> the reason for processing one message before punctuate() is called? >>> >>> >>> Thanks, >>> Peter >>> >>> Péter Sinóros-Szabó >>> Software Engineer >>> >>> Ustream, an IBM Company >>> Andrassy ut 39, H-1061 Budapest >>> Mobile: +36203693050 >>> Email: peter.sinoros-sz...@hu.ibm.com >>> >> >> ________________________________ >> >> This email and any attachments may contain confidential and privileged >> material for the sole use of the intended recipient. Any review, > copying, >> or distribution of this email (or any attachments) by others is >> prohibited. If you are not the intended recipient, please contact the >> sender immediately and permanently delete this email and any > attachments. >> No employee or agent of TiVo Inc. is authorized to conclude any binding >> agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo >> Inc. may only be made by a signed written agreement. >> >> >> >> >> > > [attachment "signature.asc" deleted by Peter Sinoros Szabo/Hungary/IBM] > > > > > > >
signature.asc
Description: OpenPGP digital signature