Thanks, I will try that.

Hamza

________________________________
From: Tauzell, Dave <dave.tauz...@surescripts.com>
Sent: Friday, July 29, 2016 03:18:47
To: users@kafka.apache.org
Subject: RE: Kafka streams Issue

Let's say you currently have:

Processing App ---> OUTPUT TOPIC ---> output consumer

You would ideally like the processing app to write to the output topic only
once a minute, but that is not easy to do directly.  So what you might be able
to do is:


Processing App ---> INTERMEDIATE OUTPUT TOPIC ---> Coalesce Process --->
OUTPUT TOPIC

The Coalesce Process is an application that does something like:

bucket = new list()
consumer = createConsumer()
while (message = consumer.next()) {
    window = calculate current window
    if message is after window:
        send bucket to OUTPUT TOPIC
        bucket = new list()
    add message to bucket
}
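The pseudocode above can be sketched concretely in Python. This is a minimal
illustration, not Kafka code: it assumes messages arrive as (timestamp, value)
pairs and that `send` is whatever function forwards a bucket to the OUTPUT
TOPIC; the name `coalesce` is made up for this example.

```python
from typing import Callable

def coalesce(messages, window_size_s: int, send: Callable[[list], None]) -> None:
    """Buffer messages into fixed time windows; flush the bucket when a
    message arrives that falls past the current window."""
    bucket: list = []
    window_end = None  # end of the current window, in seconds
    for ts, payload in messages:  # (timestamp_s, value) pairs
        if window_end is None:
            window_end = (ts // window_size_s + 1) * window_size_s
        if ts >= window_end:       # message is after the window:
            send(bucket)           # ship the whole bucket downstream
            bucket = []
            window_end = (ts // window_size_s + 1) * window_size_s
        bucket.append(payload)     # message belongs to the current window
    if bucket:                     # flush any trailing partial window
        send(bucket)

# usage: collect 60-second buckets instead of one send per message
sent = []
coalesce([(0, "a"), (30, "b"), (65, "c")], 60, sent.append)
# sent is now [["a", "b"], ["c"]]
```

Note that the message that triggers the flush belongs to the *next* window, so
it must go into the fresh bucket after the old one is sent, not be dropped.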

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |
dave.tauz...@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-----Original Message-----
From: Hamza HACHANI [mailto:hamza.hach...@supcom.tn]
Sent: Friday, July 29, 2016 9:53 AM
To: users@kafka.apache.org
Subject: RE: Kafka streams Issue

Hi Dave,

Could you explain your idea a little more?
I can't figure out what you are suggesting.
Thank you

-Hamza
________________________________
From: Tauzell, Dave <dave.tauz...@surescripts.com>
Sent: Friday, July 29, 2016 02:39:53
To: users@kafka.apache.org
Subject: RE: Kafka streams Issue

You could send each message immediately to an intermediary topic.  Then have a
consumer of that topic that pulls messages off and waits until the minute is up.

-Dave



-----Original Message-----
From: Hamza HACHANI [mailto:hamza.hach...@supcom.tn]
Sent: Friday, July 29, 2016 9:36 AM
To: users@kafka.apache.org
Subject: Kafka streams Issue

> Good morning,
>
> I'm an ICT student at TELECOM BRETAGNE (a French school).
> I followed your presentations on YouTube and found them really
> interesting.
> I'm trying to do some things with Kafka, and I have now been blocked
> for about 3 days.
> I'm trying to control when my processing application sends data to the
> output topic.
> What I'm trying to do is make the application process data from the
> input topic continuously, but send the messages only at the end of a
> minute/an hour/a month ... (the notion of windowing).
> For the moment, instead of sending data only at the end of the minute,
> the application sends it whenever it receives it from the input topic.
> Do you have any suggestions to help me?
> I would be really grateful.


Preliminary answer for now:

> For the moment, instead of sending data only at the end of the minute,
> the application sends it whenever it receives it from the input topic.

This is actually the expected behavior at the moment.

The main reason for this behavior is that, in stream processing, we never know
whether there is still late-arriving data to be received.  For example, imagine
you have 1-minute windows based on event-time.  It may happen that, after the
first 1-minute window has passed, another record arrives five minutes later
but, according to the record's event-time, should still have been part of that
first 1-minute window.  In this case, what we typically want is for the first
window to be updated/reprocessed with the late-arriving record included.  In
other words, just because 1 minute has passed (= the 1-minute window is "done")
does not mean that all the data for that time interval has already been
processed -- so sending only a single update after 1 minute has passed would
produce incorrect results in many cases.  For this reason you currently see a
downstream update whenever a new data record comes in (the application "sends
it whenever it receives it from the input topic").  The point here is to ensure
correctness of processing.
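To make the late-arrival scenario concrete, here is a small Python sketch (not
Kafka code) of an event-time windowed count.  The names `window_start`,
`process`, and `updates` are invented for this illustration; the point is that
a record arriving "late" revises an earlier window, which is why an update is
emitted per record rather than once per window.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # 1-minute event-time windows

def window_start(event_time_ms: int) -> int:
    """Map a record's event-time to the start of its 1-minute window."""
    return event_time_ms - (event_time_ms % WINDOW_MS)

counts = defaultdict(int)  # per-window record count
updates = []               # every downstream update that would be emitted

def process(event_time_ms: int) -> None:
    w = window_start(event_time_ms)
    counts[w] += 1
    updates.append((w, counts[w]))  # one downstream update per record

process(10_000)   # window [0s, 60s): count becomes 1
process(70_000)   # window [60s, 120s): count becomes 1
process(20_000)   # late record for [0s, 60s): its count is revised to 2
```

If the first window had emitted a single final result after one minute, the
late record at event-time 20s could never have corrected it.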

That said, one known drawback of the current behavior is that users have not
been able to control (read: reduce) the rate/volume of the resulting
downstream updates.  For example, if you have an input topic with a rate of
1 million msg/s (which is easy for Kafka), some users want to aggregate/window
results primarily to reduce that rate to a lower number (e.g. 1 thousand
msg/s) so that the data can be fed from Kafka into other systems that might
not scale as well as Kafka.  To help with these use cases, the next major
version of Kafka will have a new configuration parameter that allows you to
control the rate/volume of downstream updates.  Here, the point is to help
users optimize resource usage rather than correctness of processing.  This new
parameter should also help with your use case, but note that even this new
parameter is not based on strict time behavior or time windows.
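The effect such a parameter aims for can be illustrated with a rough Python
sketch (again, not Kafka code): buffer updates per key and periodically forward
only the latest value seen for each key, so many intermediate updates collapse
into fewer downstream messages.  The function name `forward_latest` and the
record-count flush trigger are assumptions made for this example.

```python
def forward_latest(records, flush_every: int):
    """Buffer (key, value) records and, every `flush_every` records,
    forward only the latest value seen per key -- collapsing many
    intermediate updates into fewer downstream messages."""
    cache: dict = {}
    out = []
    for i, (key, value) in enumerate(records, start=1):
        cache[key] = value             # later values overwrite earlier ones
        if i % flush_every == 0:
            out.extend(sorted(cache.items()))
            cache.clear()
    out.extend(sorted(cache.items()))  # final flush of anything left over
    return out

# six input updates collapse to four downstream updates
result = forward_latest(
    [("k1", 1), ("k1", 2), ("k2", 1), ("k1", 3), ("k2", 2), ("k2", 3)], 3)
```

Downstream consumers still see the latest value for every key; they just see
fewer of the intermediate revisions along the way.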

