It just happened again, and I noticed it happened exactly when a new
instance came up (after another one crashed).
Is it possible that this is related to the "rebalancing" of the instances? Does
the instance lose the aggregated data for several seconds, and receive traffic
during those seconds?

Should I add some logic on the application side so that it does not "start"
consuming until the app is in the RUNNING state?
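
Something like the following is what I have in mind (just a rough sketch on my
side, not code we run today - it assumes a `streams: KafkaStreams` instance and
a `ready` flag that the rest of the app would check before trusting the
aggregated data):

import java.util.concurrent.atomic.AtomicBoolean
import org.apache.kafka.streams.KafkaStreams

val ready = new AtomicBoolean(false)

// Flip the flag only while the instance is actually RUNNING, so the app
// does not act on the aggregates during a rebalance.
streams.setStateListener(new KafkaStreams.StateListener {
  override def onChange(newState: KafkaStreams.State,
                        oldState: KafkaStreams.State): Unit =
    ready.set(newState == KafkaStreams.State.RUNNING)
})

streams.start()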

Thanks

On Tue, Nov 27, 2018 at 7:08 PM Nitay Kufert <nita...@ironsrc.com> wrote:

> Hey everyone,
> We are running Kafka Streams 2.1.0 (Scala).
>
> The specific use case I am talking about can be simplified into the
> following stream:
> *given:* an inputStream of type [String, BigDecimal] (for example: ("1234", 120.67))
> inputStream.map(createUniqueKey).groupByKey.reduce(_ + _).toStream.to("daily_counts")
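>
> In case it helps, here is a more complete (but still simplified) sketch of that
> topology - the topic and key names are made up, and I am assuming an implicit
> Serde[BigDecimal] is in scope (ours is custom):
>
> import org.apache.kafka.streams.scala.StreamsBuilder
> import org.apache.kafka.streams.scala.ImplicitConversions._
> import org.apache.kafka.streams.scala.Serdes._
>
> // Re-keys the record so all values for the same key + day end up together
> def createUniqueKey(key: String, value: BigDecimal): (String, BigDecimal) =
>   (s"some_key_prefix_day_of_year_$key", value)
>
> val builder = new StreamsBuilder()
> val inputStream = builder.stream[String, BigDecimal]("input_topic") // e.g. ("1234", 120.67)
>
> inputStream
>   .map(createUniqueKey)  // re-key per key + day
>   .groupByKey            // group by the new key
>   .reduce(_ + _)         // running sum per key, backed by a state store
>   .toStream
>   .to("daily_counts")    // the compacted output topic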
>
> So, at the end of the stream, we get a compacted topic named "daily_counts"
> that should hold all the "counts" of specific keys.
> When consuming from the topic we will get something like:
> "unique_key_for_1234" -> 110.67
> "unique_key_for_1234" -> 120.67
> "unique_key_for_1234" -> 130.67
> "unique_key_for_1234" -> 140.67
> "unique_key_for_1234" -> 150.67
> The value we actually use is the last one, 150.67.
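>
> Just to make the read side concrete, this is roughly how we pick the value per
> key (a sketch - `records` here stands for the (key, value) pairs consumed from
> "daily_counts", oldest first):
>
> val latestPerKey: Map[String, BigDecimal] =
>   records.foldLeft(Map.empty[String, BigDecimal]) { case (acc, (key, value)) =>
>     acc + (key -> value) // later records overwrite earlier ones, so the last value wins
>   }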
>
> 99% of the time things work as expected, but I have noticed something strange
> that keeps happening to us every couple of days:
> When consuming from the topic we get:
> CreateTime:1543291897088 some_key_prefix_day_of_year_id 429.07
> CreateTime:1543291913519 some_key_prefix_day_of_year_id 429.55
> CreateTime:1543291937106 some_key_prefix_day_of_year_id 430.01
> CreateTime:1543291936980 some_key_prefix_day_of_year_id 430.49
> CreateTime:1543291952062 some_key_prefix_day_of_year_id 430.85
> CreateTime:1543291968703 some_key_prefix_day_of_year_id 431.25
> CreateTime:1543291987571 some_key_prefix_day_of_year_id 431.71
> CreateTime:1543292022037 some_key_prefix_day_of_year_id 432.17
> CreateTime:1543292029040 some_key_prefix_day_of_year_id 432.64
> CreateTime:1543292032400 some_key_prefix_day_of_year_id 433.12 // In my
> time zone: Tuesday, November 27, 2018 6:13:52.400 AM GMT+02:00
> CreateTime:1543292072814 some_key_prefix_day_of_year_id 0.47 // In my
> time zone: Tuesday, November 27, 2018 6:14:32.814 AM GMT+02:00
> CreateTime:1543292081922 some_key_prefix_day_of_year_id 0.83
> CreateTime:1543292132950 some_key_prefix_day_of_year_id 2.61
> CreateTime:1543292169435 some_key_prefix_day_of_year_id 2.97
> CreateTime:1543292182448 some_key_prefix_day_of_year_id 3.43
> CreateTime:1543292201472 some_key_prefix_day_of_year_id 3.88
> CreateTime:1543292205065 some_key_prefix_day_of_year_id 4.36
> CreateTime:1543292216853 some_key_prefix_day_of_year_id 4.82
> CreateTime:1543292242955 some_key_prefix_day_of_year_id 5.4
>
> Meaning, somehow we lost our aggregated data.
>
> I am looking for an explanation... things I have checked and found out:
> 1. No ERROR to be found on the Kafka cluster
> 2. No ERROR on the application side around this time (a couple of minutes
> earlier we got a "too many open files" error from RocksDB, but on a different
> store)
> 3. It seems to happen ONLY for "high rate" keys
> 4. When it happens - it happens on several keys at the same time
> 5. Around the same time, we had a Spot-Instance replacement.
> 6. Checked the details of the reduce operation and saw that the reduce
> function is not actually applied on the first invocation for a key (the first
> value is used as-is) - see the sketch below.
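>
> A rough illustration of what I mean in (6) - this is not the actual Streams
> code, just how I understand the reduce semantics when the store has no previous
> value for a key (`storedValue` and `incomingValue` are made-up names):
>
> // storedValue: what the state store currently holds for the key (None if nothing)
> val newAggregate = storedValue match {
>   case None       => incomingValue        // first invocation: value kept as-is, reducer not called
>   case Some(prev) => prev + incomingValue // from the second record on: reduce(_ + _) is applied
> }
> // So if the store somehow loses its state, the next value shows up "from scratch",
> // which would explain the sudden drop to 0.47 above.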
>
> Could some misconfiguration on our side be causing this?
> Any ideas would be welcome.
>
> Thanks,
> Nitay
> --
> Nitay Kufert
> Backend Developer
>
-- 
Nitay Kufert
Backend Team Leader
