Hi all,

I'm currently having a load issue with the updateStateByKey function.
It seems to run with considerable delay for a few of the state objects as
their number increases.

I have a 1-second batch interval receiving events from a Kafka stream. These
events create state objects and subsequently update them (around 1/10 of the
objects are updated every second).

I was expecting that, for a DStream with 100 state objects, the update
function would be called every second for each object. That's not happening
for all of them. For a few state objects, the update function seems to be
skipped for several seconds and then called multiple consecutive times. For
example, if it wasn't called in the last 10 seconds, it is then called 10
consecutive times. It looks like it's trying to compensate for the seconds in
which it didn't run...
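To make that expectation concrete, here is a minimal plain-Python model of the
contract I'm assuming updateStateByKey follows: in each batch the update
function is called once for every key that already has state or receives new
events, with an empty list of new values when nothing arrived for that key.
This is only a simulation, not Spark code; `run_batch` and `count_invocations`
are hypothetical names, and the 100-key, 10-second numbers just mirror my setup.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# Hypothetical model of one micro-batch, NOT Spark's implementation.
# update(new_values, previous_state) -> new_state, or None to drop the key.
UpdateFunc = Callable[[List[int], Optional[int]], Optional[int]]


def run_batch(state: Dict[Hashable, int],
              batch: List[Tuple[Hashable, int]],
              update: UpdateFunc) -> Dict[Hashable, int]:
    # Group this batch's events by key.
    grouped: Dict[Hashable, List[int]] = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state: Dict[Hashable, int] = {}
    # Every key with existing state is updated, even with no new values.
    for key in set(state) | set(grouped):
        result = update(grouped.get(key, []), state.get(key))
        if result is not None:  # None removes the key from the state
            new_state[key] = result
    return new_state


def count_invocations(new_values: List[int], previous: Optional[int]) -> Optional[int]:
    # Ignore the event values; the state counts how often the key was updated.
    return (previous or 0) + 1


NUM_KEYS = 100
BATCHES = 10

# Second 0: one event per key creates all 100 state objects.
state = run_batch({}, [(k, 1) for k in range(NUM_KEYS)], count_invocations)

# Seconds 1..9: about 1/10 of the keys receive an event each second.
for second in range(1, BATCHES):
    batch = [(k, 1) for k in range(NUM_KEYS) if k % 10 == second]
    state = run_batch(state, batch, count_invocations)

# Expected: every key was updated once per batch, i.e. 10 times.
print(len(state), sorted(set(state.values())))  # prints: 100 [10]
```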

With up to 20 objects, the update function is always called every second,
so there are no issues at that amount.

This seems to happen only when I increase the number of state objects, so it
looks like a load issue. But 100 state objects isn't that many, right?

Has anyone had a similar experience, or does anyone know what problem I'm hitting?

Thanks very much in advance.

Rod

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/UpdatestateByKey-assumptions-tp10858.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.