Hi all, I'm currently having a load issue with the updateStateByKey function. It seems to run with considerable delay for a few of the state objects when the number of objects increases.
I have a 1-second batch interval receiving events from a Kafka stream, which creates state objects and subsequently updates them (around 1/10 of the objects are updated every second). With a DStream of 100 state objects, I was expecting the update function to be called every second for each object. That's not happening for all objects. For a few state objects, the update function seems to be called multiple consecutive times after a few seconds. For example, if the function wasn't called in the last 10 seconds, it is then called 10 consecutive times. It looks like it's trying to compensate for the seconds in which it didn't run.

With up to 20 objects, the update function is always called every second, so there are no issues at that size. The problem only seems to appear when I increase the number of state objects. So it looks like a load issue, but 100 state objects is not that many, right?

Has anyone had a similar experience, or does anyone know what problem I am hitting?

Thanks very much in advance.
Rod
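For reference, here is a minimal plain-Python sketch of the expectation described above: 100 state objects, about 1/10 of them receiving an event in each batch, and the update function called once per object per batch. It does not use Spark. `run_batch`, `update_count`, and the `obj-N` keys are hypothetical names for illustration, and the only thing borrowed from the real API is the `(new_values, previous_state)` shape of the update function that PySpark's `updateStateByKey` expects.

```python
from typing import Callable, Dict, List, Optional, Tuple


def update_count(new_values: List[int], state: Optional[int]) -> Optional[int]:
    """Update function in the (new_values, previous_state) shape.

    Returns the new state; returning None would drop the key.
    """
    return (state or 0) + sum(new_values)


def run_batch(
    states: Dict[str, int],
    events: List[Tuple[str, int]],
    update: Callable[[List[int], Optional[int]], Optional[int]],
    calls: Dict[str, int],
) -> Dict[str, int]:
    """Model one batch (assumption: not Spark itself).

    The update function is called once for every key that already has
    state or that received events in this batch. Keys with no events
    get an empty list of new values.
    """
    grouped: Dict[str, List[int]] = {}
    for key, value in events:
        grouped.setdefault(key, []).append(value)

    next_states: Dict[str, int] = {}
    for key in sorted(set(states) | set(grouped)):
        calls[key] = calls.get(key, 0) + 1  # count update calls per key
        new_state = update(grouped.get(key, []), states.get(key))
        if new_state is not None:
            next_states[key] = new_state
    return next_states


states: Dict[str, int] = {}
calls: Dict[str, int] = {}

# Batch 0: one event per object creates the 100 state objects.
states = run_batch(states, [(f"obj-{i}", 1) for i in range(100)], update_count, calls)

# Batches 1..10: about 1/10 of the objects receive an event in each batch.
for batch in range(1, 11):
    events = [(f"obj-{i}", 1) for i in range(100) if i % 10 == batch % 10]
    states = run_batch(states, events, update_count, calls)
```

In this model every object sees exactly one update call per batch, whether or not it received an event. If the real job behaves differently, comparing the per-batch processing time against the 1-second batch interval may be a useful first check, although nothing in the description above confirms that as the cause.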