After upgrading from Spark 1.5 to 1.6 (CDH 5.6.0 -> 5.7.1),
some of our streaming jobs start to fall behind after running for a long time.

After a little investigation, here is what I found:
    - the same program has no problem with Spark 1.5
    - we have two kinds of streaming jobs, and only those using "updateStateByKey" are affected
    - CPU usage gets higher and higher over time (1 core at 5% at start, 1 core at 100% after a week)
    - the data rate is around 100 events/s, so the CPU should not have to work that hard
    - processing time for a batch grows from 100ms at start to 3s after a week
    - even when running the same program (on different input data), not all processes are delayed to the same degree
    - there is no warning or error message until the delay grows too large and the job runs out of memory
    - processing time of the customer code seems to have no problem
    - memory/heap usage looks normal to me

I suspect the problem comes from updateStateByKey, but I can't track it down.
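
To make the suspicion concrete, here is a rough sketch of the pattern involved. It is a plain-Python stand-in, not our production code and not Spark itself. The names `update_count` and `run_batch`, the sample events, and the eviction rule are all made up for illustration. It only imitates the documented behaviour that updateStateByKey calls the update function for every key held in state on each batch, and that returning None removes a key.

```python
def update_count(new_values, running_count):
    """Update function in the shape updateStateByKey expects:
    (list of new values for the key, previous state or None) -> new state.
    Returning None drops the key from the state."""
    if not new_values and running_count is not None:
        # Hypothetical eviction rule: drop keys that saw no events this batch.
        return None
    return sum(new_values) + (running_count or 0)


def run_batch(state, batch):
    """Imitate one micro-batch: the update function runs for every key
    that is either already in the state or present in the new batch."""
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state = {}
    for key in set(state) | set(grouped):
        result = update_count(grouped.get(key, []), state.get(key))
        if result is not None:
            new_state[key] = result
    return new_state


state = {}
state = run_batch(state, [("a", 1), ("b", 1), ("a", 1)])
after_first = dict(state)
state = run_batch(state, [("a", 1)])
after_second = dict(state)
```

In this sketch the work per batch depends on how many keys are held in state, not only on how many events arrive, so a job whose update function never returns None could slow down over time even at a low event rate. Whether that is what happens in our jobs is only a guess, and it would not by itself explain why Spark 1.5 is unaffected.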

Has anyone experienced the same problem?


--
BR
Peter Chan
