>
<<Apologies for the repeat; the first post was rejected by the submission
process.>>

I created a simple Spark Streaming program using updateStateByKey.
The domain is represented by case classes, for clarity, type safety, etc.
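For reference, a minimal sketch of that kind of job. This is not the code from
the linked project; the Counter case class, the socket source, and the
checkpoint path are illustrative assumptions.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Hypothetical case class standing in for the domain types in the linked project.
    case class Counter(total: Long)

    object CaseClassStateJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[2]").setAppName("CaseClassStateJob")
        val ssc = new StreamingContext(conf, Seconds(1))
        ssc.checkpoint("/tmp/spark-checkpoint") // updateStateByKey requires a checkpoint directory

        // Input source is assumed; the actual project uses a rate-limited source.
        val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
        val pairs = words.map(w => (w, 1L))

        // State is carried in a case class rather than a bare primitive.
        val update: (Seq[Long], Option[Counter]) => Option[Counter] =
          (values, state) => Some(Counter(state.map(_.total).getOrElse(0L) + values.sum))

        pairs.updateStateByKey(update).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }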

The Spark job continuously loads new classes, which are unloaded by GC so
that the number of active class instances stays relatively constant.
Nevertheless the total memory footprint grows and throughput slows until the
job fails. The failure is generally triggered when processing can no longer
keep up with the rate-limited input.

The offending classes are of the form
"sun.reflect.GeneratedSerializationConstructorAccessor125".

This failure does not occur if the job is rewritten to use only plain Scala
types (tuples and primitives) in place of the case classes.
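
For contrast, a minimal sketch of the non-leaking variant, keeping the state
as a primitive Long; the names are illustrative and not taken from the linked
project.

    // Update function over a primitive Long state, with no case class involved.
    // Per the report, this variant does not exhibit the class leak.
    val updatePrimitive: (Seq[Long], Option[Long]) => Option[Long] =
      (values, state) => Some(state.getOrElse(0L) + values.sum)

    // Applied to the same (String, Long) pair DStream as in the sketch above:
    //   pairs.updateStateByKey(updatePrimitive).print()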

The following GitHub project contains the test code and further details:

https://github.com/searler/SparkStreamingLeak



