<<Apologies for the repeat. The first was rejected by the submission process.>>
I created a simple Spark Streaming program using updateStateByKey. The domain is represented by case classes for clarity, type safety, etc.

The Spark job continuously loads new classes, which are then removed by GC so that the number of live class instances stays roughly constant. However, the total memory footprint grows and throughput slows until the job fails. The failure is generally triggered when processing can no longer keep up with the rate-limited input. The offending classes are of the form "sun.reflect.GeneratedSerializationConstructorAccessor125".

This failure does not occur if the job is rewritten to use only plain Scala types: tuples and primitives.

The following GitHub project contains the test code and more details: https://github.com/searler/SparkStreamingLeak

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-updateStateByKey-fails-with-class-leak-when-using-case-classes-resend-tp22793.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
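For readers who want a picture of the shape of job being discussed: below is a minimal, hypothetical sketch of an updateStateByKey stream keyed on a case class. It is not taken from the linked project; the case class `Reading`, the socket source, and the checkpoint path are all illustrative assumptions.

```scala
// Hypothetical sketch of a Spark Streaming job that keeps per-key state
// in a case class, the pattern under which the accessor-class leak was
// observed. Names and sources here are illustrative, not from the repo.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class Reading(id: String, value: Int) // illustrative domain case class

object StateLeakSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("updateStateByKey-case-class")
    val ssc  = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/checkpoint") // updateStateByKey requires checkpointing

    // Assumed input: "id,value" lines on a local socket.
    val lines = ssc.socketTextStream("localhost", 9999)
    val readings = lines.map { line =>
      val Array(id, v) = line.split(",")
      (id, Reading(id, v.toInt))
    }

    // Fold each batch's new Readings into the previous per-key state.
    // The state type here is a case class; per the report above, the leak
    // disappears if this is rewritten with tuples/primitives instead.
    val state = readings.updateStateByKey[Reading] {
      (newValues: Seq[Reading], old: Option[Reading]) =>
        val total = newValues.map(_.value).sum + old.map(_.value).getOrElse(0)
        val id    = newValues.headOption.map(_.id).orElse(old.map(_.id)).getOrElse("")
        Some(Reading(id, total))
    }
    state.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```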