It looks like the problem is that the parse function is not serializable. This is most likely because the formats variable is local to the ParseJson object, and therefore not globally accessible to the cluster. Generally, this problem can be solved by moving the variable inside the closure so that it is distributed to each worker.
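As a rough sketch of what "moving the variable inside the closure" could look like: the original code is not shown in this message, so the Event case class, the parsePartition and parseAll names, and the use of DefaultFormats are all assumptions made for illustration. The point is only that formats is created inside the function that runs on the worker, so nothing non-serializable has to be shipped from the driver.

```scala
import org.apache.spark.rdd.RDD
import org.json4s._
import org.json4s.jackson.JsonMethods

// Hypothetical record type; the real fields are not known from the thread.
case class Event(id: Long, name: String)

object ParseJson {
  // Runs on the worker, once per partition. The implicit Formats is a local
  // value created here, so it is never captured from the driver side.
  def parsePartition(lines: Iterator[String]): Iterator[Event] = {
    implicit val formats: Formats = DefaultFormats // assumption: default formats suffice
    lines.map(line => JsonMethods.parse(line).extract[Event])
  }

  // Each input line is assumed to hold one complete JSON object.
  def parseAll(lines: RDD[String]): RDD[Event] =
    lines.mapPartitions(parsePartition)
}
```

Using mapPartitions rather than map means the formats value is built once per partition instead of once per record, which may matter if constructing it is expensive.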
In this specific instance, it makes far more sense to use the json datasource <http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets> provided by newer versions of Spark.
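For reference, a minimal sketch of reading JSON through the built-in data source, assuming Spark 2.x or later (where SparkSession is the entry point) and an input file with one JSON object per line. The ReadJson object, the load helper, the local master setting, and the events.json path are hypothetical and only there to make the example complete.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadJson {
  // Spark infers the schema from the data, so no json4s Formats
  // (and no hand-written parsing closure) is involved.
  def load(spark: SparkSession, path: String): DataFrame =
    spark.read.json(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadJson")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()

    val df = load(spark, "events.json") // hypothetical path
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

On Spark 1.4 through 1.6 the equivalent call would be sqlContext.read.json(path).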