Hi,

I am seeing a lot of posts on singletons vs. broadcast variables, such as
* http://apache-spark-user-list.1001560.n3.nabble.com/Best-way-to-have-some-singleton-per-worker-tt20277.html
* http://apache-spark-user-list.1001560.n3.nabble.com/How-to-share-a-NonSerializable-variable-among-tasks-in-the-same-worker-node-tt11048.html#a21219

What's the best approach to instantiate an object once and have it reused
by the worker(s)?

For example, I have an object that loads some static state (e.g. a
dictionary/map), is part of a 3rd-party API, and is not serializable.  I
can't seem to get it to behave as a singleton on the worker side: the JVM
appears to be wiped on every request, so I get a new instance each time and
the singleton doesn't stick.
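
To make this concrete, the kind of thing I've been trying looks roughly like
the sketch below. ThirdPartyDictionary is just a made-up stand-in for the real
3rd-party class, and DictionaryHolder is the holder object I hoped would stay
initialized once per executor JVM:

import org.apache.spark.{SparkConf, SparkContext}

// Made-up stand-in for the non-serializable 3rd-party class that loads
// static state such as a dictionary/map.
class ThirdPartyDictionary {
  def lookup(word: String): String = word.toUpperCase  // placeholder logic
}

// Attempted worker-side singleton: a holder object with a lazy field,
// hoping it is initialized once per executor JVM and reused across tasks.
object DictionaryHolder {
  lazy val instance: ThirdPartyDictionary = new ThirdPartyDictionary
}

object SingletonAttempt {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("singleton-attempt"))

    val result = sc.parallelize(Seq("foo", "bar"))
      // Only the holder object is referenced in the closure, so nothing
      // non-serializable is shipped; the instance is created lazily on the
      // executor the first time it is touched.
      .map(word => DictionaryHolder.instance.lookup(word))
      .collect()

    result.foreach(println)
    sc.stop()
  }
}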

Is there an approach where I could make this object, or a wrapper around it,
a broadcast variable? Can Kryo get me there, and would that basically mean
writing a custom serializer?  The 3rd-party object may have a bunch of member
variables hanging off it, so serializing it properly may be non-trivial...
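
For what it's worth, the wrapper idea I have in mind is roughly the sketch
below (same made-up ThirdPartyDictionary stand-in as above; I haven't gotten
this working, it's only to show what I mean): broadcast a small serializable
wrapper whose @transient lazy val creates the real object on the executor
instead of trying to serialize it.

import org.apache.spark.{SparkConf, SparkContext}

// Serializable wrapper around the non-serializable 3rd-party object.
// Only the wrapper gets serialized and broadcast; the real instance is
// created lazily on each executor when first accessed, never shipped.
class DictionaryWrapper extends Serializable {
  @transient lazy val dict: ThirdPartyDictionary = new ThirdPartyDictionary
}

object BroadcastWrapperSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-wrapper-sketch"))

    // Broadcast the cheap, serializable wrapper rather than the object itself.
    val wrapperBc = sc.broadcast(new DictionaryWrapper)

    val result = sc.parallelize(Seq("foo", "bar"))
      .map(word => wrapperBc.value.dict.lookup(word))
      .collect()

    result.foreach(println)
    sc.stop()
  }
}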

Any pointers/hints greatly appreciated.


