*Environment:* Spark 2.2.0
*Kafka:* 0.10.0
*Language:* Java

*Use case:* Streaming data from Kafka using JavaDStreams and storing it in a
downstream database.

*Issue:* 

I have a use case where I need to launch a background thread that connects to
a database (MySQL) and refreshes a cached result set every 30 minutes. On its
own, this component works fine. When integrated with Spark Streaming, my tasks
consume the data cached by this thread, so the thread has to run on every
worker node (rather than only on the driver) before the actual data processing
starts. In other words, something like the setup() below should run on all the
worker nodes:

public static void setup() {
        try {
                // Util contains the implementation of the background caching thread
                new Util(FSFactory.getFSHandle(null));
        } catch (IOException e) {
                logger.error("IOException: failed to create Util", e);
        }
}
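
To make it clearer what I am after, here is a rough sketch (the names
ensureSetup(), MyStreamingJob and the stream variable are made up for
illustration, this is not working code from my job): the idea is to guard
setup() with a static flag and call it from inside the stream processing, so
that it runs once per executor JVM rather than on the driver.

// Illustrative sketch only: start the background caching thread at most
// once per executor JVM by guarding setup() with a static flag.
private static volatile boolean setupDone = false;

private static void ensureSetup() {
        if (!setupDone) {
                synchronized (MyStreamingJob.class) {
                        if (!setupDone) {
                                setup();          // starts the background MySQL caching thread
                                setupDone = true;
                        }
                }
        }
}

// Called from inside the stream processing so it executes on the workers,
// not on the driver ("stream" is the JavaDStream being processed):
stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
        ensureSetup();            // runs in the executor JVM
        while (records.hasNext()) {
                records.next();   // process the record using the cached result set
        }
}));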


*What did I try:* I tried passing the object created in setup() above as a
broadcast variable in Spark. But, from my understanding, a broadcast variable
is only a read-only value shipped to the executors. Since my code starts a
background thread, nothing related to it showed up in the executor logs and
the thread never ran.
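
The attempt was roughly along these lines (reconstructed here just for
illustration, not my exact code; javaSparkContext is the job's
JavaSparkContext); as far as I can tell, broadcast only serializes the object
and ships it to the executors, it does not start the thread there:

// Rough, illustrative reconstruction of the broadcast attempt:
try {
        Broadcast<Util> utilBroadcast =
                javaSparkContext.broadcast(new Util(FSFactory.getFSHandle(null)));
} catch (IOException e) {
        logger.error("IOException: failed to create Util", e);
}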

I have some experience with other streaming frameworks, where I can set up
dependencies in a setup() method and close them in a terminate() method for
every container. Is there an equivalent in Spark?

Am I missing a concept here? I googled around and looked on SO but couldn't
find anything useful. Any help would be greatly appreciated.

Thanks,
Ravi


