It depends on how you want to run your application. One option is to save
every 100th batch as a data file and have a separate application read those
files and run the training. In that case the contexts are separated, and both
applications run simultaneously on the cluster but in different JVMs. A
minimal sketch of the streaming side of that approach (assuming Scala; the
input and snapshot paths are placeholders, not from your setup):
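
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SnapshotJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("streaming-side")
        val ssc = new StreamingContext(conf, Seconds(10))
        // Assumed input: new files landing in an HDFS dir, as in your job.
        val lines = ssc.textFileStream("hdfs:///input/dir")

        // foreachRDD's function runs on the driver, so a driver-side
        // counter is safe here.
        var batchCount = 0L
        lines.foreachRDD { rdd =>
          batchCount += 1
          if (batchCount % 100 == 0 && !rdd.isEmpty()) {
            // Persist a snapshot; the separate training app reads this
            // path on its own schedule, in its own JVM.
            rdd.saveAsTextFile(s"hdfs:///snapshots/batch-$batchCount")
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }
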
If you do not want a separate process, you can instead use the same context,
and the training tasks will run in the same JVM as the streaming job.
Basically, with the first option you are using the batch and real-time
pipelines of a lambda architecture, whereas with the second option you are
doing everything in the real-time pipeline. For the second option, since the
function passed to foreachRDD runs on the driver, you can hand the training
off to a background thread there. Continuing the sketch above (trainModel is
a hypothetical stand-in for your training code):
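
    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future}
    import org.apache.spark.rdd.RDD

    // A single-thread pool: at most one training runs at a time, and it
    // never blocks the driver thread that executes foreachRDD.
    implicit val trainingEc: ExecutionContext =
      ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())

    // Hypothetical placeholder for the 2-hour training job.
    def trainModel(data: RDD[String]): Unit = {
      println(s"training on ${data.count()} records")
    }

    var batchCount = 0L
    lines.foreachRDD { rdd =>
      batchCount += 1
      if (batchCount % 100 == 0) {
        // cache() is lazy; the snapshot materializes on the first action
        // inside the training thread, recomputed from HDFS if needed.
        val snapshot = rdd.cache()
        Future {
          // SparkContext is thread-safe for submitting jobs, so these
          // run concurrently with the incoming streaming batches.
          trainModel(snapshot)
          snapshot.unpersist()
        }
      }
    }
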

Best
Ayan
On 12 May 2015 00:08, "hotdog" <lisend...@163.com> wrote:

> I want to start a child-thread in foreachRDD.
>
> My situation is:
>
> The job reads from an HDFS dir continuously, and every 100 batches I
> want to launch a model training task (I will make a snapshot of the RDDs at
> that time and start the training task). The training task takes a very long
> time (2 hours), and I don't want it to hold up reading new
> batches of data.
>
> Is starting a new child thread a good solution? Could the child thread use
> the SparkContext from the main thread and use an RDD from the main thread?