The pseudocode:

object MyApp {
  var myStaticRDD: RDD[Int] = _

  def main(args: Array[String]) {
    ... // init the streaming context, and get two DStreams (streamA and
        // streamB) from two HDFS paths

    // complex transformation using the two DStreams
    val newStream = streamA.transformWith(streamB, (a, b, t) => {
      a.join(b).map(...)
    })

    // join each of newStream's RDDs with myStaticRDD
    newStream.foreachRDD { rdd =>
      myStaticRDD = myStaticRDD.join(rdd)
    }

    // do complex model training every two hours
    if (hour is 0, 2, 4, 6, ...) {
      model_training(myStaticRDD) // will take 1 hour
    }
  }
}
I don't know how to write the code so that the model is trained every two
hours on that moment's snapshot of myStaticRDD, and so that while the
model training is running, the streaming job keeps running normally at
the same time.
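To make the question concrete, here is a sketch of the direction I have in mind (assuming Spark Streaming's Scala API; the `@volatile` snapshot reference, the Future-based background training, `lastTrainedHour`, and `model_training` itself are my own hypothetical names, not a working solution):

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time

object MyApp {
  // written by the streaming thread, read by the training thread
  @volatile var myStaticRDD: RDD[(Int, Int)] = _
  @volatile var trainingRunning = false
  var lastTrainedHour = -1L

  def run(newStream: org.apache.spark.streaming.dstream.DStream[(Int, Int)]): Unit = {
    // foreachRDD also has a (RDD[T], Time) => Unit overload,
    // so the batch time can drive the two-hour schedule
    newStream.foreachRDD { (rdd: RDD[(Int, Int)], time: Time) =>
      // accumulate the batch; cache (and periodically checkpoint)
      // so the lineage does not grow without bound
      myStaticRDD = myStaticRDD.union(rdd).cache()

      val hour = time.milliseconds / (3600L * 1000L)
      if (hour % 2 == 0 && hour != lastTrainedHour && !trainingRunning) {
        lastTrainedHour = hour
        trainingRunning = true
        // capture an immutable snapshot of the reference *now*;
        // later batches will replace myStaticRDD, but the training
        // job keeps working on this snapshot
        val snapshot = myStaticRDD
        Future {
          model_training(snapshot) // hypothetical; takes ~1 hour
        }.onComplete(_ => trainingRunning = false)
      }
    }
  }

  def model_training(data: RDD[(Int, Int)]): Unit = ???
}
```

The idea is that the training runs on a separate thread via a Future, so the streaming batches continue to be scheduled while it works on the captured snapshot. Is this a reasonable pattern, or is there a better idiom for it?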
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/union-eatch-streaming-window-into-a-static-rdd-and-use-the-static-rdd-periodicity-tp22783.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.