Hi Guys, This is my first mail to spark users mailing list.
I need help on Dstream operation. In fact, I am using a MLlib randomforest model to predict using spark streaming. In the end, I want to combine the feature Dstream & prediction Dstream together for further downstream processing. I am predicting using below piece of code: predictions = texts.map( lambda x : getFeatures(x) ).map(lambda x : x.split(',')).map( lambda parts : [float(i) for i in parts] ).transform(lambda rdd: rf_model.predict(rdd)) Here texts is dstream having single line of text as records getFeatures generates a comma separated features extracted from each record I want the output as below tuple: ("predicted value", "original text") How can I achieve that ? or at least can I perform .zip like normal RDD operation on two Dstreams, like below: output = texts.zip(predictions) Thanks in advance. -Obaid