Spark Streaming: Combine MLlib Prediction and Features on Dstreams

2016-05-31 Thread obaidul karim
Hi nguyen, Thanks again. Yes, flatMap may do the trick as well. I may try it out. I will let you know the result when done.
On Tue, May 31, 2016 at 3:58 PM, nguyen duc tuan wrote:
> 1. RandomForest 'predict' method supports both RDD or Vector as input (
> http://spark.apache.org/docs/lates

Re: Spark Streaming: Combine MLlib Prediction and Features on Dstreams

2016-05-31 Thread nguyen duc tuan
1. The RandomForest 'predict' method accepts either an RDD or a Vector as input (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.tree.model.RandomForestModel). So, in this case, the function extract_feature should return a tuple (prediction, rawtext). If each input text can create
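The message above is cut off after "If each input text can create". Assuming the case in question is one input text yielding several feature vectors (an assumption, prompted by the flatMap mention elsewhere in this thread), a flat-map style function could emit one (prediction, rawtext) pair per vector. The sketch below runs without Spark: `StubModel`, the `;` separator between vectors, and `extract_features` are all hypothetical, and `itertools.chain.from_iterable` stands in for the DStream `flatMap`.

```python
from itertools import chain


class StubModel:
    """Hypothetical stand-in for a trained model; NOT the MLlib API."""

    def predict(self, features):
        # Toy rule: class 1.0 when the feature sum is positive, else 0.0.
        return 1.0 if sum(features) > 0 else 0.0


def extract_features(model, raw_text):
    """Yield one (prediction, raw_text) pair per feature vector in raw_text.

    Assumption: vectors are separated by ';' and values by ','.
    """
    for chunk in raw_text.split(';'):
        fea = [float(v) for v in chunk.split(',')]
        yield (model.predict(fea), raw_text)


model = StubModel()
texts = ["1,2;-5,1", "0.5,0.5"]

# Flatten the per-text generators into a single list of pairs,
# which is what flatMap would do on a stream of texts.
output = list(chain.from_iterable(extract_features(model, t) for t in texts))
print(output)
```

Each raw text is repeated once per vector it produced, so every prediction still carries its source record.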

Re: Spark Streaming: Combine MLlib Prediction and Features on Dstreams

2016-05-31 Thread obaidul karim
Hi nguyen, Thanks a lot for your time; I really appreciate the good suggestions. Please find my concerns inline below:
    def extract_feature(rf_model, x):
        text = getFeatures(x).split(',')
        fea = [float(i) for i in text]
        prediction = rf_model.predict(fea)
        return (prediction, x)   <<< this will return tw

Re: Spark Streaming: Combine MLlib Prediction and Features on Dstreams

2016-05-30 Thread nguyen duc tuan
I'm not sure what you mean by "does not return any value". How do you use this method? I would use this method as follows:
    def extract_feature(rf_model, x):
        text = getFeatures(x).split(',')
        fea = [float(i) for i in text]
        prediction = rf_model.predict(fea)
        return (prediction, x)
    def pro

Re: Spark Streaming: Combine MLlib Prediction and Features on Dstreams

2016-05-30 Thread obaidul karim
Sorry for lots of typos (writing from mobile).
On Tuesday, 31 May 2016, obaidul karim wrote:
> foreachRDD does not return any value. It can be used just to send results to
> another place/context, such as a db, file, etc.
> I could use that, but it seems like the overhead of another hop.
> I wanted to mak

Re: Spark Streaming: Combine MLlib Prediction and Features on Dstreams

2016-05-30 Thread obaidul karim
foreachRDD does not return any value. It can be used just to send results to another place/context, such as a db, file, etc. I could use that, but it seems like the overhead of another hop. I wanted to make it simple and light.
On Tuesday, 31 May 2016, nguyen duc tuan wrote:
> How about using foreachRDD

Re: Spark Streaming: Combine MLlib Prediction and Features on Dstreams

2016-05-30 Thread nguyen duc tuan
How about using foreachRDD? I think this is much better than your trick.
2016-05-31 12:32 GMT+07:00 obaidul karim:
> Hi Guys,
>
> In the end, I am using the code below.
> The trick is using "native python map" along with "spark streaming
> transform".
> It may not be an elegant way, however it works :).
>
>

Re: Spark Streaming: Combine MLlib Prediction and Features on Dstreams

2016-05-30 Thread obaidul karim
Hi Guys, In the end, I am using the code below. The trick is using "native python map" along with "spark streaming transform". It may not be an elegant way, however it works :).
    def predictScore(texts, modelRF):
        predictions = texts.map( lambda txt : (txt , getFeatures(txt)) ).\
            map(lambda (txt, featur
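The snippet above is cut off, so the following is not the author's code. It is a small sketch of one common alternative: score the whole batch with a single `predict` call, then `zip` the predictions back onto the original texts. `ListRDD`, `StubModel`, and `get_features` are hypothetical stand-ins that mimic only `map`, `zip`, and a batch `predict`, so the example runs without Spark; `combine` is the kind of function one might pass to a DStream `transform`.

```python
class ListRDD:
    """Hypothetical list-backed stand-in exposing only map(), zip(), collect()."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, fn):
        return ListRDD(fn(item) for item in self.items)

    def zip(self, other):
        return ListRDD(zip(self.items, other.items))

    def collect(self):
        return list(self.items)


class StubModel:
    """Hypothetical stand-in for a model whose predict() takes a whole batch."""

    def predict(self, features_rdd):
        # Toy rule: class 1.0 when the feature sum is positive, else 0.0.
        return features_rdd.map(lambda fea: 1.0 if sum(fea) > 0 else 0.0)


def get_features(txt):
    # Assumption: each record is a comma-separated string of numbers.
    return [float(v) for v in txt.split(',')]


def combine(texts_rdd, model):
    """Return (text, prediction) pairs for one batch."""
    features = texts_rdd.map(get_features)
    predictions = model.predict(features)  # one batch call, outside any map()
    return texts_rdd.zip(predictions)


pairs = combine(ListRDD(["1.0,2.0", "-3.0,0.5"]), StubModel()).collect()
print(pairs)
```

With a real stream this might be wired up as `texts.transform(lambda rdd: combine(rdd, modelRF))`. In real Spark, `RDD.zip` expects both RDDs to have the same number of partitions and the same number of elements per partition, which is worth verifying for the predictions RDD in your Spark version.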

Re: Spark Streaming: Combine MLlib Prediction and Features on Dstreams

2016-05-30 Thread nguyen duc tuan
How about this?
    def extract_feature(rf_model, x):
        text = getFeatures(x).split(',')
        fea = [float(i) for i in text]
        prediction = rf_model.predict(fea)
        return (prediction, x)
    output = texts.map(lambda x: extract_feature(rf_model, x))
2016-05-30 14:17 GMT+07:00 obaidul karim:
> Hi,
>
> Anybody has
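A minimal, Spark-free sketch of the pairing idea in the snippet above: `extract_feature` returns `(prediction, x)` so each prediction travels with the raw text it came from. `StubModel` and this `getFeatures` are hypothetical stand-ins (the thread does not show the real ones), and Python's built-in `map` replaces the DStream `map`. One caveat to check against your Spark version: the PySpark documentation for MLlib tree models notes that `predict` cannot be used inside an RDD transformation in Python, so this per-record form may only work when the model is a local object, as it is here.

```python
class StubModel:
    """Hypothetical stand-in for a trained model; NOT the MLlib API."""

    def predict(self, features):
        # Toy rule: class 1.0 when the feature sum is positive, else 0.0.
        return 1.0 if sum(features) > 0 else 0.0


def getFeatures(raw_text):
    """Hypothetical stand-in: assume the record is already a
    comma-separated feature string."""
    return raw_text.strip()


def extract_feature(rf_model, x):
    # Parse the feature string, score it, and keep the raw record alongside.
    text = getFeatures(x).split(',')
    fea = [float(i) for i in text]
    prediction = rf_model.predict(fea)
    return (prediction, x)


rf_model = StubModel()
texts = ["1.0,2.0", "-3.0,0.5"]

# Built-in map stands in for texts.map(...) on a DStream.
output = list(map(lambda x: extract_feature(rf_model, x), texts))
print(output)
```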

Re: Spark Streaming: Combine MLlib Prediction and Features on Dstreams

2016-05-30 Thread obaidul karim
Hi, Does anybody have any idea about the below? -Obaid
On Friday, 27 May 2016, obaidul karim wrote:
> Hi Guys,
>
> This is my first mail to the Spark users mailing list.
>
> I need help with a DStream operation.
>
> In fact, I am using an MLlib RandomForest model to predict using Spark
> Streaming. In the end, I wa

Spark Streaming: Combine MLlib Prediction and Features on Dstreams

2016-05-26 Thread obaidul karim
Hi Guys, This is my first mail to the Spark users mailing list. I need help with a DStream operation. In fact, I am using an MLlib RandomForest model to predict using Spark Streaming. In the end, I want to combine the feature DStream and the prediction DStream for further downstream processing. I am