Yes, learning on a dedicated Spark cluster and predicting inside a Storm
bolt is quite OK :)
Thanks all for your answers.
I'll post back if/when we try out this solution.
E/
2014-06-19 20:45 GMT+02:00 Shuo Xiang :
If I'm understanding correctly, you want to use MLlib for offline training
and then deploy the learned model to Storm? In this case I don't think
there is any problem. However, if you are looking for online model
update/training, this can be complicated and I guess quite a few algorithms
in MLlib at
You should be able to use many of the MLlib Model objects directly in Storm, if
you save them out using Java serialization. The only one that won’t work is
probably ALS, because it’s a distributed model.
Otherwise, you will have to output them in your own format and write code for
evaluating th
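
A minimal sketch of the Java serialization route described above, under
stated assumptions: LinearModelSketch is a hypothetical stand-in written for
this example, not an MLlib class, and the helper names toBytes/fromBytes are
made up here. With a real MLlib model the same ObjectOutputStream /
ObjectInputStream calls should apply only if that model class implements
java.io.Serializable, and the Storm side would need the matching classes on
its classpath to read the bytes back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a trained model; NOT an MLlib class.
final class LinearModelSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final double[] weights;
    private final double intercept;

    LinearModelSketch(double[] weights, double intercept) {
        this.weights = weights.clone();
        this.intercept = intercept;
    }

    // Score one feature vector: intercept + dot(weights, features).
    double predict(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException("feature length mismatch");
        }
        double sum = intercept;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    // Training side: turn the model into bytes that can be written to a file.
    static byte[] toBytes(Serializable model) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        }
        return buffer.toByteArray();
    }

    // Scoring side (e.g. when a bolt starts up): rebuild the model from bytes.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        LinearModelSketch trained = new LinearModelSketch(new double[] {0.5, -1.0}, 2.0);
        byte[] bytes = toBytes(trained);
        LinearModelSketch restored = (LinearModelSketch) fromBytes(bytes);
        System.out.println(restored.predict(new double[] {4.0, 1.0}));
    }
}
```

In a bolt, the deserialization would presumably happen once at startup
rather than per tuple, so that each tuple only pays for the predict call.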
I can't speak for MLlib either. But I can say the model of training in Hadoop
M/R or Spark and production scoring in Storm works very well. My team has
done online learning (Sofia ML library, I think) in Storm as well.
I would be interested in this answer as well.
-Suren
On Thu, Jun 19, 2014 at
Well, yes, VW is an appealing option, but I have only found "experimental"
integrations so far.
Also, early experiments suggest Decision Tree ensembles (RF, GBT) perform
better than generalized linear models on our data. Hence the interest in
MLlib :)
Any other comments / suggestions welcome :)
E/
While I can't definitively speak to MLlib online learning,
I'm sure you're evaluating Vowpal Wabbit, for which there have been some Storm
integrations contributed.
Also, you might look at factorie, http://factorie.cs.understanding.edu,
which at least provides an online LDA.
C
On Thursday, June 19, 20
Hi Sparkers,
We have a Storm cluster and are looking for a decent execution engine for
machine-learned models. What I've seen from MLlib is extremely positive,
but we can't just throw away our Storm-based stack.
So my question is: is it feasible/recommended to train models in
Spark/MLlib and execute