Thanks for letting me know about this; it looks pretty interesting. From reading the documentation it seems that the server must be built on a Spark cluster, is that correct? Is it possible to deploy it on a Java server? That is how we are currently running our web app.
On Tue, Nov 4, 2014 at 7:57 PM, Simon Chan <simonc...@gmail.com> wrote:
> The latest version of PredictionIO, which is now under Apache 2 license,
> supports the deployment of MLlib models in production.
>
> The "engine" you build will include a few components, such as:
> - Data - includes Data Source and Data Preparator
> - Algorithm(s)
> - Serving
> I believe that you can do the feature vector creation inside the Data
> Preparator component.
>
> Currently, the package comes with two templates: 1) Collaborative
> Filtering Engine Template - with MLlib ALS; 2) Classification Engine
> Template - with MLlib Naive Bayes. The latter may be useful to you, and
> you can customize the Algorithm component, too.
>
> I have just created a doc: http://docs.prediction.io/0.8.1/templates/
> Love to hear your feedback!
>
> Regards,
> Simon
>
>
> On Mon, Oct 27, 2014 at 11:03 AM, chirag lakhani <chirag.lakh...@gmail.com> wrote:
>> Would pipelining include model export? I didn't see that in the
>> documentation.
>>
>> Are there ways that this is being done currently?
>>
>> On Mon, Oct 27, 2014 at 12:39 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>> We are working on the pipeline features, which would make this
>>> procedure much easier in MLlib. This is still a WIP and the main JIRA
>>> is at:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-1856
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Mon, Oct 27, 2014 at 8:56 AM, chirag lakhani <chirag.lakh...@gmail.com> wrote:
>>> > Hello,
>>> >
>>> > I have been prototyping a text classification model that my company
>>> > would like to eventually put into production. Our technology stack is
>>> > currently Java based, but we would like to be able to build our models
>>> > in Spark/MLlib and then export something like a PMML file which can be
>>> > used for model scoring in real time.
>>> >
>>> > I have been using scikit-learn, where I am able to take the training
>>> > data, convert the text data into a sparse format, and then take the
>>> > other features and use the dictionary vectorizer to do one-hot encoding
>>> > for the other categorical variables. All of those things seem to be
>>> > possible in MLlib, but I am still puzzled about how that can be
>>> > packaged in such a way that the incoming data can first be made into
>>> > feature vectors and then evaluated as well.
>>> >
>>> > Are there any best practices for this type of thing in Spark? I hope
>>> > this is clear, but if anything is confusing, please let me know.
>>> >
>>> > Thanks,
>>> >
>>> > Chirag