I agree 100%. Building the model requires large data and many CPUs.
Using it does not.
This is a very useful side effect of ML models.
If MLlib models can't be used outside Spark, that's a real shame.
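
To illustrate how little is needed at prediction time, here is a minimal
sketch of scoring an already-trained logistic regression model with nothing
but the standard library. It is not Spark or MLlib code: the coefficient
values and the function names are hypothetical, standing in for whatever a
trained model would export.

```python
import math

# Hypothetical coefficients, standing in for values exported from a trained model.
WEIGHTS = [0.8, -1.2, 0.5]
INTERCEPT = -0.3


def predict_probability(features, weights=WEIGHTS, intercept=INTERCEPT):
    """Score one feature vector: sigmoid of the dot product plus the intercept."""
    if len(features) != len(weights):
        raise ValueError("feature vector length does not match the model")
    margin = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-margin))


def predict(features, threshold=0.5):
    """Turn the probability into a 0/1 label using a fixed threshold."""
    return 1 if predict_probability(features) >= threshold else 0
```

Scoring a single request is one dot product and one exponential, so it fits
comfortably inside a web application with no cluster behind it.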


-------- Original message --------
From: "Kothuvatiparambil, Viju" <viju.kothuvatiparam...@bankofamerica.com> 
Date: 11/12/2015  3:09 PM  (GMT-05:00) 
To: DB Tsai <dbt...@dbtsai.com>, Sean Owen <so...@cloudera.com> 
Cc: Felix Cheung <felixcheun...@hotmail.com>, Nirmal Fernando 
<nir...@wso2.com>, Andy Davidson <a...@santacruzintegration.com>, Adrian Tanase 
<atan...@adobe.com>, "user @spark" <user@spark.apache.org>, Xiangrui Meng 
<men...@gmail.com>, hol...@pigscanfly.ca 
Subject: RE: thought experiment: use spark ML to real time prediction 








I am glad to see DB’s comments; they make me feel I am not the only one
facing these issues. If we were able to use MLlib to load the model in web
applications (outside the Spark cluster), that would solve the issue. I
understand Spark is mainly for processing big data in a distributed mode.
But there is no purpose in training a model using MLlib if we are not able
to use it in the applications that need to access the model.
 
Thanks
Viju
 
From: DB Tsai [mailto:dbt...@dbtsai.com]
Sent: Thursday, November 12, 2015 11:04 AM
To: Sean Owen
Cc: Felix Cheung; Nirmal Fernando; Andy Davidson; Adrian Tanase; user @spark; 
Xiangrui Meng; hol...@pigscanfly.ca
Subject: Re: thought experiment: use spark ML to real time prediction
 

I think the use-case can be quite different from PMML.

A Spark-platform-independent ML jar would empower users to do the
following:

1) PMML doesn't contain all the models we have in mllib. Also, for an ML
pipeline trained by Spark, PMML is most of the time not expressive enough to
capture all the transformations we have in Spark ML. As a result, if we were
able to serialize the entire Spark ML pipeline after training, and then load
it back in an app without any Spark platform for production scoring, this
would be very useful for production deployment of Spark ML models. The only
issue arises if a transformer involves a shuffle; we need to figure out a way
to handle that. When I chatted with Xiangrui about this, he suggested that we
may tag whether a transformer is shuffle ready. Currently, at Netflix, we are
not able to use ML pipelines because of those issues, and we have to write
our own scorers in production, which is quite a duplication of work.

2) If users could use Spark's linear algebra code, such as vectors and
matrices, in their applications, this would be very useful. It would help
share code between the Spark training pipeline and production deployment.
Also, lots of good stuff in Spark's mllib doesn't depend on the Spark
platform, and people could use it in their applications without pulling in
lots of dependencies. In fact, in my project, I have to copy & paste code
from mllib into my project to use those goodies in apps.

3) Currently, mllib depends on graphx, which means that graphx has no way to
use mllib's vector or matrix classes. And
