That's a good point about polyglot. Given that Spark supports a range of
languages (Scala, Java, Python, R, SQL), there is a trade-off between
centralizing support and integrating with each language's native options.
Going with the latter implies more standardization and less tech debt.

The big win with PMML, however, is migration. For example, regulated
industries may have a strong requirement to train in one place that is
auditable (e.g., SAS) but then score at scale (e.g., Spark). Migration in
the opposite direction is also much in demand, e.g., to leverage training
at scale through Spark and then score elsewhere.
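
To make the "train in one place, score in another" idea concrete, here is a
minimal sketch in Python. The PMML fragment, the field names (x1, x2, y), the
coefficient values, and the helper score_regression are all made up for
illustration. Only the RegressionModel / RegressionTable / NumericPredictor
element names and the PMML-4_2 namespace come from the PMML standard. It
handles just a plain linear regression, so treat it as a toy, not a scoring
engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML document, roughly what a training tool might export.
# All names and numbers here are illustrative assumptions.
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="toy linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Prefix map so the XPath-style lookups can address namespaced elements.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def score_regression(pmml_text, row):
    """Score one row (dict of field name -> float) against a linear
    RegressionModel. Hypothetical helper; ignores everything else in PMML."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")

    # Start from the intercept, then add coefficient * value^exponent
    # for each numeric predictor (exponent defaults to 1 in PMML).
    result = float(table.get("intercept", "0"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        name = predictor.get("name")
        coefficient = float(predictor.get("coefficient"))
        exponent = int(predictor.get("exponent", "1"))
        result += coefficient * (row[name] ** exponent)
    return result


if __name__ == "__main__":
    print(score_regression(PMML_DOC, {"x1": 3.0, "x2": 4.0}))
```

The point is only that the model travels as a declarative XML document, so
the scoring side needs no knowledge of whatever tool did the training.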

It's worth noting that there is a PMML community. Open Data Group
(Augustus) and Zementis do a lot of work to help organize and promote it.
Opinion: both of those projects seem more likely candidates for best
reference implementation than JPMML, or at least they cooperate more
actively within the PMML open standard community. YMMV.

If you're interested in PMML then I'd encourage you to get involved. There
are workshops, generally at KDD, ACM gatherings, etc.

FWIW, I was the original lead on Cascading's PMML support. That was the
first revision, which other firms used in production, not the rewrite on
Concurrent's site that added deep Cascading dependencies.



On Tue, Jun 10, 2014 at 11:10 AM, Evan R. Sparks <evan.spa...@gmail.com>
wrote:

> I should point out that if you don't want to take a polyglot approach to
> languages and would rather stay solely in the JVM, you can just use plain
> old Java serialization on the Model objects that come out of MLlib's APIs
> from Java or Scala, load them up in another process, and call the relevant
> .predict() method when it comes time to serve. The same approach would
> probably also work for models trained via MLlib's Python APIs, but I
> haven't tried that.
>
> Native PMML serialization would be a nice feature to add to MLlib as a
> mechanism to transfer models to other environments for further
> analysis/serving. There's a JIRA discussion about this here:
> https://issues.apache.org/jira/browse/SPARK-1406
>
>
> On Tue, Jun 10, 2014 at 10:53 AM, filipus <floe...@gmail.com> wrote:
>
>> Thank you very much.
>>
>> I hadn't come across the Cascading project at all until now.
>>
>> This project is very interesting.
>>
>> I also see the point of using Scala as the language for Spark now:
>> if I've understood it right, I can integrate JVM-based libraries
>> very easily and naturally.
>>
>> Hmm... but I could also use Spark as the model engine, Augustus as the
>> serializer, and a third-party product such as JPMML as the prediction
>> engine.
>>
>> Hmm... I get the feeling that I need to do Java, Scala and Python at the
>> same time...
>>
>> First things first -> Augustus for PMML output from Spark :-)