[GitHub] spark issue #21172: [SPARK-23120][PYSPARK][ML] Add basic PMML export support...

2018-06-29 Thread vruusmann
Github user vruusmann commented on the issue: https://github.com/apache/spark/pull/21172 Here's a pointer to another PySpark-to-PMML conversion tool: https://github.com/jpmml/pyspark2pmml

[GitHub] spark issue #18584: [SPARK-15526][MLLIB] Shade JPMML

2017-07-10 Thread vruusmann
Github user vruusmann commented on the issue: https://github.com/apache/spark/pull/18584 Good to know that there will be some relief coming in Apache Spark 2.3.X. I don't think that the shading will break any Spark application that depends on the `PMMLExportable` trait

[GitHub] spark issue #3062: [SPARK-1406] Mllib pmml model export

2016-09-22 Thread vruusmann
Github user vruusmann commented on the issue: https://github.com/apache/spark/pull/3062 @manugarri You can export fitted pipeline models to PMML using the [JPMML-SparkML-Package](https://github.com/jpmml/jpmml-sparkml-package) Apache Spark Package. There's a worked-out PySpark

[GitHub] spark pull request: [SPARK-15523][ML][MLLIB] Update JPMML to 1.2.1...

2016-05-25 Thread vruusmann
Github user vruusmann closed the pull request at: https://github.com/apache/spark/pull/13293

[GitHub] spark pull request: [SPARK-15523][ML][MLLIB] Update JPMML to 1.2.1...

2016-05-25 Thread vruusmann
GitHub user vruusmann opened a pull request: https://github.com/apache/spark/pull/13297 [SPARK-15523][ML][MLLIB] Update JPMML to 1.2.15 ## What changes were proposed in this pull request? See https://issues.apache.org/jira/browse/SPARK-15523 This PR replaces PR

[GitHub] spark pull request: [SPARK-15523][ML][MLLIB] Update JPMML to 1.2.1...

2016-05-25 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/13293#issuecomment-221600087 I'll go and read Apache Spark docs about PR guidelines. Looks like I missed this deps thing.

[GitHub] spark pull request: [SPARK-15523][ML][MLLIB] Update JPMML to 1.2.1...

2016-05-25 Thread vruusmann
GitHub user vruusmann opened a pull request: https://github.com/apache/spark/pull/13293 [SPARK-15523][ML][MLLIB] Update JPMML to 1.2.15 ## What changes were proposed in this pull request? See https://issues.apache.org/jira/browse/SPARK-15523 ## How was this patch

[GitHub] spark pull request: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Tr...

2016-05-10 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/9207#issuecomment-218312022 @holdenk The JPMML-SparkML library depends on AGPLv3-licensed libraries, which doesn't leave much choice. I've just published the refactored feature mapping

[GitHub] spark pull request: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Tr...

2016-05-05 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/9207#issuecomment-217136218 A thought about designing an interface for exporting ML solutions (exemplified using PMML, but should be generalizable to other data formats as well). Namely

[GitHub] spark pull request: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Tr...

2016-05-05 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/9207#issuecomment-217120235 Ping @srowen @mengxr @holdenk The first version of Spark ML pipelines to PMML converter is now available at: https://github.com/jpmml/jpmml-sparkml However

[GitHub] spark pull request: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Tr...

2016-04-28 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/9207#issuecomment-215551480 The main difference between PMML and PFA is the abstraction level. PMML is a high-level language (more similar to modeling languages such as UML), where you're

[GitHub] spark pull request: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Tr...

2016-04-28 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/9207#issuecomment-215404038 I've been experimenting with a standalone Spark ML Pipelines to PMML converter in recent days. The goal is to cover basic transformers (e.g., `StringIndexer

[GitHub] spark pull request: [SPARK-11988] [ML] [MLLIB] Update JPMML to 1.2...

2015-11-25 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/9972#issuecomment-159623244 The `jpmml-model` is the top-level (i.e., project) artifact. It provides "cover" for a list of modules, such as `pmml-model`, `pmml-schema`, `pmml-

[GitHub] spark pull request: [SPARK-11988] [ML] [MLLIB] Update JPMML to 1.2...

2015-11-25 Thread vruusmann
Github user vruusmann commented on a diff in the pull request: https://github.com/apache/spark/pull/9972#discussion_r45872359 --- Diff: mllib/pom.xml --- @@ -109,7 +109,7 @@ org.jpmml pmml-model - 1.1.15 + 1.2.7 --- End diff

[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

2015-10-31 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/9057#issuecomment-152766778 You may want to check out some valid NaiveBayes models. For example, see the following NB model for the popular "Audit" dataset: https://github.com/j

[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

2015-10-31 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/9057#issuecomment-15273 The value of the `TargetValueCount@value` attribute must equal some **valid** value of the target `DataField` element (as defined by `DataField/Value@value` attribute
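The constraint described in this comment can be checked mechanically. Below is a minimal Python sketch (standard library only) that lists every `TargetValueCount@value` that is not declared as a `DataField/Value@value` of the target field. It assumes a single `NaiveBayesModel` whose target field is named by `BayesOutput@fieldName`. The function name `invalid_target_value_counts` and the toy document are hypothetical, and the toy fragment is deliberately stripped down (no `MiningSchema` or `BayesInputs`), so it is not schema-valid PMML.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def invalid_target_value_counts(pmml_text):
    """Return sorted TargetValueCount@value entries that are not declared
    as DataField/Value@value of the target field (hypothetical helper)."""
    root = ET.fromstring(pmml_text)

    # Assumption: the target field is the one named by BayesOutput@fieldName.
    bayes_output = next((e for e in root.iter() if _local(e.tag) == "BayesOutput"), None)
    if bayes_output is None:
        raise ValueError("no BayesOutput element found")
    target = bayes_output.get("fieldName")

    # Collect the valid values declared for the target field in the DataDictionary.
    valid = set()
    for field in root.iter():
        if _local(field.tag) == "DataField" and field.get("name") == target:
            valid.update(v.get("value") for v in field if _local(v.tag) == "Value")

    # Every TargetValueCount in a NaiveBayesModel refers to a target category.
    used = {e.get("value") for e in root.iter() if _local(e.tag) == "TargetValueCount"}
    return sorted(used - valid)


# Toy fragment with one deliberately invalid target value ("1.0").
EXAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <DataDictionary>
    <DataField name="y" optype="categorical" dataType="string">
      <Value value="yes"/>
      <Value value="no"/>
    </DataField>
  </DataDictionary>
  <NaiveBayesModel functionName="classification" threshold="0.001">
    <BayesOutput fieldName="y">
      <TargetValueCounts>
        <TargetValueCount value="yes" count="3"/>
        <TargetValueCount value="1.0" count="5"/>
      </TargetValueCounts>
    </BayesOutput>
  </NaiveBayesModel>
</PMML>"""

print(invalid_target_value_counts(EXAMPLE))  # ['1.0']
```

A model that satisfies the rule should produce an empty list.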

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-22 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-95080549 @selvinsource @mengxr I created a small project https://github.com/vruusmann/jpmml-test in order to demonstrate how it's possible to reduce the number of dependencies

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-21 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-94661002 @mengxr Branches 1.0.X and 1.1.X are Java 6. The new branch 1.2.X is Java 7. The latest Java 6 compatible version is 1.1.15.
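The version facts in this comment amount to a small compatibility table. As a hedged illustration, the Python sketch below encodes only the three branches named there (1.0.X and 1.1.X on Java 6, 1.2.X on Java 7). `JAVA_BASELINE`, `required_java` and `is_java6_compatible` are hypothetical names, not part of any JPMML API, and any branch outside that table is reported as unknown rather than guessed.

```python
# Java baseline per JPMML-Model release branch, limited to the branches
# named in the comment; other branches are deliberately left out.
JAVA_BASELINE = {(1, 0): 6, (1, 1): 6, (1, 2): 7}


def required_java(version):
    """Return the minimum Java major version for a version string like '1.2.7'."""
    major, minor, *_ = (int(part) for part in version.split("."))
    try:
        return JAVA_BASELINE[(major, minor)]
    except KeyError:
        raise ValueError(f"no Java baseline recorded for branch {major}.{minor}.X") from None


def is_java6_compatible(version):
    """True if the given version's branch still targets Java 6."""
    return required_java(version) <= 6


print(required_java("1.1.15"), required_java("1.2.7"))  # 6 7
```

Under this table, moving a `pmml-model` dependency from 1.1.15 onto the 1.2.X line also moves its Java requirement from 6 to 7.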

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-21 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-94781590 @selvinsource The benefits of upgrading the `pmml-model` dependency are more obvious if you are in the business of consuming PMML documents (e.g., speed and memory usage

[GitHub] spark pull request: [WIP][SPARK-3530][MLLIB] pipeline and paramete...

2014-11-06 Thread vruusmann
Github user vruusmann commented on the pull request: https://github.com/apache/spark/pull/3099#issuecomment-61949888 @jegonzal PMML is essentially a domain-specific language (DSL) for the domain of predictive analytic applications. It is commonly used only for the representation