[GitHub] spark pull request #22139: [SPARK-25149][GraphX] Update Parallel Personalize...

2018-08-17 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/22139 [SPARK-25149][GraphX] Update Parallel Personalized Page Rank to test with large vertexIds ## What changes were proposed in this pull request? runParallelPersonalizedPageRank in graphx

[GitHub] spark pull request #21799: [SPARK-24747][ML] Update spark.ml to use Instrume...

2018-07-17 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/21799 [SPARK-24747][ML] Update spark.ml to use Instrumentation.instrumented. ## What changes were proposed in this pull request? Update spark.ml training code to fully wrap instrumented methods

[GitHub] spark pull request #21719: [SPARK-24747][ML] Make Instrumentation class more...

2018-07-17 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/21719#discussion_r203126526 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala --- @@ -19,45 +19,60 @@ package org.apache.spark.ml.util import

[GitHub] spark pull request #21719: [SPARK-24747] Make Instrumentation class more fle...

2018-07-05 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/21719 [SPARK-24747] Make Instrumentation class more flexible ## What changes were proposed in this pull request? This PR updates the Instrumentation class to make it more flexible and a little

[GitHub] spark issue #21344: [SPARK-24114] Add instrumentation to FPGrowth.

2018-05-16 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/21344 @ludatabricks thanks for taking a look, I added `logParams`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21344: [SPARK-24114] Add instrumentation to FPGrowth.

2018-05-16 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/21344 [SPARK-24114] Add instrumentation to FPGrowth. ## What changes were proposed in this pull request? Have FPGrowth keep track of model training using the Instrumentation class

[GitHub] spark pull request #21340: [SPARK-24115] Have logging pass through instrumen...

2018-05-15 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/21340 [SPARK-24115] Have logging pass through instrumentation class. ## What changes were proposed in this pull request? Fixes to tuning instrumentation. ## How was this patch tested

[GitHub] spark issue #21195: [Spark-23975][ML] Add support of array input for all clu...

2018-05-02 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/21195 Thanks Lu! I had a pass over this PR and it looks pretty straightforward. One thing I noticed is that there are two patterns that we keep repeating. I think we should add private APIs for

[GitHub] spark issue #21195: [Spark-23975][ML] Add support of array input for all clu...

2018-04-30 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/21195 Looking now.

[GitHub] spark pull request #20695: [SPARK-21741][ML][PySpark] Python API for DataFra...

2018-03-20 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20695#discussion_r175971741 --- Diff: python/pyspark/ml/stat.py --- @@ -132,6 +134,172 @@ def corr(dataset, column, method="pearson"): return _java2py(sc, javaCo

[GitHub] spark pull request #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-03-19 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20837#discussion_r175564965 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -517,6 +517,9 @@ class LogisticRegression @Since("

[GitHub] spark pull request #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-03-19 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20837#discussion_r175563532 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -517,6 +517,9 @@ class LogisticRegression @Since("

[GitHub] spark pull request #20837: [SPARK-23686][ML][WIP] Better instrumentation

2018-03-15 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/20837 [SPARK-23686][ML][WIP] Better instrumentation ## What changes were proposed in this pull request? This PR is meant to show how we could better utilize the Instrumentation class in spark.ml

[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2018-03-14 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/19108 Thanks for the changes Weichen, this lgtm.

[GitHub] spark pull request #19108: [SPARK-21898][ML] Feature parity for KolmogorovSm...

2018-03-01 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19108#discussion_r171653583 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19108: [SPARK-21898][ML] Feature parity for KolmogorovSm...

2018-03-01 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19108#discussion_r171653534 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19108: [SPARK-21898][ML] Feature parity for KolmogorovSm...

2018-02-28 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19108#discussion_r171438289 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19108: [SPARK-21898][ML] Feature parity for KolmogorovSm...

2018-02-28 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19108#discussion_r171438156 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19108: [SPARK-21898][ML] Feature parity for KolmogorovSm...

2018-02-28 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19108#discussion_r171433430 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19108: [SPARK-21898][ML] Feature parity for KolmogorovSm...

2018-02-28 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19108#discussion_r171423827 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #20566: [SPARK-23377][ML] Fixes Bucketizer with multiple columns...

2018-02-10 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20566 I believe this will break persistence for LogisticRegression. I believe the issue is that the `threshold` param on LogisticRegressionModel doesn't get a default directly, but only gets it durin

[GitHub] spark pull request #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs ...

2018-01-23 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20285#discussion_r163390328 --- Diff: docs/ml-features.md --- @@ -1283,6 +1283,56 @@ for more details on the API. +## VectorSizeHint + +It can sometimes be

[GitHub] spark pull request #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs ...

2018-01-23 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20285#discussion_r163389908 --- Diff: docs/ml-features.md --- @@ -1283,6 +1283,56 @@ for more details on the API. +## VectorSizeHint + +It can sometimes be

[GitHub] spark pull request #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs ...

2018-01-23 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20285#discussion_r163341372 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaVectorSizeHintExample.java --- @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...

2018-01-22 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20285 I'd like to try and get this patched into 2.3 to make sure our documentation is complete for the 2.3 release. @viirya and @WeichenXu123 would you mind having another look

[GitHub] spark pull request #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs ...

2018-01-18 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20285#discussion_r162519014 --- Diff: docs/ml-features.md --- @@ -1283,6 +1283,56 @@ for more details on the API. +## VectorSizeHint + +It can sometimes be

[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...

2018-01-17 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20285 Thanks for the review @BryanCutler, I've added a java example & uploaded 2 screenshots.

[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...

2018-01-17 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20285 https://user-images.githubusercontent.com/223219/35074422-9c192bf6-fba2-11e7-8be1-35279db1df49.png

[GitHub] spark issue #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs and exa...

2018-01-17 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20285 https://user-images.githubusercontent.com/223219/35074406-90b074ea-fba2-11e7-853b-45e3c447fe68.png

[GitHub] spark pull request #20285: [SPARK-22735][ML][DOC] Added VectorSizeHint docs ...

2018-01-16 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/20285 [SPARK-22735][ML][DOC] Added VectorSizeHint docs and examples. ## What changes were proposed in this pull request? Added documentation for new transformer. You can merge this pull

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20280 BTW the performance issue is orthogonal to the serialization issue raised in this jira/PR. Maybe we should avoid scope creep in this thread

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20280 I think we should raise an error if `__from_dict__` is set and the user tries to index using a position or a slice. Indexing by field name takes the same code path for Rows that are and are

[GitHub] spark issue #20229: [SPARK-23045][ML][SparkR] Update RFormula to use OneHotE...

2018-01-15 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20229 I've rebased on master; I think that should resolve the issues @viirya raised.

[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

2018-01-12 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r161328577 --- Diff: python/pyspark/ml/tests.py --- @@ -1843,6 +1844,27 @@ def tearDown(self): class ImageReaderTest(SparkSessionTestCase

[GitHub] spark pull request #20229: [SPARK-23045][ML][SparkR] Update RFormula to use ...

2018-01-11 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20229#discussion_r161153997 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala --- @@ -230,16 +231,17 @@ class RFormula @Since("1.5.0") (@Si

[GitHub] spark pull request #20241: [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder p...

2018-01-11 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20241#discussion_r161116909 --- Diff: python/pyspark/ml/feature.py --- @@ -1577,6 +1577,8 @@ class OneHotEncoder(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable

[GitHub] spark pull request #20238: Have RFormula include VectorSizeHint in pipeline

2018-01-11 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/20238 Have RFormula include VectorSizeHint in pipeline ## What changes were proposed in this pull request? Including VectorSizeHint in RFormula pipelines will allow them to be applied to

[GitHub] spark pull request #20229: Update RFormula to use VectorSizeHint & OneHotEnc...

2018-01-10 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/20229 Update RFormula to use VectorSizeHint & OneHotEncoderEstimator. ## What changes were proposed in this pull request? RFormula should use VectorSizeHint & OneHotEncoderEstimato

[GitHub] spark pull request #20168: [SPARK-22730][ML] Add ImageSchema support for non...

2018-01-09 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160502167 --- Diff: python/pyspark/ml/image.py --- @@ -71,9 +88,30 @@ def ocvTypes(self): """ if self._o

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160293894 --- Diff: python/pyspark/ml/tests.py --- @@ -1843,6 +1844,28 @@ def tearDown(self): class ImageReaderTest(SparkSessionTestCase

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160293554 --- Diff: python/pyspark/ml/tests.py --- @@ -1843,6 +1844,28 @@ def tearDown(self): class ImageReaderTest(SparkSessionTestCase

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160300141 --- Diff: python/pyspark/ml/image.py --- @@ -71,9 +88,30 @@ def ocvTypes(self): """ if self._o

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160294482 --- Diff: python/pyspark/ml/image.py --- @@ -71,9 +88,30 @@ def ocvTypes(self): """ if self._o

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160264983 --- Diff: python/pyspark/ml/tests.py --- @@ -1843,6 +1844,28 @@ def tearDown(self): class ImageReaderTest(SparkSessionTestCase

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160294154 --- Diff: python/pyspark/ml/image.py --- @@ -150,29 +194,27 @@ def toImage(self, array, origin=""): "array arg

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160267784 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -37,20 +37,51 @@ import org.apache.spark.sql.types._ @Since("

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160267831 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -37,20 +37,51 @@ import org.apache.spark.sql.types._ @Since("

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160267553 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -143,12 +174,12 @@ object ImageSchema { val height

[GitHub] spark pull request #20168: SPARK-22730 Add ImageSchema support for non-integ...

2018-01-08 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20168#discussion_r160291075 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -37,20 +37,51 @@ import org.apache.spark.sql.types._ @Since("

[GitHub] spark pull request #20143: [SPARK-22949][ML] Apply CrossValidator approach t...

2018-01-03 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/20143 [SPARK-22949][ML] Apply CrossValidator approach to Driver/Distributed memory tradeoff for TrainValidationSplit ## What changes were proposed in this pull request? Avoid holding all models

[GitHub] spark issue #20095: [SPARK-22126][ML] Added fitMultiple method with default ...

2017-12-31 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20095 @jkbradley I pushed changes in response to your comments. I think we should split the `TrainValidationSplit` memory split into another PR, I may have time to work on it tomorrow

[GitHub] spark pull request #20058: [SPARK-22922][ML][PySpark] Pyspark portion of the...

2017-12-29 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r159105682 --- Diff: python/pyspark/ml/base.py --- @@ -47,6 +86,28 @@ def _fit(self, dataset): """ raise NotI

[GitHub] spark pull request #20058: [SPARK-22922][ML][PySpark] Pyspark portion of the...

2017-12-29 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r159105728 --- Diff: python/pyspark/ml/base.py --- @@ -18,13 +18,40 @@ from abc import ABCMeta, abstractmethod import copy +import threading

[GitHub] spark pull request #20112: [SPARK-22734][ML][PySpark] Added Python API for V...

2017-12-28 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/20112 [SPARK-22734][ML][PySpark] Added Python API for VectorSizeHint. (Please fill in changes proposed in this fix) Python API for VectorSizeHint Transformer. (Please explain how this

[GitHub] spark pull request #20058: [SPARK-22922][ML][PySpark] Pyspark portion of the...

2017-12-28 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r159024163 --- Diff: python/pyspark/ml/base.py --- @@ -47,6 +86,28 @@ def _fit(self, dataset): """ raise NotI

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-28 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r159023958 --- Diff: python/pyspark/ml/base.py --- @@ -47,6 +86,28 @@ def _fit(self, dataset): """ raise NotI

[GitHub] spark pull request #20095: [SPARK-22126][ML] Added fitMultiple method with d...

2017-12-28 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20095#discussion_r159011656 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala --- @@ -79,7 +82,51 @@ abstract class Estimator[M <: Model[M]] extends PipelineSt

[GitHub] spark pull request #20095: [SPARK-22126][ML] Added fitMultiple method with d...

2017-12-28 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20095#discussion_r159007817 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala --- @@ -79,7 +82,51 @@ abstract class Estimator[M <: Model[M]] extends PipelineSt

[GitHub] spark pull request #20095: [SPARK-22126][ML] Added fitMultiple method with d...

2017-12-28 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20095#discussion_r159006471 --- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala --- @@ -79,7 +82,51 @@ abstract class Estimator[M <: Model[M]] extends PipelineSt

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-28 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r159004397 --- Diff: python/pyspark/ml/base.py --- @@ -47,6 +74,24 @@ def _fit(self, dataset): """ raise NotI

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-28 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20058#discussion_r159004318 --- Diff: python/pyspark/ml/base.py --- @@ -18,13 +18,40 @@ from abc import ABCMeta, abstractmethod import copy +import threading

[GitHub] spark issue #19979: [SPARK-22881][ML][TEST] ML regression package testsuite ...

2017-12-27 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/19979 @WeichenXu123 it looks like `testTransformer` is a special case of `testTransformerByGlobalCheckFunc`. I think it's cleaner to structure the tests in this way instead of passing around

[GitHub] spark pull request #20095: [SPARK-22126][ML] Added fitMultiple method with d...

2017-12-27 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/20095 [SPARK-22126][ML] Added fitMultiple method with default implementation …mator. Update TrainValidationSplit & CrossValidator to use fitMultiple method. ## What changes

[GitHub] spark pull request #20058: [SPARK-22126][ML][PySpark] Pyspark portion of the...

2017-12-22 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/20058 [SPARK-22126][ML][PySpark] Pyspark portion of the fit-multiple API ## What changes were proposed in this pull request? Adding fitMultiple API to `Estimator` with default implementation

[GitHub] spark pull request #19997: [SPARK-22811][pyspark][ml] Fix pyspark.ml.tests f...

2017-12-15 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19997#discussion_r157325083 --- Diff: python/pyspark/ml/tests.py --- @@ -44,6 +44,7 @@ import numpy as np from numpy import abs, all, arange, array, array_equal, inf, ones

[GitHub] spark pull request #19997: [SPARK-22811][pyspark][ml] Fix pyspark.ml.tests f...

2017-12-15 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/19997 [SPARK-22811][pyspark][ml] Fix pyspark.ml.tests failure when Hive is not available. ## What changes were proposed in this pull request? pyspark.ml.tests is missing a py4j import.

[GitHub] spark pull request #19986: [Test][WIP] add failing test.

2017-12-15 Thread MrBago
Github user MrBago closed the pull request at: https://github.com/apache/spark/pull/19986

[GitHub] spark pull request #19987: [Test][WIP] add passing test.

2017-12-15 Thread MrBago
Github user MrBago closed the pull request at: https://github.com/apache/spark/pull/19987

[GitHub] spark pull request #19987: [Test][WIP] add passing test.

2017-12-14 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/19987 [Test][WIP] add passing test. missing import in python/pyspark/ml/tests.py, verifying with CI. You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request #19986: [Test][WIP] add failing test.

2017-12-14 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/19986 [Test][WIP] add failing test. missing import in python/pyspark/ml/tests.py, verifying with CI. You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-13 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/19904 @BryanCutler Thanks for the response, I think I understand the situation a little better here. I think there is a fundamental tradeoff we cannot avoid. Specifically there is a tradeoff

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator memory ...

2017-12-13 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19904#discussion_r156751569 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0") (@Si

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-13 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156750447 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator memory ...

2017-12-12 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19904#discussion_r156511049 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0") (@Si

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-11 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r156216171 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19904: [SPARK-22707][ML] Optimize CrossValidator memory ...

2017-12-07 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19904#discussion_r155693144 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -146,31 +146,34 @@ class CrossValidator @Since("1.2.0") (@Si

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-04 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r154815581 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-11-29 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r153964637 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -133,6 +133,9 @@ trait StreamTest extends QueryTest with

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-11-29 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r153964786 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala --- @@ -233,7 +232,8 @@ class LinearRegressionSuite

[GitHub] spark pull request #19843: [SPARK-22644][ML][TEST][WIP] Make ML testsuite su...

2017-11-29 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19843#discussion_r153964308 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -573,8 +578,19 @@ trait StreamTest extends QueryTest with

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-22 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r152701758 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml

[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-22 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/19746 jenkins retest this please

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-21 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r152443133 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml

[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-21 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r152442655 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml

[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-21 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/19746 jenkins retest this please

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-20 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r152159939 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-20 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r152111218 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-11-20 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19746#discussion_r152111084 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-14 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/19746 Jenkins retest this please

[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-14 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/19746 jenkins test

[GitHub] spark pull request #19746: [SPARK-22346][ML]

2017-11-14 Thread MrBago
GitHub user MrBago opened a pull request: https://github.com/apache/spark/pull/19746 [SPARK-22346][ML] ## What changes were proposed in this pull request? A new VectorSizeHint transformer was added. This transformer is meant to be used as a pipeline stage ahead of

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-10-31 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r148190855 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,456 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-10-31 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r148056744 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,456 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-31 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r148049352 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,139 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-26 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r147298609 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,122 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-26 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r147298335 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,133 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #19527: [SPARK-13030][ML] Create OneHotEncoderEstimator f...

2017-10-23 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19527#discussion_r146402800 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala --- @@ -0,0 +1,464 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-19 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/19439 @imatiach-msft just a few more comments. When I was looking over this I realized that the Python and Scala namespaces are going to be a little different, e.g. `pyspark.ml.image.readImages` vs

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-19 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r145843289 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,122 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-19 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r145842379 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,122 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more
