GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/22139
[SPARK-25149][GraphX] Update Parallel Personalized Page Rank to test with
large vertexIds
## What changes were proposed in this pull request?
runParallelPersonalizedPageRank in graphx
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/21799
[SPARK-24747][ML] Update spark.ml to use Instrumentation.instrumented.
## What changes were proposed in this pull request?
Update spark.ml training code to fully wrap instrumented methods
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/21719#discussion_r203126526
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala ---
@@ -19,45 +19,60 @@ package org.apache.spark.ml.util
import
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/21719
[SPARK-24747] Make Instrumentation class more flexible
## What changes were proposed in this pull request?
This PR updates the Instrumentation class to make it more flexible and a
little
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/21344
@ludatabricks thanks for taking a look, I added `logParams`.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/21344
[SPARK-24114] Add instrumentation to FPGrowth.
## What changes were proposed in this pull request?
Have FPGrowth keep track of model training using the Instrumentation class
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/21340
[SPARK-24115] Have logging pass through instrumentation class.
## What changes were proposed in this pull request?
Fixes to tuning instrumentation.
## How was this patch tested
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/21195
Thanks Lu!
I had a pass over this PR and it looks pretty straightforward. One thing I
noticed is that there are two patterns that we keep repeating. I think we
should add private APIs for
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/21195
Looking now.
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20695#discussion_r175971741
--- Diff: python/pyspark/ml/stat.py ---
@@ -132,6 +134,172 @@ def corr(dataset, column, method="pearson"):
return _java2py(sc, javaCo
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20837#discussion_r175564965
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -517,6 +517,9 @@ class LogisticRegression @Since("
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20837#discussion_r175563532
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -517,6 +517,9 @@ class LogisticRegression @Since("
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/20837
[SPARK-23686][ML][WIP] Better instrumentation
## What changes were proposed in this pull request?
This PR is meant to show how we could better utilize the Instrumentation
class in spark.ml
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/19108
Thanks for the changes Weichen, this lgtm.
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19108#discussion_r171653583
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19108#discussion_r171653534
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19108#discussion_r171438289
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala ---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19108#discussion_r171438156
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala ---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19108#discussion_r171433430
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala ---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19108#discussion_r171423827
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/stat/KolmogorovSmirnovTest.scala ---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/20566
I believe this will break persistence for LogisticRegression. I believe the
issue is that the `threshold` param on LogisticRegressionModel doesn't get a
default directly, but only gets it durin
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20285#discussion_r163390328
--- Diff: docs/ml-features.md ---
@@ -1283,6 +1283,56 @@ for more details on the API.
+## VectorSizeHint
+
+It can sometimes be
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20285#discussion_r163389908
--- Diff: docs/ml-features.md ---
@@ -1283,6 +1283,56 @@ for more details on the API.
+## VectorSizeHint
+
+It can sometimes be
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20285#discussion_r163341372
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaVectorSizeHintExample.java
---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/20285
I'd like to try and get this patched into 2.3 to make sure our
documentation is complete for the 2.3 release. @viirya and @WeichenXu123 would
you mind having another look
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20285#discussion_r162519014
--- Diff: docs/ml-features.md ---
@@ -1283,6 +1283,56 @@ for more details on the API.
+## VectorSizeHint
+
+It can sometimes be
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/20285
Thanks for the review @BryanCutler, I've added a java example & uploaded 2
screenshots.
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/20285
https://user-images.githubusercontent.com/223219/35074422-9c192bf6-fba2-11e7-8be1-35279db1df49.png
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/20285
https://user-images.githubusercontent.com/223219/35074406-90b074ea-fba2-11e7-853b-45e3c447fe68.png
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/20285
[SPARK-22735][ML][DOC] Added VectorSizeHint docs and examples.
## What changes were proposed in this pull request?
Added documentation for new transformer.
You can merge this pull
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/20280
BTW the performance issue is orthogonal to the serialization issue raised
in this jira/PR. Maybe we should avoid scope creep in this thread
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/20280
I think we should raise an error if `__from_dict__` is set and the user
tries to index using a position or a slice.
Indexing by field name takes the same code path for Rows that are and are
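The behaviour proposed in this comment can be illustrated with a minimal pure-Python sketch. It does not use `pyspark.sql.Row`; `SketchRow` is a hypothetical stand-in, and the choice of `TypeError` and the error message are assumptions made only for illustration. Only the `__from_dict__` marker comes from the comment itself.

```python
class SketchRow(tuple):
    """Hypothetical stand-in for a row type; this is not pyspark.sql.Row."""

    def __new__(cls, **kwargs):
        # Sort the field names so the stored order does not depend on the
        # order of the keyword arguments, and mark the row accordingly.
        names = sorted(kwargs)
        row = tuple.__new__(cls, [kwargs[name] for name in names])
        row.__fields__ = names
        row.__from_dict__ = True
        return row

    def __getitem__(self, item):
        if isinstance(item, (int, slice)):
            # Positions are ambiguous once fields have been reordered,
            # so reject positional and slice access (assumed behaviour).
            if getattr(self, "__from_dict__", False):
                raise TypeError(
                    "row was built from keyword arguments; index by field name"
                )
            return super().__getitem__(item)
        try:
            index = self.__fields__.index(item)
        except ValueError:
            raise KeyError(item)
        return tuple.__getitem__(self, index)


row = SketchRow(b=2, a=1)
print(row["a"], row["b"])
```

Indexing by field name keeps working in this sketch, while `row[0]` or `row[0:1]` raises instead of silently returning a value from the reordered tuple.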
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/20229
I've rebased on master, I think that should resolve the issues @viirya
raised.
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r161328577
--- Diff: python/pyspark/ml/tests.py ---
@@ -1843,6 +1844,27 @@ def tearDown(self):
class ImageReaderTest(SparkSessionTestCase
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20229#discussion_r161153997
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala
---
@@ -230,16 +231,17 @@ class RFormula @Since("1.5.0") (@Si
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20241#discussion_r161116909
--- Diff: python/pyspark/ml/feature.py ---
@@ -1577,6 +1577,8 @@ class OneHotEncoder(JavaTransformer, HasInputCol,
HasOutputCol, JavaMLReadable
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/20238
Have RFormula include VectorSizeHint in pipeline
## What changes were proposed in this pull request?
Including VectorSizeHint in RFormula pipelines will allow them to be
applied to
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/20229
Update RFormula to use VectorSizeHint & OneHotEncoderEstimator.
## What changes were proposed in this pull request?
RFormula should use VectorSizeHint & OneHotEncoderEstimato
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r160502167
--- Diff: python/pyspark/ml/image.py ---
@@ -71,9 +88,30 @@ def ocvTypes(self):
"""
if self._o
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r160293894
--- Diff: python/pyspark/ml/tests.py ---
@@ -1843,6 +1844,28 @@ def tearDown(self):
class ImageReaderTest(SparkSessionTestCase
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r160293554
--- Diff: python/pyspark/ml/tests.py ---
@@ -1843,6 +1844,28 @@ def tearDown(self):
class ImageReaderTest(SparkSessionTestCase
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r160300141
--- Diff: python/pyspark/ml/image.py ---
@@ -71,9 +88,30 @@ def ocvTypes(self):
"""
if self._o
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r160294482
--- Diff: python/pyspark/ml/image.py ---
@@ -71,9 +88,30 @@ def ocvTypes(self):
"""
if self._o
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r160264983
--- Diff: python/pyspark/ml/tests.py ---
@@ -1843,6 +1844,28 @@ def tearDown(self):
class ImageReaderTest(SparkSessionTestCase
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r160294154
--- Diff: python/pyspark/ml/image.py ---
@@ -150,29 +194,27 @@ def toImage(self, array, origin=""):
"array arg
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r160267784
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -37,20 +37,51 @@ import org.apache.spark.sql.types._
@Since("
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r160267831
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -37,20 +37,51 @@ import org.apache.spark.sql.types._
@Since("
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r160267553
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -143,12 +174,12 @@ object ImageSchema {
val height
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20168#discussion_r160291075
--- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
---
@@ -37,20 +37,51 @@ import org.apache.spark.sql.types._
@Since("
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/20143
[SPARK-22949][ML] Apply CrossValidator approach to Driver/Distributed
memory tradeoff for TrainValidationSplit
## What changes were proposed in this pull request?
Avoid holding all models
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/20095
@jkbradley I pushed changes in response to your comments. I think we should
split the `TrainValidationSplit` memory split into another PR, I may have time
to work on it tomorrow
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r159105682
--- Diff: python/pyspark/ml/base.py ---
@@ -47,6 +86,28 @@ def _fit(self, dataset):
"""
raise NotI
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r159105728
--- Diff: python/pyspark/ml/base.py ---
@@ -18,13 +18,40 @@
from abc import ABCMeta, abstractmethod
import copy
+import threading
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/20112
[SPARK-22734][ML][PySpark] Added Python API for VectorSizeHint.
(Please fill in changes proposed in this fix)
Python API for VectorSizeHint Transformer.
(Please explain how this
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r159024163
--- Diff: python/pyspark/ml/base.py ---
@@ -47,6 +86,28 @@ def _fit(self, dataset):
"""
raise NotI
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r159023958
--- Diff: python/pyspark/ml/base.py ---
@@ -47,6 +86,28 @@ def _fit(self, dataset):
"""
raise NotI
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20095#discussion_r159011656
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ---
@@ -79,7 +82,51 @@ abstract class Estimator[M <: Model[M]] extends
PipelineSt
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20095#discussion_r159007817
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ---
@@ -79,7 +82,51 @@ abstract class Estimator[M <: Model[M]] extends
PipelineSt
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20095#discussion_r159006471
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ---
@@ -79,7 +82,51 @@ abstract class Estimator[M <: Model[M]] extends
PipelineSt
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r159004397
--- Diff: python/pyspark/ml/base.py ---
@@ -47,6 +74,24 @@ def _fit(self, dataset):
"""
raise NotI
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/20058#discussion_r159004318
--- Diff: python/pyspark/ml/base.py ---
@@ -18,13 +18,40 @@
from abc import ABCMeta, abstractmethod
import copy
+import threading
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/19979
@WeichenXu123 it looks like `testTransformer` is a special case of
`testTransformerByGlobalCheckFunc`. I think it's cleaner to structure the tests
in this way instead of passing around
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/20095
[SPARK-22126][ML] Added fitMultiple method with default implementation
…mator.
Update TrainValidationSplit & CrossValidator to use fitMultiple method.
## What changes
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/20058
[SPARK-22126][ML][PySpark] Pyspark portion of the fit-multiple API
## What changes were proposed in this pull request?
Adding fitMultiple API to `Estimator` with default implementation
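The PR description is cut off here, so the following is only a rough sketch of what a default fit-multiple implementation could look like, not the code from the PR. It assumes the method returns a thread-safe iterator that yields `(index, model)` pairs, one per param map. `SketchEstimator`, `fit_multiple`, and the toy `fit` are hypothetical names that do not depend on PySpark.

```python
import threading


class SketchEstimator:
    """Hypothetical estimator; this is not pyspark.ml.Estimator."""

    def fit(self, dataset, params):
        # Toy "model": the sum of the dataset scaled by one parameter.
        return sum(dataset) * params["scale"]

    def fit_multiple(self, dataset, param_maps):
        estimator = self
        lock = threading.Lock()
        counter = [0]

        class _ModelIterator:
            def __iter__(self):
                return self

            def __next__(self):
                # Claim the next index under the lock, then fit outside it
                # so that several threads can train models concurrently.
                with lock:
                    index = counter[0]
                    if index >= len(param_maps):
                        raise StopIteration
                    counter[0] += 1
                return index, estimator.fit(dataset, param_maps[index])

        return _ModelIterator()


models = dict(SketchEstimator().fit_multiple([1, 2, 3], [{"scale": 1}, {"scale": 2}]))
print(models)
```

Returning the index with each model lets a caller consume results from several threads without assuming they arrive in param-map order.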
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19997#discussion_r157325083
--- Diff: python/pyspark/ml/tests.py ---
@@ -44,6 +44,7 @@
import numpy as np
from numpy import abs, all, arange, array, array_equal, inf, ones
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/19997
[SPARK-22811][pyspark][ml] Fix pyspark.ml.tests failure when Hive is not
available.
## What changes were proposed in this pull request?
pyspark.ml.tests is missing a py4j import. I
Github user MrBago closed the pull request at:
https://github.com/apache/spark/pull/19986
Github user MrBago closed the pull request at:
https://github.com/apache/spark/pull/19987
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/19987
[Test][WIP] add passing test.
missing import in python/pyspark/ml/tests.py, verifying with CI.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/19986
[Test][WIP] add failing test.
missing import in python/pyspark/ml/tests.py, verifying with CI.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/19904
@BryanCutler Thanks for the response, I think I understand the situation a
little better here.
I think there is a fundamental tradeoff we cannot avoid. Specifically there
is a tradeoff
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19904#discussion_r156751569
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0") (@Si
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156750447
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19904#discussion_r156511049
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -146,25 +147,18 @@ class CrossValidator @Since("1.2.0") (@Si
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r156216171
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19904#discussion_r155693144
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -146,31 +146,34 @@ class CrossValidator @Since("1.2.0") (@Si
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r154815581
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19843#discussion_r153964637
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala ---
@@ -133,6 +133,9 @@ trait StreamTest extends QueryTest with
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19843#discussion_r153964786
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -233,7 +232,8 @@ class LinearRegressionSuite
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19843#discussion_r153964308
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala ---
@@ -573,8 +578,19 @@ trait StreamTest extends QueryTest with
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r152701758
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/19746
jenkins retest this please
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r152443133
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19588#discussion_r152442655
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/19746
jenkins retest this please
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r152159939
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r152111218
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19746#discussion_r152111084
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/VectorSizeHintSuite.scala ---
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/19746
Jenkins retest this please
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/19746
jenkins test
GitHub user MrBago opened a pull request:
https://github.com/apache/spark/pull/19746
[SPARK-22346][ML]
## What changes were proposed in this pull request?
A new VectorSizeHint transformer was added. This transformer is meant to be
used as a pipeline stage ahead of
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r148190855
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,456 @@
+/*
+ * Licensed to the Apache
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r148056744
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,456 @@
+/*
+ * Licensed to the Apache
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r148049352
--- Diff: python/pyspark/ml/image.py ---
@@ -0,0 +1,139 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r147298609
--- Diff: python/pyspark/ml/image.py ---
@@ -0,0 +1,122 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r147298335
--- Diff: python/pyspark/ml/image.py ---
@@ -0,0 +1,133 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19527#discussion_r146402800
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala
---
@@ -0,0 +1,464 @@
+/*
+ * Licensed to the Apache
Github user MrBago commented on the issue:
https://github.com/apache/spark/pull/19439
@imatiach-msft just a few more comments. When I was looking over this I
realized that the Python and Scala namespaces are going to be a little
different, e.g. `pyspark.ml.image.readImages` vs
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r145843289
--- Diff: python/pyspark/ml/image.py ---
@@ -0,0 +1,122 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
Github user MrBago commented on a diff in the pull request:
https://github.com/apache/spark/pull/19439#discussion_r145842379
--- Diff: python/pyspark/ml/image.py ---
@@ -0,0 +1,122 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more