Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r89013284
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -155,8 +148,30 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15777#discussion_r88833295
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -95,8 +95,7 @@ class BisectingKMeansModel private[ml
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15874
@jkbradley Thanks for checking that; that is the conclusion I drew as well.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88753014
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -31,36 +31,40 @@ import org.apache.spark.sql.types.StructType
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15874
I will take a look.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/9
@yinxusen I took a look at the updates. Will you be able to create the
design doc that Joseph mentioned?
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r88725427
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -35,7 +38,25 @@ import org.apache.spark.sql.functions.{col, udf
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r88724978
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -35,7 +38,25 @@ import org.apache.spark.sql.functions.{col, udf
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r88714626
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -124,7 +147,8 @@ class KMeansModel private[ml] (
@Since("
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r88713108
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -414,6 +414,8 @@ object KMeans {
val RANDOM = "r
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r88722547
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -145,18 +150,67 @@ class KMeansSuite extends SparkFunSuite with
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r88713359
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -284,11 +309,26 @@ class KMeans @Since("1.5.0") (
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r88715322
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -306,6 +346,25 @@ class KMeans @Since("1.5.0") (
@Si
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r88725396
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -35,7 +38,25 @@ import org.apache.spark.sql.functions.{col, udf
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/9#discussion_r88713635
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
---
@@ -284,11 +309,26 @@ class KMeans @Since("1.5.0") (
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88536087
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -179,16 +211,13 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15831#discussion_r88530411
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -243,6 +244,42 @@ final class ChiSqSelectorModel private[ml
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15831
I see this patch was created as a result of the PR that separated the
ml/mllib linalg packages, to avoid some inefficiencies in conversion. However,
it also is a partial step toward feature parity
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15777#discussion_r88495818
--- Diff: python/pyspark/ml/tests.py ---
@@ -1097,6 +1097,44 @@ def test_logistic_regression_summary(self):
sameSummary = model.evaluate(df
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15777#discussion_r88483092
--- Diff: python/pyspark/ml/tests.py ---
@@ -1097,6 +1097,44 @@ def test_logistic_regression_summary(self):
sameSummary = model.evaluate(df
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15777#discussion_r88269525
--- Diff: python/pyspark/ml/clustering.py ---
@@ -346,6 +453,27 @@ def computeCost(self, dataset):
"""
return
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88142430
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -179,16 +211,13 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15893
cc @MLnick @dbtsai
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/15893
[SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients in
LogisticRegression training
## What changes were proposed in this pull request?
This is a follow up to some of the
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87906133
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -31,13 +31,9 @@ import org.apache.spark.sql.types.StructType
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87906309
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -46,21 +42,23 @@ import org.apache.spark.sql.types.StructType
@Since
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87844941
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -106,22 +123,24 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87906709
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -46,21 +42,23 @@ import org.apache.spark.sql.types.StructType
@Since
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87878252
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -102,8 +103,7 @@ class MinHashModel private[ml
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87908012
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -125,11 +125,11 @@ class MinHash(override val uid: String) extends
LSH
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87922281
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -179,16 +211,13 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87874869
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -66,10 +66,10 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
s
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87904353
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinHashLSHSuite.scala ---
@@ -24,7 +24,7 @@ import org.apache.spark.ml.util.DefaultReadWriteTest
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87875688
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -46,21 +42,23 @@ import org.apache.spark.sql.types.StructType
@Since
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87928721
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala
---
@@ -89,23 +90,25 @@ class RandomProjectionModel private[ml
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87876322
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -102,8 +103,7 @@ class MinHashModel private[ml
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87871105
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -106,22 +106,24 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87874663
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -35,26 +35,26 @@ private[ml] trait LSHParams extends HasInputCol with
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87910679
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -74,9 +72,12 @@ class MinHashModel private[ml
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87875995
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -74,9 +72,12 @@ class MinHashModel private[ml
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r87844308
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -144,12 +152,12 @@ class MinHash(override val uid: String) extends
LSH
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15881#discussion_r87887739
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -171,7 +171,10 @@ class LinearRegression @Since("1.3.0"
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15777#discussion_r87841411
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -132,7 +132,7 @@ class BisectingKMeansModel private[ml
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15874
Thanks @yunni, I can take a look at this today. I would prefer to separate
the addition of "AND-amplification" into another PR since the other changes I
believe we'd like to get in
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15777
ping @yanboliang
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15881#discussion_r87826206
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -171,7 +171,10 @@ class LinearRegression @Since("1.3.0"
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15593
Thanks @dbtsai!
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15593
Thanks for the detailed explanation @dbtsai. +1 for doing this in a
separate PR, since I'd imagine we want to run all the performance tests again
as well.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15683#discussion_r87639131
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -88,6 +89,12 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15817#discussion_r87617539
--- Diff: python/pyspark/ml/feature.py ---
@@ -158,21 +158,28 @@ class Bucketizer(JavaTransformer, HasInputCol,
HasOutputCol, JavaMLReadable, Jav
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15817#discussion_r87617849
--- Diff: python/pyspark/ml/feature.py ---
@@ -1163,9 +1184,11 @@ class QuantileDiscretizer(JavaEstimator,
HasInputCol, HasOutputCol, JavaMLReadab
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15800
@jkbradley Thanks for clarifying, I see your argument now. I agree that it
makes sense from a statistical perspective. Still, I have not seen a single
paper that describes anything quite exactly
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15683
@actuaryzhang Thanks a lot for correcting this! I just had a small comment
to make the additional test shorter.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15683#discussion_r87609107
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -453,6 +464,56 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15593#discussion_r87504228
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -489,13 +485,14 @@ class LogisticRegression @Since("
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15683#discussion_r87487494
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -453,6 +454,8 @@ class
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15683#discussion_r87487460
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -83,10 +83,11 @@ class
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15800
I think that we would have the following hash distance signature:
```scala
def hashDistance(x: Vector, y: Vector): Double
```
Then in `approxNearestNeighbors` we would
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15800
I agree with @jkbradley's suggested approach. One key point here (for
MinHash):
If a query point vector q hashes to some MinHash Vector [5.0, 22.0, 13.0]
the best candidates will be
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15800#discussion_r87429950
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -76,7 +72,19 @@ class MinHashModel private[ml] (
@Since("
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15593#discussion_r87275543
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -489,13 +485,14 @@ class LogisticRegression @Since("
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15148
If we were to use a matrix for the output, then when we do
`approxSimilarityJoin` we would want to explode the output column by matrix
rows, assuming the matrix structure was
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15074
Something like `foreachActive` for matrices would enable a better solution,
but if we don't go that route then I agree with @thunterdb about comparing
sparse matrices with the same tran
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15800
@jkbradley Your updated summary above is in line with my view as well -
that "multi-probing" as described in the paper doesn't translate exactly to
MinHash, but that it does ma
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15779
LGTM
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15800
Good point. Maybe we can log a warning when multi-probing is called with
MinHash, to say that it will result in running brute-force kNN when there
aren't enough candidates.
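A minimal sketch of the kind of warning suggested above, assuming a hypothetical helper `select_candidates` that is not part of Spark: when fewer than `k` points share a hash bucket with the query, it logs a warning and falls back to returning every point for a brute-force scan. The names, the logger, and the message wording are all illustrative assumptions.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("lsh_sketch")


def select_candidates(exact_matches, all_points, k):
    """Return up to k bucket matches, or every point when there are too few.

    exact_matches: points whose hash collides with the query (assumed input).
    all_points:    the full dataset, used only for the brute-force fallback.
    """
    if len(exact_matches) >= k:
        return list(exact_matches)[:k]
    # Not enough hash collisions: warn that the search degrades to a full scan.
    logger.warning(
        "Only %d of %d requested candidates share a hash bucket; "
        "falling back to a brute-force scan of all %d points.",
        len(exact_matches), k, len(all_points),
    )
    # The caller would rank these by true distance to pick the k nearest.
    return list(all_points)
```

With such a check in place, the warning only fires on the fallback path, so well-populated buckets stay quiet.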
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15800
Using this as hashing distance for near-neighbor search doesn't make sense
to me. If there aren't enough candidates where the distance is zero, we'll
select some candidates who ha
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/9
This is probably going to miss 2.1 since we are officially in QA now, just
as an fyi.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15779#discussion_r87014522
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -404,6 +406,13 @@ object LinearRegression extends
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15593
@MLnick I updated it with your suggested wording for the comments.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15148
I was using L to refer to the number of compound hash functions, but you're
right that in my explanation L was the "OR" parameter and d was the "AND"
parameter.
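As a rough numeric illustration of the "AND" and "OR" parameters mentioned here (a sketch under the usual independence assumption, not Spark's implementation): if a single base hash collides for a given pair with probability p, then d AND-ed hashes collide with probability p^d, and at least one of L OR-ed compound hashes collides with probability 1 - (1 - p^d)^L. The function name below is hypothetical.

```python
def collision_probability(p: float, d: int, L: int) -> float:
    """Probability that two points collide in at least one of L compound
    hashes, each compound hash being the AND of d independent base hashes
    that individually collide with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if d < 1 or L < 1:
        raise ValueError("d and L must be positive integers")
    and_prob = p ** d                   # all d base hashes must agree
    return 1.0 - (1.0 - and_prob) ** L  # at least one of L tables agrees
```

Raising d makes each compound hash more selective, while raising L recovers recall by giving a pair more chances to collide.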
Thi
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15768
I began to review this, but got sidetracked with a lot of the details we
are currently discussing on the [original LSH
PR](https://github.com/apache/spark/pull/15148).
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15148
So I'll try to summarize the AND/OR amplification and how I think it fits
into the current API right now. LSH relies on a single hashing function `h(x)`
which is (R, cR, p1, p2)-sensitive which
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15148
Ok, I'm looking more closely at this algorithm versus the literature. I
agree that there is a lot of inconsistent terminology which is probably leading
to some of the confusion here.
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r86719955
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15148
@karlhigley Thanks for your detailed response. From the amplification
section on
[Wikipedia](https://en.wikipedia.org/wiki/Locality-sensitive_hashing#Amplification),
it is pretty clear to me that
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15148
I apologize for coming late to this, but I am taking a look at some of the
documentation now. For the `RandomProjection` class there are two links: one to
the Wikipedia entry on stable distributions and one
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15779#discussion_r86670363
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -166,6 +166,9 @@ class LinearRegression @Since("1.3.0"
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15779#discussion_r86670216
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/optim/NormalEquationSolver.scala ---
@@ -156,7 +157,7 @@ private[ml] class QuasiNewtonSolver
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15779
+1 on removing the use of exceptions. I thought it was a bit of an awkward
solution to begin with. Thanks a lot for this PR; I will take a look soon.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/13557
I created [SPARK-18282](https://issues.apache.org/jira/browse/SPARK-18282)
and the PR: https://github.com/apache/spark/pull/15777 to implement this
interface for GMM and BisectingKMeans. These two
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15777#discussion_r86653312
--- Diff: python/pyspark/ml/classification.py ---
@@ -309,13 +309,16 @@ def interceptVector(self):
@since("2.0.0")
def su
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15777#discussion_r86653456
--- Diff: python/pyspark/ml/tests.py ---
@@ -1097,6 +1097,42 @@ def test_logistic_regression_summary(self):
sameSummary = model.evaluate(df
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/13557#discussion_r86653603
--- Diff: python/pyspark/ml/clustering.py ---
@@ -201,7 +202,74 @@ def computeCost(self, dataset):
"""
return
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/15777
[SPARK-18282][ML][PYSPARK] Add python clustering summaries for GMM and BKM
## What changes were proposed in this pull request?
Add model summary APIs for `GaussianMixtureModel` and
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15773
@yanboliang mind having a look?
GitHub user sethah opened a pull request:
https://github.com/apache/spark/pull/15773
[SPARK-18276][ML] ML models should copy the training summary and set parent
## What changes were proposed in this pull request?
Only some of the models which contain a training summary
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15314#discussion_r86569986
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -70,8 +68,8 @@ private[ml] trait PredictorParams extends Params
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15314
LGTM after typo is fixed. ping @jkbradley @srowen
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15762
Looks like a duplicate of https://github.com/apache/spark/pull/12574 ?
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15314#discussion_r86461964
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/IsotonicRegression.scala ---
@@ -86,7 +86,7 @@ private[regression] trait IsotonicRegressionBase
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15314#discussion_r86457706
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -51,6 +51,16 @@ private[ml] trait PredictorParams extends Params
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15314#discussion_r86463845
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -59,10 +69,12 @@ private[ml] trait PredictorParams extends Params
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15314#discussion_r86460724
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -47,18 +48,49 @@ object MLTestingUtils extends SparkFunSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15314#discussion_r86459287
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -137,10 +172,11 @@ object MLTestingUtils extends SparkFunSuite
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15314#discussion_r86464072
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -91,7 +103,20 @@ abstract class Predictor[
// Cast LabelCol to
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15314#discussion_r86463958
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -91,7 +103,20 @@ abstract class Predictor[
// Cast LabelCol to
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15593#discussion_r86434451
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -489,13 +485,14 @@ class LogisticRegression @Since("
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15593#discussion_r86433115
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1486,57 +1489,75 @@ private class LogisticAggregator
Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/15593#discussion_r86436188
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
---
@@ -1486,57 +1489,75 @@ private class LogisticAggregator