Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17130
The updated transform looks good; thanks for pinging!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17233
I'll take a look
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/17233#discussion_r105719645
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -122,6 +122,86 @@ class StringIndexerSuite
assert
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/17233#discussion_r105721463
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -188,35 +189,45 @@ class StringIndexerModel
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/17233#discussion_r10571
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -122,6 +122,86 @@ class StringIndexerSuite
assert
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/17233#discussion_r105706598
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -39,20 +39,21 @@ import
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/17233#discussion_r105719023
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala ---
@@ -122,6 +122,86 @@ class StringIndexerSuite
assert
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17218
True, if minSupport can be shared, then that's OK. confidence won't be
shared though.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17215
Thanks! Merging with master
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17110
Ping @imatiach-msft any more comments after the update?
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17283
Thanks for fixing this issue! LGTM
Merging with master
Stating the JIRA number for a bug fix is reasonable, though it's most
useful if the bug appears in an actual release.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17233
LGTM
Merging with master
Thanks for the improvement!
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17237
Thanks for the PR! I just merged the fix for
https://issues.apache.org/jira/browse/SPARK-11569 which will affect this PR.
Would you mind updating this PR to include SPARK-11569's handli
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15779
Thanks! I'll merge this with master and branch-2.1
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15779
Linking commit for branch-2.1 since it hasn't linked automatically:
[https://github.com/apache/spark/commit/21bbf94b41fbd193e370a3820131e449aaf0e3db]
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15800
I disagree about this: you can do multiple probes with MinHash. For each
hash function, the probability of a collision is equal to the Jaccard
similarity coefficient: [https://en.wikipedia.org
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15148
@sethah
> What is the intended use of the output column generated by transform? As
an alternative set of features with decreased dimensionality?
I agree it's ma
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15800#discussion_r87245481
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala ---
@@ -32,13 +32,7 @@ import org.apache.spark.sql.types.StructType
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15148
@MLnick I agree with most of your comments. A few responses:
> In terms of transform - I disagree somewhat that the main use case is
"dimensionality reduction". Perhaps the
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15748
I do want to add this at some point, though we need to focus on QA for now
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15148
Good points: Array of Vectors sounds good to me.
There has been a lot of discussion. I'm going to try to summarize things
in a follow-up JIRA, which I'll link here shortly.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15800
I'm not convinced that it is useful to think about AND and OR amplification
for MinHash for approxNearestNeighbors. Do you have a reference describing it?
I just can't think of a bet
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15148
Phew, done! https://issues.apache.org/jira/browse/SPARK-18392
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15800
Limiting this discussion to MinHash:
Say we construct a hash table of L x K functions (for doing both OR and AND
amplification). For approxNearestNeighbors, we need to ask, "What i
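For readers following the L x K discussion: with K-way AND inside each of L hash tables and an OR across tables, the probability that a pair at similarity s becomes a candidate has a standard closed form. A minimal sketch (plain Python; the K and L values are arbitrary examples, not Spark defaults):

```python
def candidate_probability(s, K, L):
    # AND over K hashes within a table (all K must collide), then OR over
    # L tables (any one table colliding makes the pair a candidate):
    #   P(candidate) = 1 - (1 - s^K)^L
    return 1 - (1 - s ** K) ** L

# The K-AND suppresses low-similarity pairs, the L-OR boosts
# high-similarity ones; together they form an S-curve around a
# threshold of roughly (1/L) ** (1/K).
for s in (0.2, 0.5, 0.8):
    print(f"s={s}: {candidate_probability(s, K=4, L=8):.3f}")
```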
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15795
Linking [https://issues.apache.org/jira/browse/SPARK-18392] since it will
alter the public API for LSH
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15787
Linking [https://issues.apache.org/jira/browse/SPARK-18392] since it will
alter the public API for LSH
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15800
I agree @sethah and I are on the same page. Two clarifications about
@Yunni 's post:
* I'm not sure what you mean by "your method will only have 1 indicator for
each row."
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15800
> Yes, I mean we average K 0/1 indicators for each Vector, and get L
averaged indicators. Do you agree with this part?
I'm afraid I still disagree. There's a fundament
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15800
@sethah I misread what you wrote earlier: I still want to compute a single
estimate, rather than L separate ones.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15800
You're talking about enlarging search ranges, or iterations, a few times,
but I really do not think it makes sense to have multiple iterations in Spark.
Iterations are useful for avoidi
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15843
Thanks a lot for finding & reporting this! The fix should probably go in
JavaWrapper, not JavaModel, right?
I tested this manually (in JavaWrapper), and it seems to fix the
problem
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15843
You're right! It's another bug: copy should be implemented in JavaParams,
not JavaModel. I'm sending this PR to fix that:
https://github.com/techaddict/spark/pull/1
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15843
Good point @holdenk --- @techaddict could you also please update the PR
title to say "JavaWrapper" instead of "StringIndexer"?
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15843
I'm now wondering if ```__del__``` should be in JavaParams instead of
JavaWrapper since JavaWrapper does not override ```copy```. Do y'all agree it
will be safer if it's moved there
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15800
@Yunni I guess we should remove it from the public API. I'm OK with
leaving the code there and making it private for now.
*One response:*
> Enlarging search ranges
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15843
I don't see a need to do deep copies of model summaries, but I agree I
don't like how JavaWrapper is ambiguous about whether it does shallow or deep
copies.
I'd say the
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15817
Can you please add "[ML]" to the PR description? Thanks!
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15874
Can you please add "[ML]" to the PR title?
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15279
Can you please change the title to have: "SPARK-12347" -> "[SPARK-12347]"?
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15279
ok to test
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15817
Can you please implement the Param directly in Bucketizer and
QuantileDiscretizer? Just like in Scala, HasHandleInvalid has built-in Param
doc which applies to existing use cases but not
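As a rough illustration of the handleInvalid semantics under discussion, here is a conceptual sketch in plain Python (not Spark's implementation; it assumes the documented "error"/"skip"/"keep" behaviors, with "keep" mapping invalid values to an extra bucket):

```python
import math
from bisect import bisect_right

def bucketize(values, splits, handle_invalid="error"):
    # Conceptual sketch of Bucketizer-style semantics:
    # values outside [splits[0], splits[-1]] or NaN are "invalid".
    #   "error": raise on invalid values
    #   "skip":  drop invalid rows
    #   "keep":  place invalid values in an extra bucket
    n_buckets = len(splits) - 1
    out = []
    for v in values:
        invalid = math.isnan(v) or v < splits[0] or v > splits[-1]
        if invalid:
            if handle_invalid == "error":
                raise ValueError(f"invalid value: {v}")
            if handle_invalid == "skip":
                continue
            out.append(float(n_buckets))  # "keep": extra bucket index
            continue
        # bisect_right yields the 1-based split index; subtract 1 for the
        # bucket, and clamp so the last split is inclusive.
        out.append(float(min(bisect_right(splits, v) - 1, n_buckets - 1)))
    return out

vals = [-0.5, 0.3, 1.5, float("nan")]
splits = [0.0, 1.0, 2.0]
print(bucketize(vals, splits, "keep"))  # [2.0, 0.0, 1.0, 2.0]
```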
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15874
I'll take a look
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88521575
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinHashLSHSuite.scala ---
@@ -86,9 +94,24 @@ class MinHashSuite extends SparkFunSuite with
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88359067
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSHSuite.scala
---
@@ -43,70 +43,72 @@ class RandomProjectionSuite
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88351984
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -31,36 +31,34 @@ import org.apache.spark.sql.types.StructType
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88353870
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -92,18 +93,17 @@ class MinHashModel private[ml] (
* LSH class for
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88359545
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSHSuite.scala
---
@@ -115,64 +117,83 @@ class RandomProjectionSuite
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88349223
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala
---
@@ -147,15 +151,17 @@ class RandomProjection(override
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15874#discussion_r88351861
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala
---
@@ -31,36 +31,34 @@ import org.apache.spark.sql.types.StructType
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15874
Other comments:
**MinHash**
Looking yet again at this, I think it's using a technically incorrect hash
function. It is *not* a perfect hash function. It can hash 2
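To illustrate one way a linear hash of this form fails to be perfect (a hypothetical toy example, not necessarily the exact case raised in this review): distinct inputs that are congruent modulo the prime collide.

```python
prime = 11   # toy modulus; real implementations use a large prime
a, b = 3, 7  # arbitrary coefficients for h(x) = (a*x + b) % prime

def h(x):
    return (a * x + b) % prime

# 2 and 13 are distinct, but 13 ≡ 2 (mod 11), so they hash identically,
# which a perfect hash function must never do.
print(h(2), h(13))  # 2 2
```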
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15874
@Yunni Thanks for the updates! I don't think we should include
AND-amplification for 2.1 since we're already in QA. But it'd be nice to get
it in 2.2. Also, 2.2 will give us pl
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15788
@felixcheung @yanboliang + @shivaram @mengxr Just saw this change. IIRC
we intentionally made spark.glm match R's glm behavior for the binomial family,
though I see it was a little diff
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15777
@yanboliang I noticed the JIRA is still pending. Are there follow-up tasks?
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15913#discussion_r89537785
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -443,12 +429,11 @@ private[ml] trait GBTParams extends
TreeEnsembleParams
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15913#discussion_r89537864
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -317,8 +317,32 @@ private[ml] trait TreeEnsembleParams extends
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15913#discussion_r89537771
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -443,12 +425,11 @@ private[ml] trait GBTParams extends
TreeEnsembleParams
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15913#discussion_r89537340
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -547,21 +547,6 @@ trait Params extends Identifiable with Serializable
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15913
One more thing: This breaks several public APIs, although in mostly minor
ways and with good reasons. Could you please update the JIRA to list the
public API changes?
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/16017#discussion_r89694871
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala
---
@@ -52,33 +52,49 @@ class DecisionTreeClassifier
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/16017#discussion_r89806761
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala
---
@@ -52,33 +52,49 @@ class DecisionTreeClassifier
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/16017#discussion_r89819028
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -134,27 +150,31 @@ private[ml] trait DecisionTreeParams extends
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16017
LGTM pending a check on whether the deprecations affect Estimators in
addition to Models
Thanks for the follow-up!
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/16017#discussion_r89824958
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -107,25 +107,41 @@ private[ml] trait DecisionTreeParams extends
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16011
I agree with @MLnick that we should not make this change. We need users to
be able to set all Params in Estimators, including the Params of the Models
they produce. If this is confusing for
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15874
LGTM
Thanks everyone!
Merging with master and branch-2.1
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15874
Well, I'm having trouble merging b/c of bad wifi during travel. Ping
@yanboliang @MLnick @mengxr would one of you mind merging this with master and
branch-2.1? @sethah and I have both
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15788#discussion_r89840613
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/r/GeneralizedLinearRegressionWrapper.scala
---
@@ -52,7 +59,15 @@ private[r] class
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15788
I agree that R makes it much easier to convert back to the string label.
My main worries are:
* changing the API, which may break some user code
* hiding probabilities from R users
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15843
Sorry for the slow response on this. Given the time pressure for 2.1,
let's go with option 1 for now with a follow-up task to implement option 2. It
would be great to include this fix i
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15972
test this please
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15972
Quick note: These items will need to be updated in pyspark too.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15972
Has anyone requested that LDAModel be unsealed? I'm trying to recall.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15972
This looks good, except for the pyspark updates. Thank you @hhbyyh !
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15768
This can now proceed since [https://github.com/apache/spark/pull/15874] has
been merged. Sorry for the delay! This will need to slip to 2.2
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16023
Per the comments on the JIRA, could you please close this issue pending
further discussion? Thank you!
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/16017#discussion_r90072953
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -107,25 +107,41 @@ private[ml] trait DecisionTreeParams extends
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/16017
LGTM
Merging with master and branch-2.1
Thanks!
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/16009#discussion_r90088312
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -49,15 +49,13 @@ private[feature] trait ChiSqSelectorParams extends
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15972
LGTM
merging with master and branch-2.1
Thanks!
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15571
@JoshRosen I could imagine us introducing this same issue in the future
with more Params in sharedParams.scala (like ```HasAggregationDepth```). As
long as it's fine for us to add MimaExcludes
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15571
They currently aren't, but a lot of users want them to be. To make them
available outside of Spark, we could create corresponding Java interfaces for
them and expose those interfaces inste
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15428#discussion_r84561834
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala
---
@@ -73,15 +74,51 @@ final class Bucketizer @Since("1.4.0") (@Si
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15428#discussion_r84562692
--- Diff: docs/ml-features.md ---
@@ -1104,9 +1104,11 @@ for more details on the API.
`QuantileDiscretizer` takes a column with continuous features
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15428#discussion_r84563374
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala ---
@@ -100,6 +102,24 @@ final class QuantileDiscretizer @Since
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15428#discussion_r84563479
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala
---
@@ -73,15 +74,51 @@ final class Bucketizer @Since("1.4.0") (@Si
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15428#discussion_r84563480
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala
---
@@ -73,15 +74,51 @@ final class Bucketizer @Since("1.4.0") (@Si
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15428#discussion_r84563237
--- Diff: python/pyspark/ml/feature.py ---
@@ -1157,9 +1157,11 @@ class QuantileDiscretizer(JavaEstimator,
HasInputCol, HasOutputCol, JavaMLReadab
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15428#discussion_r84563388
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala ---
@@ -100,6 +102,24 @@ final class QuantileDiscretizer @Since
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15428#discussion_r84563124
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/QuantileDiscretizerSuite.scala
---
@@ -82,14 +82,23 @@ class QuantileDiscretizerSuite
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84566581
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84566224
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/BitSamplingSuite.scala ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84566504
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84566599
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -245,50 +246,50 @@ private[ml] abstract class LSHModel[T <: LSHMode
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84565977
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/BitSampling.scala ---
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84566821
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84587191
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ---
@@ -0,0 +1,340 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84587195
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinHashSuite.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84587197
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinHashSuite.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84588545
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinHashSuite.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/15148#discussion_r84589114
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinHashSuite.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software