[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...

2017-04-04 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17336 Thanks a lot for the second update! This LGTM Merging with master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request #17532: [SPARK-20214][ML] Make sure converted csc matrix ...

2017-04-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17532#discussion_r109975956 --- Diff: python/pyspark/ml/linalg/__init__.py --- @@ -72,7 +72,9 @@ def _convert_to_vector(l): return DenseVector(l) elif

[GitHub] spark issue #17532: [SPARK-20214][ML] Make sure converted csc matrix has sor...

2017-04-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17532 Are you able to write a unit test which passes data through _convert_to_vector and fails before this fix?

[GitHub] spark issue #17532: [SPARK-20214][ML] Make sure converted csc matrix has sor...

2017-04-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17532 Btw, I'd really like to get this into 2.2, which will be cut soon. Let me know if you'd like me to take it over. Thanks!

[GitHub] spark pull request #17532: [SPARK-20214][ML] Make sure converted csc matrix ...

2017-04-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17532#discussion_r110057924 --- Diff: python/pyspark/mllib/tests.py --- @@ -853,6 +853,17 @@ def serialize(l): self.assertEqual(sv, serialize(lil.tocsr

[GitHub] spark issue #17532: [SPARK-20214][ML] Make sure converted csc matrix has sor...

2017-04-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17532 LGTM. Merging with master and branch-2.1, branch-2.0. Thanks a lot!

[GitHub] spark issue #17494: [SPARK-20076][ML][PySpark] Add Python interface for ml.s...

2017-04-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17494 LGTM if others are ok too

[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16782 I'm OK with the current solution, though if it's easy to check using ```inspection``` then that seems nice to do. If there are cases in which the wrapper is still not thread-

[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17048 Can you please close this manually? Thanks!

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15770 Yep, that's correct. Everyone, please let me know if you disagree. Also, if we do go with Option 2 above, then the input schema could be a few possible things: * list of (neighb

[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16883 I'll take a look now, thanks!

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103332885 --- Diff: docs/ml-features.md --- @@ -576,7 +578,22 @@ will be generated: 2 | c| 1.0 -Notice that the row containing &q

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103332764 --- Diff: docs/ml-features.md --- @@ -502,7 +502,7 @@ for more details on the API. ## StringIndexer `StringIndexer` encodes a string

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103331444 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -163,25 +190,28 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103331212 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -163,25 +190,28 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103330268 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -71,18 +90,22 @@ class StringIndexer @Since("1.4.0") (

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103330303 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -71,18 +90,22 @@ class StringIndexer @Since("1.4.0") (

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103330242 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -71,18 +90,22 @@ class StringIndexer @Since("1.4.0") (

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103330093 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,25 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103332623 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -75,22 +75,32 @@ class StringIndexerSuite intercept

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103332929 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,25 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103325211 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -17,14 +17,16 @@ package org.apache.spark.ml.feature

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103325403 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,25 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16811#discussion_r103338146 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala --- @@ -144,6 +144,31 @@ class Word2VecSuite extends SparkFunSuite with

[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16811#discussion_r103338261 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala --- @@ -144,6 +144,31 @@ class Word2VecSuite extends SparkFunSuite with

[GitHub] spark issue #14273: [SPARK-9140] [ML] Replace TimeTracker by MultiStopwatch

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14273 Sorry about the delay here. Do you still have time to work on this?

[GitHub] spark issue #16965: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16965 GitHub isn't handling the merge well, so you might try rebasing.

[GitHub] spark issue #14273: [SPARK-9140] [ML] Replace TimeTracker by MultiStopwatch

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14273 OK apologies @MechCoder for the delay. I guess we can close this issue, and someone else can open up a PR based on yours.

[GitHub] spark pull request #12762: [SPARK-14891][ML] Add schema validation for ALS

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12762#discussion_r103352114 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -242,16 +263,19 @@ class ALSModel private[ml

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103354132 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -285,6 +286,55 @@ class ALSModel private[ml] ( @Since

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103353184 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -285,6 +285,43 @@ class ALSModel private[ml] ( @Since

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103353799 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/TopByKeyAggregator.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103352432 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -285,6 +285,43 @@ class ALSModel private[ml] ( @Since

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103350750 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -285,6 +285,43 @@ class ALSModel private[ml] ( @Since

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103351299 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -248,18 +248,18 @@ class ALSModel private[ml] ( @Since("

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r103357342 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,196 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103396443 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -248,18 +248,18 @@ class ALSModel private[ml] ( @Since("

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @hhbyyh This is different from https://github.com/apache/spark/pull/12574 since it sidesteps the ongoing design discussions about input and output schema. Eventually, I'd like us to pr

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103397271 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -285,6 +288,57 @@ class ALSModel private[ml] ( @Since

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 I'd been following the long discussions about a transform-based solution, but those had not seemed to have converged to a clear design. If you feel they have in your PR, then I'll

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @MLnick Thanks for showing those comparison numbers. If your implementation is faster, then I'm happy going with it. I do wonder if we might hit scalability issues with RDDs which we woul

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @MLnick Thanks *a lot* for the detailed tests! I really appreciate it. In this case, are you OK with the approach in the current PR (pending reviews)? One thing we should confirm is

[GitHub] spark issue #17103: [Minor][Doc] Update GLM doc to include tweedie distribut...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17103 LGTM. Thanks! Merging with master.

[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16782 > it leaves in place the static class variable for all other ML classes that use the wrapper, and those classes continue to use the static class variable. I think this was discus
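
The concern quoted in this comment, a wrapper that keeps the captured keyword arguments in one static (shared) variable, can be illustrated with a minimal sketch. This is not the code from the pull request; `ToyParams` and its `explicit` attribute are hypothetical names used only for illustration, and the decorator below is an assumed shape that records the arguments on the instance instead of on shared state.

```python
import functools


def keyword_only(func):
    """Force keyword arguments and record them on the instance."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        # Stored per instance, so two objects being constructed at the same
        # time do not share (and cannot overwrite) each other's kwargs.
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


class ToyParams(object):  # hypothetical class, not part of PySpark
    @keyword_only
    def __init__(self, inputCol=None, outputCol=None):
        # Read back only the arguments the caller passed explicitly.
        self.explicit = dict(self._input_kwargs)


a = ToyParams(inputCol="a")
b = ToyParams(outputCol="b")
```

Here `a.explicit` and `b.explicit` each hold only the arguments passed to their own constructor, which is the property a shared class-level variable cannot guarantee under concurrent use.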

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15415 test this please

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15415 I'm going to go ahead and merge this after tests to make sure it's in 2.2, but can you please send a follow-up for my last 2 comments? Thanks!

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15415 No problem, thanks! Could you please create a subtask for docs? Merging with master.

[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...

2017-02-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103582526 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -68,13 +77,21 @@ class LinearSVCSuite extends SparkFunSuite

[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...

2017-02-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103582543 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -51,6 +54,12 @@ class LinearSVCSuite extends SparkFunSuite

[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...

2017-02-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103582514 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -68,13 +77,21 @@ class LinearSVCSuite extends SparkFunSuite

[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16883 Btw, we're near the time when the 2.2 branch will be cut, and I'd like to get this into 2.2. Let me know if you're busy, and I'd be happy to help finalize the PR. Thanks!

[GitHub] spark pull request #17110: [SPARK-19635][ML] DataFrame-based API for chi squ...

2017-02-28 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/17110 [SPARK-19635][ML] DataFrame-based API for chi square test ## What changes were proposed in this pull request? Wrapper taking and returning a DataFrame ## How was this patch tested

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 It's a good point about making an implicit decision. We could deprecate these methods in favor of transform-based ones in the future---we have done this in the past---but it does push the

[GitHub] spark pull request #17110: [SPARK-19635][ML] DataFrame-based API for chi squ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17110#discussion_r104220081 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/ChiSquare.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #17110: [SPARK-19635][ML] DataFrame-based API for chi squ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17110#discussion_r104220074 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/ChiSquare.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #17110: [SPARK-19635][ML] DataFrame-based API for chi squ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17110#discussion_r104220095 --- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/ChiSquareSuite.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #17110: [SPARK-19635][ML] DataFrame-based API for chi square tes...

2017-03-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17110 Actually, synced with @thunterdb and will update design doc to put everything under a "Statistics" object. I'll wait until https://github.com/apache/spark/pull/17108 gets merged

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 LGTM. Any other comments before we merge?

[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16782 You're right about the test. I'll take a look now.

[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16782 Clever unit test :) LGTM. Merging with master. I'll try to backport it to branch-2.1 and branch-2.0 as well.

[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16782 Well, it merged with master, but it will need some manual backports. @BryanCutler Would you mind sending one for branch-2.1? I'm ambivalent about 2.0; your call (or anyone who's hit t

[GitHub] spark pull request #17165: [DO NOT MERGE][TESTING] Vince shieh spark 17498

2017-03-04 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/17165 [DO NOT MERGE][TESTING] Vince shieh spark 17498 Temp PR to reproduce Jenkins compilation error You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296526 --- Diff: docs/ml-features.md --- @@ -542,12 +543,13 @@ column, we should get the following: "a" gets index `0` because it is the mos

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296075 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -105,7 +125,11 @@ class StringIndexer @Since("

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296045 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,27 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296396 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -163,25 +190,28 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296546 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -71,18 +92,17 @@ class StringIndexer @Since("1.4.0") (

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296156 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -142,18 +166,18 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296099 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -105,7 +125,11 @@ class StringIndexer @Since("

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296562 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -163,25 +187,28 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296367 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -163,25 +190,28 @@ class StringIndexerModel

[GitHub] spark pull request #17165: [DO NOT MERGE][TESTING] Vince shieh spark 17498

2017-03-04 Thread jkbradley
Github user jkbradley closed the pull request at: https://github.com/apache/spark/pull/17165

[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

2017-03-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16811#discussion_r104330610 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala --- @@ -134,13 +134,20 @@ class Word2VecSuite extends SparkFunSuite with

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 I'll merge this with master now. Thanks @sueann and @MLnick for feedback. I'll prioritize helping with your work on transform, metrics, and tuning for ALS next.

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-03-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r104331119 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,196 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark issue #16623: [SPARK-19066][SPARKR][Backport-2.1]:LDA doesn't set opti...

2017-03-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16623 Thanks for the comments. I definitely agree with many of your combined statements: * R has not been declared stable. (Though where in the docs is this even stated? I was unable to find

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-03-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15435 +1 for moving implementation to traits, as long as the public methods are still Java-friendly. (Methods which are implemented in traits often can't be called from Java.)

[GitHub] spark issue #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

2017-03-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16784 Thanks for the updates. This LGTM pending the conflict resolution. Sorry for the delay!

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104489280 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -142,18 +167,17 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104489264 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,27 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104489270 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -71,9 +92,8 @@ class StringIndexer @Since("1.4.0") (

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-06 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104489244 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,27 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark issue #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSVCSuite

2017-03-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16784 LGTM. Thanks @wangmiao1981! Merging with master.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-06 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @MLnick OK I think I misunderstood some of your comments above then. I see the proposal in SPARK-14409 differs from this PR, so I agree it'd be nice to resolve it. We can make changes to

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 Thanks @MLnick for the explanation. This is what I'd understood from your similar description on the JIRA, but definitely more in-depth. (It might be good to copy to JIRA, or even a desig

[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16883 LGTM Merging with master Thanks a lot!

[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16883 Btw, are you interested in updating the Python API too? https://issues.apache.org/jira/browse/SPARK-19852

[GitHub] spark issue #16512: [SPARK-18335][SPARKR] createDataFrame to support numPart...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16512 It would definitely be considered a new API, though I agree with you that it's probably safe. That said, I'm not a fan of such changes in patch versions unless they really are

[GitHub] spark issue #16739: [SPARK-19399][SPARKR] Add R coalesce API for DataFrame a...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16739 I've commented elsewhere, but wanted to here just to make more people aware: Let's refrain from backporting new APIs into patch versions unless they are really critical. We do n

[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16883 Thanks!

[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16811 LGTM Merging with master Thank you!

[GitHub] spark issue #16811: [SPARK-17629][ML] methods to return synonyms directly

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16811 Thanks! I made a follow-up JIRA for updating the Python API: https://issues.apache.org/jira/browse/SPARK-19866

[GitHub] spark issue #17193: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17193 LGTM Merging with branch-2.1 Thank you!

[GitHub] spark issue #17195: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

2017-03-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17195 LGTM Merging with branch-2.0 Thank you again!

[GitHub] spark pull request #17215: [MINOR][ML] Improve MLWriter overwrite error mess...

2017-03-08 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/17215 [MINOR][ML] Improve MLWriter overwrite error message ## What changes were proposed in this pull request? Give proper syntax for Java and Python in addition to Scala. ## How was

[GitHub] spark issue #17108: [SPARK-19636][ML] Feature parity for correlation statist...

2017-03-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17108 Given further thought, I'd prefer we stick to the API specified in the design doc, with a Correlations object instead of a generic Statistics object. In the future, we may want optional P

[GitHub] spark issue #17110: [SPARK-19635][ML] DataFrame-based API for chi square tes...

2017-03-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17110 I just reversed my opinion about a shared "Statistics" object. See https://github.com/apache/spark/pull/17108#issuecomment-285200613 for details. I pushed an update per y

[GitHub] spark issue #16002: [SPARK-18341][ML] Eliminate use of SingularMatrixExcepti...

2017-03-08 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16002 @yanboliang Sorry for missing earlier discussion. I'm OK with declaring defeat here, though I still disagree about using exceptions. I agree that passing an obscure error code up is not

[GitHub] spark issue #17218: [SPARK-19281][WIP][PYTHON][ML] spark.ml Python API for F...

2017-03-13 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17218 Thanks for the PR! I'll wait until this isn't "WIP" to review it thoroughly, but I'll make two comments now: * The params should not be added to shared.py since th

[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-03-13 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r105702725 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -56,8 +56,8 @@ private[fpm] trait FPGrowthParams extends Params with
