[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296045 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,27 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296396 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -163,25 +190,28 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296526 --- Diff: docs/ml-features.md --- @@ -542,12 +543,13 @@ column, we should get the following: "a" gets index `0` because it is the mos

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r104296075 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -105,7 +125,11 @@ class StringIndexer @Since("

[GitHub] spark pull request #17165: [DO NOT MERGE][TESTING] Vince shieh spark 17498

2017-03-04 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/17165 [DO NOT MERGE][TESTING] Vince shieh spark 17498 Temp PR to reproduce Jenkins compilation error You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16782 Well, it merged with master, but it will need some manual backports. @BryanCutler Would you mind sending one for branch-2.1? I'm ambivalent about 2.0; your call (or anyone who's hit this on 2.0

spark git commit: [SPARK-19348][PYTHON] PySpark keyword_only decorator is not thread-safe

2017-03-03 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 2a7921a81 -> 44281ca81 [SPARK-19348][PYTHON] PySpark keyword_only decorator is not thread-safe ## What changes were proposed in this pull request? The `keyword_only` decorator in PySpark is not thread-safe. It writes kwargs to a static
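The thread-safety problem described in this commit message can be illustrated with a small, self-contained sketch. This is not the PySpark source: the decorator names `keyword_only_shared` and `keyword_only_per_instance`, the demo classes, and the `maxIter` keyword are all hypothetical, and storing the kwargs on the wrapper function is only an assumed stand-in for the "static" storage the message mentions. The two events force one specific interleaving so the outcome is deterministic.

```python
import functools
import threading


def keyword_only_shared(func):
    """Hypothetical unsafe variant: kwargs are stored on the wrapper itself,
    so every instance and every thread shares one slot."""
    @functools.wraps(func)
    def wrapper(self, **kwargs):
        wrapper._input_kwargs = kwargs  # shared slot, can be overwritten
        return func(self)
    return wrapper


def keyword_only_per_instance(func):
    """Hypothetical safer variant: kwargs are stored on the instance."""
    @functools.wraps(func)
    def wrapper(self, **kwargs):
        self._input_kwargs = kwargs  # private to this instance
        return func(self)
    return wrapper


class _Base:
    def __init__(self, entered=None, gate=None):
        self.entered = entered
        self.gate = gate
        self.params = None

    def _pause(self):
        # Optionally stop between "kwargs stored" and "kwargs read".
        if self.gate is not None:
            self.entered.set()
            self.gate.wait(timeout=5)


class SharedDemo(_Base):
    @keyword_only_shared
    def set_params(self):
        self._pause()
        self.params = SharedDemo.set_params._input_kwargs


class PerInstanceDemo(_Base):
    @keyword_only_per_instance
    def set_params(self):
        self._pause()
        self.params = self._input_kwargs


def run_interleaved(cls):
    """Pause a slow call after it stored its kwargs, run a second call, then resume."""
    entered, gate = threading.Event(), threading.Event()
    slow, fast = cls(entered, gate), cls()
    worker = threading.Thread(target=slow.set_params, kwargs={"maxIter": 1})
    worker.start()
    entered.wait(timeout=5)     # slow call has stored its kwargs and is paused
    fast.set_params(maxIter=2)  # overwrites the shared slot, if there is one
    gate.set()
    worker.join()
    return slow.params, fast.params
```

With the shared slot, the paused call ends up reading the other call's kwargs; with per-instance storage, each call reads its own.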

[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16782 Clever unit test : ) LGTM Merging with master I'll try to backport it to branch-2.1 and branch-2.0 as well. --- If your project is set up for it, you can reply to this email

[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16782 You're right about the test. I'll take a look now.

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 LGTM Any other comments before we merge?

[GitHub] spark issue #17110: [SPARK-19635][ML] DataFrame-based API for chi square tes...

2017-03-03 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17110 Actually, synced with @thunterdb and will update design doc to put everything under a "Statistics" object. I'll wait until https://github.com/apache/spark/pull/17108 gets merged.

[GitHub] spark pull request #17110: [SPARK-19635][ML] DataFrame-based API for chi squ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17110#discussion_r104220095 --- Diff: mllib/src/test/scala/org/apache/spark/ml/stat/ChiSquareSuite.scala --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #17110: [SPARK-19635][ML] DataFrame-based API for chi squ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17110#discussion_r104220074 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/ChiSquare.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #17110: [SPARK-19635][ML] DataFrame-based API for chi squ...

2017-03-03 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17110#discussion_r104220081 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/ChiSquare.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-03-01 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 It's a good point about making an implicit decision. We could deprecate these methods in favor of transform-based ones in the future---we have done this in the past---but it does push the long

[GitHub] spark pull request #17110: [SPARK-19635][ML] DataFrame-based API for chi squ...

2017-02-28 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/17110 [SPARK-19635][ML] DataFrame-based API for chi square test ## What changes were proposed in this pull request? Wrapper taking and returning a DataFrame ## How was this patch tested

[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16883 Btw, we're near the time when the 2.2 branch will be cut, and I'd like to get this into 2.2. Let me know if you're busy, and I'd be happy to help finalize the PR. Thanks!

[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...

2017-02-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103582514 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -68,13 +77,21 @@ class LinearSVCSuite extends SparkFunSuite

[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...

2017-02-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103582543 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -51,6 +54,12 @@ class LinearSVCSuite extends SparkFunSuite

[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...

2017-02-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103582526 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -68,13 +77,21 @@ class LinearSVCSuite extends SparkFunSuite

spark git commit: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-28 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master ca3864d6e -> 0fe8020f3 [SPARK-14503][ML] spark.ml API for FPGrowth ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-14503 Function parity: Add FPGrowth and AssociationRules to ML.

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15415 No problem, thanks! Could you please create a subtask for docs? Merging with master

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15415 I'm going to go ahead and merge this after tests to make sure it's in 2.2, but can you please send a follow-up for my last 2 comments? Thanks!

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15415 test this please

[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16782 > it leaves in place the static class variable for all other ML classes that use the wrapper, and those classes continue to use the static class variable. I think this was discus

spark git commit: [MINOR][DOC] Update GLM doc to include tweedie distribution

2017-02-28 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 7e5359be5 -> d743ea4c7 [MINOR][DOC] Update GLM doc to include tweedie distribution Update GLM documentation to include the Tweedie distribution. #16344 jkbradley yanboliang Author: actuaryzhang <actuaryzhan...@gmail.com> Clos

[GitHub] spark issue #17103: [Minor][Doc] Update GLM doc to include tweedie distribut...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17103 LGTM Thanks! Merging with master

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @MLnick Thanks *a lot* for the detailed tests! I really appreciate it. In this case, are you OK with the approach in the current PR (pending reviews)? One thing we should confirm

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @MLnick Thanks for showing those comparison numbers. If your implementation is faster, then I'm happy going with it. I do wonder if we might hit scalability issues with RDDs which we would

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 I'd been following the long discussions about a transform-based solution, but those had not seemed to have converged to a clear design. If you feel they have in your PR, then I'll spend some

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103397271 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -285,6 +288,57 @@ class ALSModel private[ml] ( @Since

[GitHub] spark issue #17090: [Spark-19535][ML] RecommendForAllUsers RecommendForAllIt...

2017-02-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17090 @hhbyyh This is different from https://github.com/apache/spark/pull/12574 since it sidesteps the ongoing design discussions about input and output schema. Eventually, I'd like us to proceed

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-28 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103396443 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -248,18 +248,18 @@ class ALSModel private[ml] ( @Since("

[GitHub] spark pull request #16715: [Spark-18080][ML][PYTHON] Python API & Examples f...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16715#discussion_r103357342 --- Diff: python/pyspark/ml/feature.py --- @@ -120,6 +122,196 @@ def getThreshold(self): return self.getOrDefault(self.threshold

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103351299 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -248,18 +248,18 @@ class ALSModel private[ml] ( @Since("

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103352432 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -285,6 +285,43 @@ class ALSModel private[ml] ( @Since

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103350750 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -285,6 +285,43 @@ class ALSModel private[ml] ( @Since

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103353799 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/TopByKeyAggregator.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103354132 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -285,6 +286,55 @@ class ALSModel private[ml] ( @Since

[GitHub] spark pull request #17090: [Spark-19535][ML] RecommendForAllUsers RecommendF...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/17090#discussion_r103353184 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -285,6 +285,43 @@ class ALSModel private[ml] ( @Since

[GitHub] spark pull request #12762: [SPARK-14891][ML] Add schema validation for ALS

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/12762#discussion_r103352114 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -242,16 +263,19 @@ class ALSModel private[ml

[GitHub] spark issue #14273: [SPARK-9140] [ML] Replace TimeTracker by MultiStopwatch

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14273 OK apologies @MechCoder for the delay. I guess we can close this issue, and someone else can open up a PR based on yours.

[GitHub] spark issue #16965: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16965 Github isn't handling the merge well, so you might try rebasing

[GitHub] spark issue #14273: [SPARK-9140] [ML] Replace TimeTracker by MultiStopwatch

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14273 Sorry about the delay here. Do you still have time to work on this?

[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16811#discussion_r103338261 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala --- @@ -144,6 +144,31 @@ class Word2VecSuite extends SparkFunSuite

[GitHub] spark pull request #16811: [SPARK-17629][ML] methods to return synonyms dire...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16811#discussion_r103338146 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala --- @@ -144,6 +144,31 @@ class Word2VecSuite extends SparkFunSuite

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103325403 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,25 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103330093 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,25 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103332623 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala --- @@ -75,22 +75,32 @@ class StringIndexerSuite intercept

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103332929 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -34,8 +36,25 @@ import org.apache.spark.util.collection.OpenHashMap

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103325211 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -17,14 +17,16 @@ package org.apache.spark.ml.feature

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103331212 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -163,25 +190,28 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103330268 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -71,18 +90,22 @@ class StringIndexer @Since("1.4.0") (

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103330303 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -71,18 +90,22 @@ class StringIndexer @Since("1.4.0") (

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103330242 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -71,18 +90,22 @@ class StringIndexer @Since("1.4.0") (

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103331444 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala --- @@ -163,25 +190,28 @@ class StringIndexerModel

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103332885 --- Diff: docs/ml-features.md --- @@ -576,7 +578,22 @@ will be generated: 2 | c| 1.0 -Notice that the row containing &q

[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16883#discussion_r103332764 --- Diff: docs/ml-features.md --- @@ -502,7 +502,7 @@ for more details on the API. ## StringIndexer `StringIndexer` encodes a string

[GitHub] spark issue #16883: [SPARK-17498][ML] StringIndexer enhancement for handling...

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16883 I'll take a look now, thanks!

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15770 Yep, that's correct. Everyone, please let me know if you disagree. Also, if we do go with Option 2 above, then the input schema could be a few possible things: * list of (neighbor ID

[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17048 Can you please close this manually? Thanks!

[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

2017-02-27 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16782 I'm OK with the current solution, though if it's easy to check using `inspection` then that seems nice to do. If there are cases in which the wrapper is still not thread-safe

[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103278706 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -220,12 +246,13 @@ object LinearSVCSuite

[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103278715 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -234,7 +261,12 @@ object LinearSVCSuite { val yD

[GitHub] spark pull request #16784: [SPARK-19382][ML]:Test sparse vectors in LinearSV...

2017-02-27 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16784#discussion_r103278729 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -203,6 +227,8 @@ class LinearSVCSuite extends SparkFunSuite

[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

2017-02-26 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16782 Thanks @BryanCutler for the patch! The fix looks reasonable to me, but let me try to check with @davies to confirm. If this is the right approach, then I think we should update the other

spark git commit: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to match Scala implementation

2017-02-25 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-2.1 97866e198 -> 20a432951 [SPARK-14772][PYTHON][ML] Fixed Params.copy method to match Scala implementation ## What changes were proposed in this pull request? Fixed the PySpark Params.copy method to behave like the Scala implementation.

[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-25 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17048 LGTM Merging with branch-2.1 Thank you!

[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

2017-02-25 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15770 Sorry for my absence from recent conversation! I agree there is no clear answer for handling input and output schema. Some options: * Option 1: same as RDD/GraphX-based API

[GitHub] spark issue #17069: [MINOR][ML][DOC] Document default value for GeneralizedL...

2017-02-25 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17069 CC @actuaryzhang @yanboliang Just noticed that the default is missing in the Scaladoc. (Thanks btw for adding Tweedie support!)

[GitHub] spark pull request #17069: [MINOR][ML][DOC] Document default value for Gener...

2017-02-25 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/17069 [MINOR][ML][DOC] Document default value for GeneralizedLinearRegression.linkPower Add Scaladoc for GeneralizedLinearRegression.linkPower default value Follow-up to https://github.com

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r103068598 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,339 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r103068619 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,339 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-24 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15415 I don't think we need to support the default prediction (for empty/null inputs) now. I agree we could use an imputer or add something as an option later on. Will take a final look now

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-24 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r103050909 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,347 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-24 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15415 I agree that, if the set of rules is small (1-2 GB max), then collecting and broadcasting it is best. But for larger sets of rules, we'd have to keep it distributed. I'm very surprised

spark git commit: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to match Scala implementation

2017-02-23 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master d02762457 -> 2f69e3f60 [SPARK-14772][PYTHON][ML] Fixed Params.copy method to match Scala implementation ## What changes were proposed in this pull request? Fixed the PySpark Params.copy method to behave like the Scala implementation. The

[GitHub] spark issue #16772: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16772 OK merging now. @BryanCutler do let me know if you don't have time to send a backport--thanks!

[GitHub] spark issue #12888: [SPARK-14772][ML,PySpark]Python ML Params.copy treats ui...

2017-02-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/12888 @hujy Thank you for sending this PR, and apologies for not seeing it earlier. Since the other PR for this JIRA is ready to merge, could you please close this issue? Thanks again!

[GitHub] spark issue #16772: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16772 well...will merge after new tests

[GitHub] spark issue #16772: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16772 jenkins test this please

[GitHub] spark issue #16772: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-23 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16772 LGTM Merging with master @BryanCutler would you mind sending a backport PR against branch-2.1 to run Jenkins tests? Thank you!

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102856964 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102845724 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102845588 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102792088 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102792306 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102792331 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,346 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102598489 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102598758 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102535175 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102594866 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102535113 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102599019 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102595600 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102535118 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102535156 --- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala --- @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102598955 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102598078 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r102598022 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software Foundation
