[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-06 Thread yinxusen
Github user yinxusen closed the pull request at: https://github.com/apache/spark/pull/5731 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is en

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-05 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-128139349 @yinxusen In the interests of time, I created a new PR based on this one: [https://github.com/apache/spark/pull/7972] You will still be the primary author of it. If

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-04 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-127806589 @yinxusen Would you mind if I sent a PR to you (which will update this PR)? We'd like to squeeze this into 1.5.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-127091229 @yinxusen I think switching to disallowing any overlap in indices and names will simplify both the API and the implementation.
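
The rule discussed in this thread (features may be selected by index and by name, index-selected features come first, and the two selections may not overlap) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the PR's Scala code and not Spark's API: the function names `resolve_selected_indices` and `slice_vector` are hypothetical, and `attribute_names` stands in for whatever per-feature name metadata the input column carries, assumed here to be unique.

```python
from typing import List, Sequence


def resolve_selected_indices(
    indices: Sequence[int],
    names: Sequence[str],
    attribute_names: Sequence[str],
) -> List[int]:
    """Combine index-based and name-based selections into one index list.

    Hypothetical helper: features given by index come first, then features
    given by name, each group in the order the user specified.
    """
    # Map each feature name to its position in the input vector.
    name_to_index = {name: i for i, name in enumerate(attribute_names)}

    unknown = [n for n in names if n not in name_to_index]
    if unknown:
        raise ValueError(f"unknown feature names: {unknown}")

    out_of_range = [i for i in indices if not 0 <= i < len(attribute_names)]
    if out_of_range:
        raise ValueError(f"indices out of range: {out_of_range}")

    selected = list(indices) + [name_to_index[n] for n in names]

    # Disallow any overlap: no feature may be selected twice, whether through
    # two indices, two names, or one index plus one name.
    if len(set(selected)) != len(selected):
        raise ValueError("indices and names must not select the same feature twice")
    return selected


def slice_vector(values: Sequence[float], selected: Sequence[int]) -> List[float]:
    """Return the sub-vector containing only the selected features, in order."""
    return [values[i] for i in selected]
```

For example, with features named `f0`, `f1`, `f2`, selecting index `1` plus names `f0` and `f2` resolves to `[1, 0, 2]`, while selecting index `0` together with name `f0` is rejected as an overlap.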

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r36050372 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r36050377 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r36050382 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r36050378 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r36050373 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r36050374 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r36050379 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r36050375 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r36050383 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r36050381 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-127059025 @yinxusen Yeah, good point, what I said last is too complex. I'll take a look now.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126998721 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126998698 [Test build #39412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39412/console) for PR 5731 at commit [`98c6939`](https://github.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126997005 [Test build #39412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39412/consoleFull) for PR 5731 at commit [`98c6939`](https://gith

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126996943 Merged build triggered.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126996946 Merged build started.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126987226 [Test build #39401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39401/console) for PR 5731 at commit [`ecbf2d3`](https://github.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126987227 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-01 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126987192 [Test build #39401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39401/consoleFull) for PR 5731 at commit [`ecbf2d3`](https://gith

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126987125 Merged build triggered.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126987127 Merged build started.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-01 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126987117 @jkbradley How about we stick to the prior discussion? I think users do not want to repeat features.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-01 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126948155 @yinxusen By the way, it would be great to squeeze this into this release. Will you be able to send an update soon? Thanks!

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-01 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126947540 How about this: * We use the ordering specified by the user, where we put features specified by index before features specified by name. * This will be a well

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-08-01 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126877346 @jkbradley There is already an IntArrayParam in the sharedParam. Besides, there are some issues to discuss: - Should we consider the scenario that some att

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-31 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126747089 OK thanks! Note: "IntArrayParam" may not exist yet in params.scala, but please add it based on DoubleArrayParam as needed.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-31 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126688991 @jkbradley Agreed, blending these two selected indices is easy to use. I'll fix it soon.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126534353 Here are some initial thoughts: We should definitely permit users to specify features with indices and names. Supporting both within the same type makes the API prett

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r35938626 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r35938540 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r35938539 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126533119 [Test build #39130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39130/console) for PR 5731 at commit [`fd154d7`](https://github.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126533121 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126533047 [Test build #172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/172/console) for PR 5731 at commit [`fd154d7`](https://github.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126533050 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126532937 [Test build #39130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/39130/consoleFull) for PR 5731 at commit [`fd154d7`](https://gith

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126532847 Merged build started.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126532877 [Test build #172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/172/consoleFull) for PR 5731 at commit [`fd154d7`](https://gith

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126532835 Merged build triggered.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126532520 Merged build triggered.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126532524 Merged build started.

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126532438 @yinxusen Apologies for the long wait, but I'm hoping to get this in for 1.5. I'll make a pass now. But if you are too busy, I'd be happy to help update the PR as ne

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-07-30 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-126532361 Jenkins test this please

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-96906037 [Test build #31104 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31104/consoleFull) for PR 5731 at commit [`fd154d7`](https://gith

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-04-27 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r29211169 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorSlicer.scala --- @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundatio

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5731#issuecomment-96893198 [Test build #31104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31104/consoleFull) for PR 5731 at commit [`fd154d7`](https://githu

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-04-27 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r29211027 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSlicerSuite.scala --- @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Found

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-04-27 Thread yinxusen
Github user yinxusen commented on a diff in the pull request: https://github.com/apache/spark/pull/5731#discussion_r29211014 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorSlicerSuite.scala --- @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Found

[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

2015-04-27 Thread yinxusen
GitHub user yinxusen opened a pull request: https://github.com/apache/spark/pull/5731 [SPARK-5895][ML] add vector slicer JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-5895). I have some thoughts on `AttributeGroup`: 1. It is hard for an end user to add `Attrib