[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-29 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r15534415 --- Diff: core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala --- @@ -88,14 +91,73 @@ private[spark] object SamplingUtils { */

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50521377 QA tests have started for PR 1025. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17368/consoleFull

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50527473 QA results for PR 1025:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50528615 LGTM. Merged into master. Thanks!!

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-29 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1025

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-28 Thread dorx
GitHub user dorx reopened a pull request: https://github.com/apache/spark/pull/1025
[SPARK-2082] stratified sampling in PairRDDFunctions that guarantees exact sample size
Implemented stratified sampling that guarantees exact sample size using ScaRSR with two passes over the RDD

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50379743 QA tests have started for PR 1025. This patch DID NOT merge cleanly! View progress:

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50380482 QA tests have started for PR 1025. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17296/consoleFull

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50381620 QA results for PR 1025:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50383895 QA tests have started for PR 1025. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17300/consoleFull

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50388808 QA results for PR 1025:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50393886 QA results for PR 1025:
- This patch FAILED unit tests.
For more information see test

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50409833 QA tests have started for PR 1025. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17307/consoleFull

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50414198 QA results for PR 1025:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50290028 @dorx I removed commons-math3 from dependencies, separated `sampleByKey` and `sampleByKeyExact`, and corrected the math in waitlisting in sampling with replacement.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-25 Thread dorx
Github user dorx closed the pull request at: https://github.com/apache/spark/pull/1025

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50065418 QA tests have started for PR 1025. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17130/consoleFull

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50070185 QA results for PR 1025:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-24 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50071767 Looks like there's some API changes from Xiangrui's updates. @mateiz @pwendell

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-24 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50073867 Also, seems like there wasn't a single line of code preserved from before the updates. We should probably close this PR and let Xiangrui submit his version in a separate PR

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-24 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50101767 Sorry, how was the API changed, was it making `sampleByKeyExact` a separate method and making it experimental? That actually seems okay to me, the algorithm there is

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-24 Thread falaki
Github user falaki commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50104259 This is the first place we introduce 'exact' to our API. We already have 'approx' in function names. I think having both of them is confusing to users.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-24 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-50109108 Well, the other sample functions are already approximate anyway. I kind of like this here because it conveys that it's more expensive. The other thing is that if we want

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14906065 --- Diff: pom.xml --- @@ -257,6 +257,11 @@ <version>1.5</version> </dependency> <dependency> +

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14906349 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -195,6 +193,37 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14906412 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -195,6 +193,37 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14906680 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14906754 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14906825 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14906919 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907155 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907202 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907335 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907358 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -195,6 +193,37 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907544 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -195,6 +193,37 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907579 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907639 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907668 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907670 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907687 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907870 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14907896 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48985401 QA tests have started for PR 1025. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16650/consoleFull

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48989787 QA results for PR 1025:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14672589 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,335 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48384891 Merged build triggered.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48384908 Merged build started.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48386184 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16414/

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48386518 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48386790 Merged build triggered.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48386807 Merged build started.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48388111 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16416/

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48388110 Merged build finished.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48414125 Merged build triggered.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48414132 Merged build started.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14688121 --- Diff: core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala --- @@ -45,11 +50,75 @@ private[spark] object SamplingUtils { val

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14688338 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,310 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14688363 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,310 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14688550 --- Diff: core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala --- @@ -45,11 +50,75 @@ private[spark] object SamplingUtils { val

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14688585 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -83,6 +83,120 @@ class PairRDDFunctionsSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14688624 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -83,6 +83,120 @@ class PairRDDFunctionsSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14688613 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -83,6 +83,120 @@ class PairRDDFunctionsSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48416874 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16431/

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48416873 Merged build finished. All automated tests passed.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14688633 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,310 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14688702 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -83,6 +83,120 @@ class PairRDDFunctionsSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48418905 Merged build triggered.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48418912 Merged build started.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48419179 Holding off on updating the docs until the Python version is supported. For the Python version, any objections to using `_jrdd` to invoke the Java version of `sampleByKey`?

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48419506 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16439/

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48419505 Merged build finished.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48419578 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48419772 Merged build triggered.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48419784 Merged build started.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48422316 Merged build finished. All automated tests passed.

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1025#issuecomment-48422320 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16441/

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14691480 --- Diff: pom.xml --- @@ -257,6 +257,11 @@ <version>1.5</version> </dependency> <dependency> +

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread dorx
Github user dorx commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14691577 --- Diff: pom.xml --- @@ -257,6 +257,11 @@ <version>1.5</version> </dependency> <dependency> +

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694237 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -195,6 +193,37 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694233 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala --- @@ -130,6 +130,38 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)]) new

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694234 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala --- @@ -130,6 +130,38 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)]) new

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694253 --- Diff: core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala --- @@ -45,11 +50,78 @@ private[spark] object SamplingUtils { val

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694262 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694258 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694259 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694250 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -195,6 +193,37 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694252 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -195,6 +193,37 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694251 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -195,6 +193,37 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694248 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -195,6 +193,37 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) }

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694254 --- Diff: core/src/main/scala/org/apache/spark/util/random/SamplingUtils.scala --- @@ -45,11 +50,78 @@ private[spark] object SamplingUtils { val

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694267 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694277 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694296 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694292 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694294 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694274 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694285 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694278 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694281 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-2082] stratified sampling in PairRDDFun...

2014-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1025#discussion_r14694283 --- Diff: core/src/main/scala/org/apache/spark/util/random/StratifiedSampler.scala --- @@ -0,0 +1,311 @@ +/* + * Licensed to the Apache Software
