[GitHub] spark pull request: [SPARK-2656] Python version of stratified samp...

2014-07-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1554#issuecomment-50113405 Merged. Thanks!

2014-07-24 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1554

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1554#issuecomment-50109623
QA results for PR 1554:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class RDDSamplerBase(object

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1554#issuecomment-50107918 QA tests have started for PR 1554. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17161/consoleFull

2014-07-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1554#issuecomment-50107704 LGTM. Waiting for Jenkins ...

2014-07-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1554#issuecomment-50107711 Jenkins, test this please.

2014-07-23 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1554#issuecomment-49965667 For unit tests, you can use bounds on the sample sizes. For example, there are two strata with sizes 100 and 1000, and we are going to sample with probabilities 0.5 and 0.
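
The suggestion above (bounding the sample sizes rather than asserting exact counts) could be sketched roughly as follows. This is a plain-Python illustration, not the PR's code: `sample_by_key` and `size_bounds` are hypothetical helpers, the stratum keys "a" and "b" are made up, and the second sampling probability (0.2) is an assumed value because the original comment is cut off. The bounds use a normal approximation to the binomial with a wide margin.

```python
import math
import random


def sample_by_key(pairs, fractions, seed):
    """Keep each (key, value) pair with the probability assigned to its key.

    Hypothetical helper: a per-stratum Bernoulli sampler, not Spark's API.
    """
    rng = random.Random(seed)
    return [(k, v) for k, v in pairs if rng.random() < fractions[k]]


def size_bounds(n, p, num_std=5.0):
    """Return (low, high) bounds for a Binomial(n, p) sample size.

    The bounds are mean +/- num_std standard deviations, clipped to [0, n].
    """
    mean = n * p
    std = math.sqrt(n * p * (1.0 - p))
    return max(0.0, mean - num_std * std), min(float(n), mean + num_std * std)


# Strata sizes 100 and 1000 follow the comment; the keys are hypothetical.
strata_sizes = {"a": 100, "b": 1000}
# 0.5 follows the comment; 0.2 is an assumed value for illustration only.
fractions = {"a": 0.5, "b": 0.2}

data = [(k, i) for k, n in strata_sizes.items() for i in range(n)]
sample = sample_by_key(data, fractions, seed=42)

# Count how many items were drawn from each stratum.
counts = {k: 0 for k in strata_sizes}
for k, _ in sample:
    counts[k] += 1

# A test would assert that every stratum's count lies inside its bounds.
within_bounds = {}
for k, n in strata_sizes.items():
    low, high = size_bounds(n, fractions[k])
    within_bounds[k] = low <= counts[k] <= high
```

A wide margin such as five standard deviations keeps the test from failing spuriously while still catching a sampler that ignores the per-stratum probabilities.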

2014-07-23 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1554#issuecomment-49958757 I added `#doctest: +SKIP` for the unit test because the unit test in `sample` also has it. I'm guessing this is because of the difference in behavior when numpy is used vs. not. It ra
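
For readers unfamiliar with the directive mentioned above, here is a minimal sketch of how `# doctest: +SKIP` behaves in the standard `doctest` module. The function `sample_fraction` is hypothetical and is not the PR's code. The output shown in the skipped example is a deliberately unverified placeholder: it stands in for a result that depends on the random number generator in use, which is the kind of example one would skip.

```python
import doctest
import random


def sample_fraction(items, fraction, seed):
    """Keep each item with probability `fraction`.

    The first example is skipped, so its placeholder output is never checked.

    >>> sample_fraction(range(10), 0.5, 1)  # doctest: +SKIP
    [0, 3, 4]
    >>> len(sample_fraction(range(10), 1.0, 1))
    10
    """
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]


# Collect and run the docstring examples explicitly.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
failed = 0
tests = finder.find(
    sample_fraction,
    "sample_fraction",
    module=False,
    globs={"sample_fraction": sample_fraction},
)
for test in tests:
    failed += runner.run(test).failed
```

Because the skipped example is never executed, the run reports no failures even though its placeholder output is not the real result. The trade-off is that a skipped example documents usage but verifies nothing.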

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1554#issuecomment-49958568
QA results for PR 1554:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class RDDSamplerBase(object

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1554#issuecomment-49952181 QA tests have started for PR 1554. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17067/consoleFull

2014-07-23 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1554#issuecomment-49951984 @mengxr @falaki

2014-07-23 Thread dorx
GitHub user dorx opened a pull request: https://github.com/apache/spark/pull/1554 [SPARK-2656] Python version of stratified sampling exact sample size not supported for now. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dorx/spa