[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-20 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22365 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r219034294 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -370,29 +370,76 @@ final class DataFrameStatFunctions

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r217257137 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -370,29 +370,76 @@ final class DataFrameStatFunctions

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r217256279 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -370,29 +370,76 @@ final class DataFrameStatFunctions

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r217252035 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -370,29 +370,76 @@ final class DataFrameStatFunctions

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r216482340 --- Diff: python/pyspark/sql/dataframe.py --- @@ -880,18 +880,23 @@ def sampleBy(self, col, fractions, seed=None): | 0|5| |

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r216233575 --- Diff: python/pyspark/sql/dataframe.py --- @@ -880,18 +880,23 @@ def sampleBy(self, col, fractions, seed=None): | 0|5|

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r216233066 --- Diff: python/pyspark/sql/dataframe.py --- @@ -880,18 +880,23 @@ def sampleBy(self, col, fractions, seed=None): | 0|5|

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-08 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22365 [SPARK-25381][SQL] Stratified sampling by Column argument ## What changes were proposed in this pull request? In the PR, I propose to add an overloaded method for `sampleBy` which accepts