[GitHub] spark issue #20475: [SPARK-23256][ML][PYTHON] Add columnSchema method to PyS...

2018-02-01 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/20475 @HyukjinKwon looks like a great change to me, thank you for exposing the method in pyspark --- - To unsubscribe, e-mail:

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20455 **[Test build #86918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86918/testReport)** for PR 20455 at commit

[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20465 I agree that pandas and pyarrow should not be a hard requirement for users, and this is what it is today: PySpark only throws exception when users try to use pandas related functions without

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-01 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r165340988 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -21,7 +21,6 @@ import java.io.IOException import

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-01 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r165341639 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -917,11 +916,15 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-01 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r165341133 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -917,11 +916,15 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-01 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r165343004 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -917,11 +916,15 @@ private[spark] object RandomForest extends

[GitHub] spark pull request #20454: [SPARK-23202][SQL] Add new API in DataSourceWrite...

2018-02-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20454

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20455 Merged build finished. Test PASSed.

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20455 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86926/ Test PASSed.

[GitHub] spark pull request #20460: [SPARK-23285][K8S] Allow fractional values for sp...

2018-02-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20460#discussion_r165360148 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -267,7 +267,7 @@ private[deploy] class SparkSubmitArguments(args:

[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20460 **[Test build #4088 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4088/testReport)** for PR 20460 at commit

[GitHub] spark issue #20164: [SPARK-22971][ML] OneVsRestModel should use temporary Ra...

2018-02-01 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20164 OK, the code in question uses the `rawPredictionCol` from the `models` it is given. Yes they'd have to be unique for this to make sense, because it adds those raw prediction columns to the output.

[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17886 retest this please

[GitHub] spark pull request #20476: [SPARK-23301][SQL] data source column pruning sho...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20476#discussion_r165374987 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala --- @@ -81,35 +81,34 @@ object

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20476 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/481/

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20476 Merged build finished. Test PASSed.

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r165378186 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -931,7 +934,8 @@ private[spark] object RandomForest extends Logging {

[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...

2018-02-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r165378109 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1002,10 +1008,14 @@ private[spark] object RandomForest extends

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20455 Merged build finished. Test PASSed.

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20455 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/483/

[GitHub] spark issue #20469: [SPARK-23295][Build][Minor]Exclude Waring message when g...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20469 **[Test build #86920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86920/testReport)** for PR 20469 at commit

[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20465 Also, if we should go in this way, I think we should enable some tests with PyPy too if I understood correctly and there isn't another problem I maybe missed:

[GitHub] spark issue #20468: [SPARK-23280][SQL][FOLLOWUP] Fix Java style check issues...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20468 **[Test build #86925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86925/testReport)** for PR 20468 at commit

[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19573 **[Test build #86928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86928/testReport)** for PR 19573 at commit

[GitHub] spark pull request #20430: [SPARK-23263][SQL] Create table stored as parquet...

2018-02-01 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20430#discussion_r165349231 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala --- @@ -34,16 +34,12 @@ object CommandUtils extends Logging {

[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...

2018-02-01 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/20473 [SPARK-23300][TESTS] Prints out if Pandas and PyArrow are installed or not in PySpark SQL tests ## What changes were proposed in this pull request? This PR proposes to log if PyArrow
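The idea behind this PR, reporting up front whether the optional pandas and PyArrow dependencies are available, can be illustrated with a small standalone check. This is only a sketch under stated assumptions: it treats "installed" as "importable by the current interpreter", uses `importlib.util.find_spec` from the standard library, and the helper name `report_optional_deps` is hypothetical. It is not the code from the PR's `run-tests.py` change, which is only partially visible in the excerpts here.

```python
import importlib.util


def report_optional_deps(modules=("pandas", "pyarrow")):
    """Hypothetical helper: map each top-level module name to True/False.

    importlib.util.find_spec locates a top-level module without importing it
    and returns None when it cannot be found, so a missing optional package
    yields False instead of raising ImportError.
    """
    status = {}
    for name in modules:
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    # Print one line per dependency so test logs show what is available.
    for name, present in report_optional_deps().items():
        print(f"{name}: {'installed' if present else 'not installed'}")
```

Checking with `find_spec` rather than a bare `import` avoids paying the import cost of a large package just to log its presence.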

[GitHub] spark issue #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyArrow ar...

2018-02-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20473 @ueshin, @cloud-fan, @yhuai, @felixcheung and @BryanCutler, I tried to log it here. Could you take a look and see if it makes sense to you?

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20476 @rdblue I know you wanna use `PhysicalOperation` to replace the current operator pushdown rule, but before we reach a consensus, I think we should still fix bugs in the existing code.

[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20466 Merged build finished. Test PASSed.

[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20466 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/482/

[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

2018-02-01 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r165385572 --- Diff: python/pyspark/sql/tests.py --- @@ -4353,6 +4347,446 @@ def test_unsupported_types(self):

[GitHub] spark pull request #20474: [SPARK-23235][Core] Add executor Threaddump to ap...

2018-02-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20474#discussion_r165379694 --- Diff: core/src/main/scala/org/apache/spark/status/api/v1/OneApplicationResource.scala --- @@ -51,6 +52,21 @@ private[v1] class

[GitHub] spark pull request #20465: [SPARK-23292][TEST] always run python tests

2018-02-01 Thread shaneknapp
Github user shaneknapp commented on a diff in the pull request: https://github.com/apache/spark/pull/20465#discussion_r165412455 --- Diff: python/pyspark/sql/tests.py --- @@ -2819,13 +2802,6 @@ def test_to_pandas(self): self.assertEquals(types[4], 'datetime64[ns]')

[GitHub] spark pull request #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not ...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17886#discussion_r165422360 --- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java --- @@ -221,6 +227,70 @@ private void

[GitHub] spark pull request #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not ...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17886#discussion_r165425929 --- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java --- @@ -221,6 +227,70 @@ private void

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20476 **[Test build #86933 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86933/testReport)** for PR 20476 at commit

[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20466 Merged build finished. Test PASSed.

[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20466 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86934/ Test PASSed.

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19788 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86939/ Test FAILed.

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19788 **[Test build #86939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86939/testReport)** for PR 19788 at commit

[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...

2018-02-01 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/20473#discussion_r165445947 --- Diff: python/run-tests.py --- @@ -151,6 +151,38 @@ def parse_opts(): return opts +def _check_dependencies(python_exec,

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20476 @cloud-fan, @gatorsmile, this PR demonstrates why we should use PhysicalOperation. I ported the tests from this PR over to our branch and they pass without modifying the push-down code. That's

[GitHub] spark issue #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ignoreMiss...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20479 Merged build finished. Test PASSed.

[GitHub] spark pull request #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not ...

2018-02-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17886

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20455 ok to test

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20476 @gatorsmile, thanks for the context. If we need to redesign push-down, then I think we should do that separately and with a design plan. **I don't think it's a good idea to bundle it into an

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2018-02-01 Thread akopich
Github user akopich commented on the issue: https://github.com/apache/spark/pull/19565 @hhbyyh So, I guess, I should just roll the refactoring back, right?

[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-02-01 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20460 Actually I think this may fail some check (though may not throw exceptions) for instance this one:

[GitHub] spark pull request #20473: [SPARK-23300][TESTS] Prints out if Pandas and PyA...

2018-02-01 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/20473#discussion_r165445232 --- Diff: python/run-tests.py --- @@ -151,6 +151,38 @@ def parse_opts(): return opts +def _check_dependencies(python_exec,

[GitHub] spark issue #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ignoreMiss...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/485/

[GitHub] spark pull request #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not ...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17886#discussion_r165447010 --- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java --- @@ -221,6 +227,70 @@ private void

[GitHub] spark pull request #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20466

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20455 **[Test build #86941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86941/testReport)** for PR 20455 at commit

[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/486/

[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20462 Merged build finished. Test PASSed.

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20476 https://github.com/apache/spark/pull/19424 is the original PR that introduced the new rule `PushDownOperatorsToDataSource`. Both of us reviewed it. : ) Thank you for your understanding!

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19788 **[Test build #86939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86939/testReport)** for PR 19788 at commit

[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...

2018-02-01 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20422 LGTM

[GitHub] spark pull request #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ign...

2018-02-01 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/20479 [SPARK-23305][SQL][TEST] Add `spark.sql.files.ignoreMissingFiles` test case for ORC ## What changes were proposed in this pull request? Like Parquet, Apache Spark ORC already

[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20466 +1 Good to get this in before changes to the relation.

[GitHub] spark issue #20466: [SPARK-23293][SQL] fix data source v2 self join

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20466 This is another bug fix of the new data source v2 Thanks! Merged to master/2.3

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19788 Merged build finished. Test FAILed.

[GitHub] spark issue #20474: [SPARK-23235][Core] Add executor Threaddump to api

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20474 Merged build finished. Test FAILed.

[GitHub] spark issue #20474: [SPARK-23235][Core] Add executor Threaddump to api

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20474 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86936/ Test FAILed.

[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

2018-02-01 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r165449847 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -199,7 +200,7 @@ object ExtractFiltersAndInnerJoins

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/20476 @gatorsmile, Do you mean this? > Extensibility is not good and operator push-down capabilities are limited. If so, that's very open to interpretation. I would assume it means that

[GitHub] spark issue #20462: [SPARK-23020][core] Fix another race in the in-process l...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20462 **[Test build #86942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86942/testReport)** for PR 20462 at commit

[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

2018-02-01 Thread rednaxelafx
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20419 @kiszk For this specific kind of usage, I don't think using a hardcoded stable ID will be a problem. The comment we're talking about is the kind that can only appear once in a single

[GitHub] spark pull request #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not ...

2018-02-01 Thread liufengdb
Github user liufengdb commented on a diff in the pull request: https://github.com/apache/spark/pull/17886#discussion_r165441473 --- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java --- @@ -221,6 +227,70 @@ private void

[GitHub] spark issue #20474: [SPARK-23235][Core] Add executor Threaddump to api

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20474 **[Test build #86936 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86936/testReport)** for PR 20474 at commit

[GitHub] spark issue #20474: [SPARK-23235][Core] Add executor Threaddump to api

2018-02-01 Thread attilapiros
Github user attilapiros commented on the issue: https://github.com/apache/spark/pull/20474 Unrelated failure: "org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite"

[GitHub] spark issue #20479: [SPARK-23305][SQL][TEST] Add `spark.sql.files.ignoreMiss...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20479 **[Test build #86940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86940/testReport)** for PR 20479 at commit

[GitHub] spark issue #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not get "--...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17886 LGTM This bug fix is nice to have in 2.3. I will merge it now. Please submit a follow-up PR. Thanks! Merged to master/2.3

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20476 @rdblue To be honest, the push-down solution in the current code base is not well designed. We got many feedbacks from the community (e.g., SAP and IBM Research). One proposed a bottom-up

[GitHub] spark issue #20460: [SPARK-23285][K8S] Allow fractional values for spark.exe...

2018-02-01 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20460 Do we also want to update the comment of `SPARK_EXECUTOR_CORES` in `spark-env.sh` ?

[GitHub] spark pull request #20476: [SPARK-23301][SQL] data source column pruning sho...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20476#discussion_r165437683 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala --- @@ -81,35 +81,34 @@ object

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20476 To everyone, this is a bug fix we should merge before the next RC of Spark 2.3.

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20476 @rdblue Operator pushdown is part of the [data source API V2 SPIP](https://docs.google.com/document/d/1n_vUVbF4KD3gxTmkNEon5qdQ-Z8qU5Frf6WMQZ6jJVM/edit#):

[GitHub] spark pull request #20462: [SPARK-23020][core] Fix another race in the in-pr...

2018-02-01 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20462#discussion_r165462812 --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java --- @@ -363,17 +362,28 @@ public void close() throws IOException {

[GitHub] spark pull request #20480: [Spark-23306] Fix the oom caused by contention

2018-02-01 Thread zhzhan
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/20480 [Spark-23306] Fix the oom caused by contention ## What changes were proposed in this pull request? here is race condition in TaskMemoryManger, which may cause OOM. The memory

[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

2018-02-01 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r165387302 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -199,7 +200,7 @@ object ExtractFiltersAndInnerJoins

[GitHub] spark pull request #19788: [SPARK-9853][Core] Optimize shuffle fetch of cont...

2018-02-01 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/19788#discussion_r165408800 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -203,22 +203,23 @@ private

[GitHub] spark pull request #20462: [SPARK-23020][core] Fix another race in the in-pr...

2018-02-01 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20462#discussion_r165432306 --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java --- @@ -363,17 +362,28 @@ public void close() throws IOException {

[GitHub] spark pull request #20474: [SPARK-23235][Core] Add executor Threaddump to ap...

2018-02-01 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20474#discussion_r165377861 --- Diff: core/src/main/scala/org/apache/spark/status/api/v1/OneApplicationResource.scala --- @@ -51,6 +52,21 @@ private[v1] class

[GitHub] spark issue #20474: [SPARK-23235][Core] Add executor Threaddump to api

2018-02-01 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/20474 Jenkins, ok to test

[GitHub] spark issue #20474: [SPARK-23235][Core] Add executor Threaddump to api

2018-02-01 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/20474 Jenkins, add to whitelist

[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #86937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86937/testReport)** for PR 20477 at commit

[GitHub] spark issue #20478: [SPARK-8835][Streaming] Provide pluggable Congestion Str...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20478 Can one of the admins verify this patch?

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20455 retest this please

[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-01 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20477 cc @rxin @gatorsmile @rdblue @jose-torres

[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86937/ Test FAILed.

[GitHub] spark issue #18933: [WIP][SPARK-21722][SQL][PYTHON] Enable timezone-aware ti...

2018-02-01 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/18933 I thought this is resolved. @felixcheung can you give an example of the issue you ran into?

[GitHub] spark issue #19802: [SPARK-22594][CORE] Handling spark-submit and master ver...

2018-02-01 Thread miacobv
Github user miacobv commented on the issue: https://github.com/apache/spark/pull/19802 I get this when I start the master and worker, without running spark-submit, on both 2.2.0 and 2.2.1

[GitHub] spark pull request #19788: [SPARK-9853][Core] Optimize shuffle fetch of cont...

2018-02-01 Thread yucai
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/19788#discussion_r165409015 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java --- @@ -165,13 +165,23 @@ public

[GitHub] spark pull request #20474: [SPARK-23235][Core] Add executor Threaddump to ap...

2018-02-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20474#discussion_r165411432 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2168,7 +2168,17 @@ private[spark] object Utils extends Logging { //

[GitHub] spark issue #20419: [SPARK-23032][SQL][FOLLOW-UP]Add codegenStageId in comme...

2018-02-01 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20419 @rednaxelafx I understand your concern when `ctx.registerComment()` is conditionally called. If we call `ctx.registerComment()` with the specific identifier multiple times, how do we handle it?

[GitHub] spark pull request #17886: [SPARK-13983][SQL] Fix HiveThriftServer2 can not ...

2018-02-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17886#discussion_r165421914 --- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java --- @@ -221,6 +227,70 @@ private void

[GitHub] spark issue #20455: [SPARK-23284][SQL] Document the behavior of several Colu...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20455 **[Test build #86935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86935/testReport)** for PR 20455 at commit

[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19788 **[Test build #86938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86938/testReport)** for PR 19788 at commit
