[GitHub] [spark] SparkQA commented on pull request #31794: [SPARK-34683][SQL][DOCS][3.0] Update the documents to explain the usage of LIST FILE and LIST JAR in case they take multiple file names

2021-03-09 Thread GitBox
SparkQA commented on pull request #31794: URL: https://github.com/apache/spark/pull/31794#issuecomment-795047891 **[Test build #135925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135925/testReport)** for PR 31794 at commit [`3bc5bb8`](https://github.co

[GitHub] [spark] viirya commented on pull request #31771: [SPARK-34652][AVRO] Support SchemaRegistry in from_avro method

2021-03-09 Thread GitBox
viirya commented on pull request #31771: URL: https://github.com/apache/spark/pull/31771#issuecomment-795041758 Thanks for pinging me, @dongjoon-hyun. I quickly went through the previous comments and hope I didn't miss anything. If I missed or misunderstood anything, please let me know. I

[GitHub] [spark] SparkQA commented on pull request #31766: [SPARK-34596][SPARK-34607][SQL][FOLLOWUP][TESTS] Add `PlanTest. testFallback` and use it

2021-03-09 Thread GitBox
SparkQA commented on pull request #31766: URL: https://github.com/apache/spark/pull/31766#issuecomment-795037300 **[Test build #135927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135927/testReport)** for PR 31766 at commit [`d24be4c`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #31653: [SPARK-33832][SQL] v2. move OptimzieSkewedJoin to query stage preparation

2021-03-09 Thread GitBox
SparkQA commented on pull request #31653: URL: https://github.com/apache/spark/pull/31653#issuecomment-795036830 **[Test build #135928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135928/testReport)** for PR 31653 at commit [`7cfda59`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #31794: [SPARK-34683][SQL][DOCS][3.0] Update the documents to explain the usage of LIST FILE and LIST JAR in case they take multiple file names

2021-03-09 Thread GitBox
SparkQA commented on pull request #31794: URL: https://github.com/apache/spark/pull/31794#issuecomment-795032162 **[Test build #135925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135925/testReport)** for PR 31794 at commit [`3bc5bb8`](https://github.com

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31653: [SPARK-33832][SQL] v2. move OptimzieSkewedJoin to query stage preparation

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31653: URL: https://github.com/apache/spark/pull/31653#issuecomment-795025418 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40512/

[GitHub] [spark] AmplabJenkins commented on pull request #31653: [SPARK-33832][SQL] v2. move OptimzieSkewedJoin to query stage preparation

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31653: URL: https://github.com/apache/spark/pull/31653#issuecomment-795025418 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40512/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31789: [SPARK-34677][SQL] Support the `+`/`-` operators over ANSI SQL intervals

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31789: URL: https://github.com/apache/spark/pull/31789#issuecomment-795023365 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135926/

[GitHub] [spark] AmplabJenkins commented on pull request #31789: [SPARK-34677][SQL] Support the `+`/`-` operators over ANSI SQL intervals

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31789: URL: https://github.com/apache/spark/pull/31789#issuecomment-795023365 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135926/

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #31766: [SPARK-34596][SPARK-34607][SQL][FOLLOWUP][TESTS] Add `PlanTest. testFallback` and use it

2021-03-09 Thread GitBox
dongjoon-hyun edited a comment on pull request #31766: URL: https://github.com/apache/spark/pull/31766#issuecomment-795018697 Hi, @rednaxelafx , @cloud-fan , @viirya , @kiszk , @maropu , @HyukjinKwon . Here is a summary of the progress. 1. https://github.com/apache/spark/pull/31775

[GitHub] [spark] dongjoon-hyun commented on pull request #31766: [SPARK-34596][SPARK-34607][SQL][FOLLOWUP][TESTS] Add `PlanTest. testFallback` and use it

2021-03-09 Thread GitBox
dongjoon-hyun commented on pull request #31766: URL: https://github.com/apache/spark/pull/31766#issuecomment-795018697 Hi, @rednaxelafx , @cloud-fan , @viirya , @kiszk , @maropu . Here is a summary of the progress. 1. https://github.com/apache/spark/pull/31775 (SPARK-34660 `Don't u

[GitHub] [spark] mengxr commented on pull request #31693: [SPARK-34448][ML] Binary logistic regression incorrectly computes the intercept and coefficients with small var features

2021-03-09 Thread GitBox
mengxr commented on pull request #31693: URL: https://github.com/apache/spark/pull/31693#issuecomment-795017955 This is my understanding of the behavior. Because we didn't center the columns, when there is a near-constant column, after std scaling the values become very large. As a result,
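The scaling effect mengxr describes can be illustrated with a minimal sketch in plain Python (not Spark ML). It assumes a hypothetical near-constant feature with a large offset and tiny variance, and simply compares dividing by the standard deviation with and without centering first; it is not Spark's `LogisticRegression` code.

```python
import statistics


def std_scale(values, center):
    """Scale a column by its sample standard deviation, optionally centering first."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    if center:
        return [(v - mean) / std for v in values]
    return [v / std for v in values]


# Hypothetical near-constant feature: large offset (about 1000), tiny variance.
column = [1000.0, 1000.001, 999.999, 1000.0005, 999.9995]

uncentered = std_scale(column, center=False)  # mean / std dominates every value
centered = std_scale(column, center=True)     # only the small deviations remain

print(max(abs(v) for v in uncentered))  # on the order of 1e6
print(max(abs(v) for v in centered))    # on the order of 1
```

Without centering, each scaled value is roughly `mean / std`, so a column whose standard deviation is tiny relative to its mean produces very large magnitudes, which is the situation the comment describes.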

[GitHub] [spark] sarutak opened a new pull request #31795: [SPARK-34683][SQL][DOCS][3.1] Update the documents to explain the usage of LIST FILE and LIST JAR in case they take multiple file names

2021-03-09 Thread GitBox
sarutak opened a new pull request #31795: URL: https://github.com/apache/spark/pull/31795 ### What changes were proposed in this pull request? This PR partially backports the change of #31721 (SPARK-34603). This PR improves the documents to explain `LIST FILE` and `LIST JAR` commands

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31757: [SPARK-33602][SQL] Group exception messages in execution/datasources

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31757: URL: https://github.com/apache/spark/pull/31757#issuecomment-795007230 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40502/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31735: [SPARK-34600][Pyspark][SQL] Return User-defined types from Pandas UDF

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-795007226 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40503/

[GitHub] [spark] JkSelf commented on a change in pull request #31756: [SPARK-34637] [SQL] [WIP] Support DPP when the broadcast exchange can be reused

2021-03-09 Thread GitBox
JkSelf commented on a change in pull request #31756: URL: https://github.com/apache/spark/pull/31756#discussion_r591142146 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala ## @@ -1409,4 +1409,17 @@ class DynamicPartitionPruningSuite

[GitHub] [spark] AmplabJenkins commented on pull request #31757: [SPARK-33602][SQL] Group exception messages in execution/datasources

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31757: URL: https://github.com/apache/spark/pull/31757#issuecomment-795007230 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40502/

[GitHub] [spark] AmplabJenkins commented on pull request #31735: [SPARK-34600][Pyspark][SQL] Return User-defined types from Pandas UDF

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-795007226 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40503/

[GitHub] [spark] JkSelf commented on pull request #31756: [SPARK-34637] [SQL] [WIP] Support DPP when the broadcast exchange can be reused

2021-03-09 Thread GitBox
JkSelf commented on pull request #31756: URL: https://github.com/apache/spark/pull/31756#issuecomment-795007122 @cloud-fan This approach mainly contains three steps. 1. Find the reused exchange. If it exists, apply the DPP filter. 2. In order to reuse the exchange stored in

[GitHub] [spark] sarutak opened a new pull request #31794: [SPARK-34683][SQL][DOCS] Update the documents to explain the usage of LIST FILE and LIST JAR in case they take multiple file names

2021-03-09 Thread GitBox
sarutak opened a new pull request #31794: URL: https://github.com/apache/spark/pull/31794 ### What changes were proposed in this pull request? This PR partially backports the change of #31721 (SPARK-34603). This PR improves the documents to explain `LIST FILE` and `LIST JAR` commands

[GitHub] [spark] dongjoon-hyun closed pull request #31764: [SPARK-34596][SPARK-34607][SQL][FOLLOWUP][TESTS][3.1] Add `PlanTest. testFallback` and use it

2021-03-09 Thread GitBox
dongjoon-hyun closed pull request #31764: URL: https://github.com/apache/spark/pull/31764

[GitHub] [spark] SparkQA commented on pull request #31735: [SPARK-34600][Pyspark][SQL] Return User-defined types from Pandas UDF

2021-03-09 Thread GitBox
SparkQA commented on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-794991858 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40503/

[GitHub] [spark] c21 commented on pull request #31792: [SPARK-34681][SQL] Fix bug for full outer shuffled hash join when building left side with non-equal condition

2021-03-09 Thread GitBox
c21 commented on pull request #31792: URL: https://github.com/apache/spark/pull/31792#issuecomment-794987509 Thank you @cloud-fan and @dongjoon-hyun for the review!

[GitHub] [spark] dongjoon-hyun commented on pull request #31792: [SPARK-34681][SQL] Fix bug for full outer shuffled hash join when building left side with non-equal condition

2021-03-09 Thread GitBox
dongjoon-hyun commented on pull request #31792: URL: https://github.com/apache/spark/pull/31792#issuecomment-794985954 Merged to master/3.1. Thank you, @c21 and @cloud-fan .

[GitHub] [spark] dongjoon-hyun closed pull request #31792: [SPARK-34681][SQL] Fix bug for full outer shuffled hash join when building left side with non-equal condition

2021-03-09 Thread GitBox
dongjoon-hyun closed pull request #31792: URL: https://github.com/apache/spark/pull/31792

[GitHub] [spark] cloud-fan commented on a change in pull request #31789: [SPARK-34677][SQL] Support the `+`/`-` operators over ANSI SQL intervals

2021-03-09 Thread GitBox
cloud-fan commented on a change in pull request #31789: URL: https://github.com/apache/spark/pull/31789#discussion_r591124725 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ## @@ -173,6 +180,11 @@ abstract class BinaryArith

[GitHub] [spark] SparkQA commented on pull request #31757: [SPARK-33602][SQL] Group exception messages in execution/datasources

2021-03-09 Thread GitBox
SparkQA commented on pull request #31757: URL: https://github.com/apache/spark/pull/31757#issuecomment-794980163 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40502/

[GitHub] [spark] MaxGekk commented on a change in pull request #31789: [SPARK-34677][SQL] Support the `+`/`-` operators over ANSI SQL intervals

2021-03-09 Thread GitBox
MaxGekk commented on a change in pull request #31789: URL: https://github.com/apache/spark/pull/31789#discussion_r591118022 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ## @@ -173,6 +180,11 @@ abstract class BinaryArithme

[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-09 Thread GitBox
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-794970669 I'm OK with excluding it here and letting someone come up with an actual use case. As that was requested by @rdblue, let's hear their view before making the change.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31791: [SPARK-34678][SQL] Add table function registry

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31791: URL: https://github.com/apache/spark/pull/31791#issuecomment-794966406 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40507/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31653: [SPARK-33832][SQL] v2. move OptimzieSkewedJoin to query stage preparation

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31653: URL: https://github.com/apache/spark/pull/31653#issuecomment-786240614 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31793: URL: https://github.com/apache/spark/pull/31793#issuecomment-794966309 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40506/

[GitHub] [spark] AmplabJenkins commented on pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31793: URL: https://github.com/apache/spark/pull/31793#issuecomment-794966309 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40506/

[GitHub] [spark] AmplabJenkins commented on pull request #31791: [SPARK-34678][SQL] Add table function registry

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31791: URL: https://github.com/apache/spark/pull/31791#issuecomment-794966406 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40507/

[GitHub] [spark] cloud-fan commented on pull request #31653: [SPARK-33832][SQL] v2. move OptimzieSkewedJoin to query stage preparation

2021-03-09 Thread GitBox
cloud-fan commented on pull request #31653: URL: https://github.com/apache/spark/pull/31653#issuecomment-794966039 ok to test

[GitHub] [spark] AngersZhuuuu edited a comment on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-03-09 Thread GitBox
AngersZhuuuu edited a comment on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-794964106 > We can periodically log something in the built-in file commit protocol, but there is nothing we can do if people are using a custom file commit protocol. A

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31793: URL: https://github.com/apache/spark/pull/31793#issuecomment-794964567 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135923/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31791: [SPARK-34678][SQL] Add table function registry

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31791: URL: https://github.com/apache/spark/pull/31791#issuecomment-794964644 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135924/

[GitHub] [spark] AmplabJenkins commented on pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31793: URL: https://github.com/apache/spark/pull/31793#issuecomment-794964567 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135923/

[GitHub] [spark] AmplabJenkins commented on pull request #31791: [SPARK-34678][SQL] Add table function registry

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31791: URL: https://github.com/apache/spark/pull/31791#issuecomment-794964644 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135924/

[GitHub] [spark] AngersZhuuuu commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-03-09 Thread GitBox
AngersZhuuuu commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-794964106 > We can periodically log something in the built-in file commit protocol, but there is nothing we can do if people are using a custom file commit protocol. A new thr

[GitHub] [spark] cloud-fan commented on pull request #31756: [SPARK-34637] [SQL] [WIP] Support DPP when the broadcast exchange can be reused

2021-03-09 Thread GitBox
cloud-fan commented on pull request #31756: URL: https://github.com/apache/spark/pull/31756#issuecomment-794963801 Can you briefly introduce your approach?

[GitHub] [spark] SparkQA commented on pull request #31735: [SPARK-34600][Pyspark][SQL] Return User-defined types from Pandas UDF

2021-03-09 Thread GitBox
SparkQA commented on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-794957871 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40503/

[GitHub] [spark] cloud-fan commented on a change in pull request #31789: [SPARK-34677][SQL] Support the `+`/`-` operators over ANSI SQL intervals

2021-03-09 Thread GitBox
cloud-fan commented on a change in pull request #31789: URL: https://github.com/apache/spark/pull/31789#discussion_r591100803 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ## @@ -173,6 +180,11 @@ abstract class BinaryArith

[GitHub] [spark] SparkQA commented on pull request #31757: [SPARK-33602][SQL] Group exception messages in execution/datasources

2021-03-09 Thread GitBox
SparkQA commented on pull request #31757: URL: https://github.com/apache/spark/pull/31757#issuecomment-79495 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40502/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31646: URL: https://github.com/apache/spark/pull/31646#issuecomment-794949018 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40505/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31757: [SPARK-33602][SQL] Group exception messages in execution/datasources

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31757: URL: https://github.com/apache/spark/pull/31757#issuecomment-794949016 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40499/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31793: URL: https://github.com/apache/spark/pull/31793#issuecomment-794748695 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31646: URL: https://github.com/apache/spark/pull/31646#issuecomment-794949018 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40505/

[GitHub] [spark] AmplabJenkins commented on pull request #31757: [SPARK-33602][SQL] Group exception messages in execution/datasources

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31757: URL: https://github.com/apache/spark/pull/31757#issuecomment-794949016 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40499/

[GitHub] [spark] dongjoon-hyun commented on pull request #31771: [SPARK-34652][AVRO] Support SchemaRegistry in from_avro method

2021-03-09 Thread GitBox
dongjoon-hyun commented on pull request #31771: URL: https://github.com/apache/spark/pull/31771#issuecomment-794948253 cc @viirya, too, since this is SS area.

[GitHub] [spark] MaxGekk commented on pull request #31789: [SPARK-34677][SQL] Support the `+`/`-` operators over ANSI SQL intervals

2021-03-09 Thread GitBox
MaxGekk commented on pull request #31789: URL: https://github.com/apache/spark/pull/31789#issuecomment-794947184 @HyukjinKwon @cloud-fan Could you take a look at this PR, please.

[GitHub] [spark] cloud-fan commented on pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
cloud-fan commented on pull request #31793: URL: https://github.com/apache/spark/pull/31793#issuecomment-794945724 ok to test

[GitHub] [spark] dongjoon-hyun commented on pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
dongjoon-hyun commented on pull request #31793: URL: https://github.com/apache/spark/pull/31793#issuecomment-794943536 In addition, in order to improve the visibility of this issue, I added this JIRA (SPARK-34682) as a subtask of SPARK-33828 (SQL Adaptive Query Execution QA).

[GitHub] [spark] dongjoon-hyun commented on pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
dongjoon-hyun commented on pull request #31793: URL: https://github.com/apache/spark/pull/31793#issuecomment-794941577 BTW, the patch itself looks good to me, @andygrove . Thanks!

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
dongjoon-hyun commented on a change in pull request #31793: URL: https://github.com/apache/spark/pull/31793#discussion_r591087917 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -18,6 +18,7 @@ package org.apache

[GitHub] [spark] SparkQA commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-03-09 Thread GitBox
SparkQA commented on pull request #31646: URL: https://github.com/apache/spark/pull/31646#issuecomment-794937995 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40505/

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
dongjoon-hyun commented on a change in pull request #31793: URL: https://github.com/apache/spark/pull/31793#discussion_r591085071 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -869,6 +870,26 @@ class AdaptiveQu

[GitHub] [spark] andygrove commented on a change in pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
andygrove commented on a change in pull request #31793: URL: https://github.com/apache/spark/pull/31793#discussion_r591084955 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -18,6 +18,7 @@ package org.apache.spa

[GitHub] [spark] SparkQA commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-03-09 Thread GitBox
SparkQA commented on pull request #31646: URL: https://github.com/apache/spark/pull/31646#issuecomment-794934226 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40505/

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31793: [SPARK-34682] [SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

2021-03-09 Thread GitBox
dongjoon-hyun commented on a change in pull request #31793: URL: https://github.com/apache/spark/pull/31793#discussion_r591082351 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -869,6 +870,26 @@ class AdaptiveQu

[GitHub] [spark] SparkQA commented on pull request #31757: [SPARK-33602][SQL] Group exception messages in execution/datasources

2021-03-09 Thread GitBox
SparkQA commented on pull request #31757: URL: https://github.com/apache/spark/pull/31757#issuecomment-794933085 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40499/

[GitHub] [spark] cloud-fan commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

2021-03-09 Thread GitBox
cloud-fan commented on pull request #31522: URL: https://github.com/apache/spark/pull/31522#issuecomment-794929425 We can periodically log something in the built-in file commit protocol, but there is nothing we can do if people are using a custom file commit protocol. I checked other

[GitHub] [spark] cloud-fan commented on a change in pull request #31652: [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache

2021-03-09 Thread GitBox
cloud-fan commented on a change in pull request #31652: URL: https://github.com/apache/spark/pull/31652#discussion_r591075732 ## File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ## @@ -1445,4 +1445,68 @@ class CachedTableSuite extends QueryTest wit

[GitHub] [spark] SparkQA commented on pull request #31792: [SPARK-34681][SQL] Fix bug for full outer shuffled hash join when building left side with non-equal condition

2021-03-09 Thread GitBox
SparkQA commented on pull request #31792: URL: https://github.com/apache/spark/pull/31792#issuecomment-794923063 **[Test build #135918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135918/testReport)** for PR 31792 at commit [`dbddbd7`](https://github.com

[GitHub] [spark] cloud-fan commented on a change in pull request #31652: [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache

2021-03-09 Thread GitBox
cloud-fan commented on a change in pull request #31652: URL: https://github.com/apache/spark/pull/31652#discussion_r591074003 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ## @@ -596,19 +546,102 @@ object ViewHelper { (collectTem

[GitHub] [spark] cloud-fan commented on a change in pull request #31652: [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache

2021-03-09 Thread GitBox
cloud-fan commented on a change in pull request #31652: URL: https://github.com/apache/spark/pull/31652#discussion_r591072759 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ## @@ -596,19 +546,102 @@ object ViewHelper { (collectTem

[GitHub] [spark] SparkQA commented on pull request #31652: [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache

2021-03-09 Thread GitBox
SparkQA commented on pull request #31652: URL: https://github.com/apache/spark/pull/31652#issuecomment-794919044 **[Test build #135921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135921/testReport)** for PR 31652 at commit [`9302be9`](https://github.com

[GitHub] [spark] hiboyang commented on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

2021-03-09 Thread GitBox
hiboyang commented on pull request #30763: URL: https://github.com/apache/spark/pull/30763#issuecomment-794917127 I just saw the discussion here. The location abstraction is a good idea. Different shuffle solutions could have different location implementations, e.g. Spark's default

[GitHub] [spark] ulysses-you commented on a change in pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-03-09 Thread GitBox
ulysses-you commented on a change in pull request #31646: URL: https://github.com/apache/spark/pull/31646#discussion_r591043266 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala ## @@ -108,6 +108,52 @@ class FiltersSuite extends SparkFun

[GitHub] [spark] cloud-fan commented on a change in pull request #31652: [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache

2021-03-09 Thread GitBox
cloud-fan commented on a change in pull request #31652: URL: https://github.com/apache/spark/pull/31652#discussion_r591067365 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ## @@ -596,19 +546,102 @@ object ViewHelper { (collectTem

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31757: [SPARK-33602][SQL] Group exception messages in execution/datasources

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31757: URL: https://github.com/apache/spark/pull/31757#issuecomment-794911909 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135919/ -

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31735: [SPARK-34600][Pyspark][SQL] Return User-defined types from Pandas UDF

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-794911956 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135920/ -

[GitHub] [spark] AmplabJenkins commented on pull request #31735: [SPARK-34600][Pyspark][SQL] Return User-defined types from Pandas UDF

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-794911956 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135920/ -

[GitHub] [spark] AmplabJenkins commented on pull request #31757: [SPARK-33602][SQL] Group exception messages in execution/datasources

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31757: URL: https://github.com/apache/spark/pull/31757#issuecomment-794911909 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135919/ -

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31652: [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31652: URL: https://github.com/apache/spark/pull/31652#issuecomment-794910486 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40504/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31792: [SPARK-34681][SQL] Fix bug for full outer shuffled hash join when building left side with non-equal condition

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31792: URL: https://github.com/apache/spark/pull/31792#issuecomment-794910282 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40501/

[GitHub] [spark] AmplabJenkins commented on pull request #31652: [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31652: URL: https://github.com/apache/spark/pull/31652#issuecomment-794910486 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40504/ -

[GitHub] [spark] AmplabJenkins commented on pull request #31792: [SPARK-34681][SQL] Fix bug for full outer shuffled hash join when building left side with non-equal condition

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31792: URL: https://github.com/apache/spark/pull/31792#issuecomment-794910282 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40501/ -

[GitHub] [spark] hiboyang commented on a change in pull request #31763: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

2021-03-09 Thread GitBox
hiboyang commented on a change in pull request #31763: URL: https://github.com/apache/spark/pull/31763#discussion_r591061966 ## File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ## @@ -52,6 +52,13 @@ private[spark] sealed trait MapStatus { * partitio

[GitHub] [spark] viirya commented on a change in pull request #31761: [SPARK-34295][CORE] Exclude filesystems from token renewal

2021-03-09 Thread GitBox
viirya commented on a change in pull request #31761: URL: https://github.com/apache/spark/pull/31761#discussion_r591059181 ## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ## @@ -691,6 +691,15 @@ package object config { .toSequence .cre

[GitHub] [spark] SparkQA commented on pull request #31757: [SPARK-33602][SQL] Group exception messages in execution/datasources

2021-03-09 Thread GitBox
SparkQA commented on pull request #31757: URL: https://github.com/apache/spark/pull/31757#issuecomment-794904507 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40499/ -

[GitHub] [spark] Ngone51 commented on a change in pull request #31763: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

2021-03-09 Thread GitBox
Ngone51 commented on a change in pull request #31763: URL: https://github.com/apache/spark/pull/31763#discussion_r591058194 ## File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ## @@ -52,6 +52,13 @@ private[spark] sealed trait MapStatus { * partition

[GitHub] [spark] SparkQA commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-03-09 Thread GitBox
SparkQA commented on pull request #31646: URL: https://github.com/apache/spark/pull/31646#issuecomment-794899677 **[Test build #135922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135922/testReport)** for PR 31646 at commit [`e8c7b6c`](https://github.com

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31646: URL: https://github.com/apache/spark/pull/31646#issuecomment-794896896 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40498/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31652: [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache

2021-03-09 Thread GitBox
AmplabJenkins removed a comment on pull request #31652: URL: https://github.com/apache/spark/pull/31652#issuecomment-794896895 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #31652: [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31652: URL: https://github.com/apache/spark/pull/31652#issuecomment-794896899 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] AmplabJenkins commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-03-09 Thread GitBox
AmplabJenkins commented on pull request #31646: URL: https://github.com/apache/spark/pull/31646#issuecomment-794896896 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40498/ -

[GitHub] [spark] cloud-fan commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-09 Thread GitBox
cloud-fan commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-794887512 > where external storage has limited bandwidth on concurrent writes for such a use case, shall we use max num partitions? One option is to not allow it first, and revisi

[GitHub] [spark] hiboyang commented on a change in pull request #31763: [SPARK-33114][CORE] Add metadata in MapStatus to support custom shuffle manager

2021-03-09 Thread GitBox
hiboyang commented on a change in pull request #31763: URL: https://github.com/apache/spark/pull/31763#discussion_r591044762 ## File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ## @@ -52,6 +52,13 @@ private[spark] sealed trait MapStatus { * partitio

[GitHub] [spark] ulysses-you commented on a change in pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-03-09 Thread GitBox
ulysses-you commented on a change in pull request #31646: URL: https://github.com/apache/spark/pull/31646#discussion_r591041007 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ## @@ -775,14 +791,25 @@ private[client] class Shim_v0_13 exten

[GitHub] [spark] cloud-fan commented on pull request #31771: [SPARK-34652][AVRO] Support SchemaRegistry in from_avro method

2021-03-09 Thread GitBox
cloud-fan commented on pull request #31771: URL: https://github.com/apache/spark/pull/31771#issuecomment-794879550 I think it's a good addition to support the Confluent SR integration, the question is how. Making it built-in looks a bit overkill, putting it in "external" or third-party lib

[GitHub] [spark] eddyxu edited a comment on pull request #31735: [SPARK-34600][Pyspark][SQL] Return User-defined types from Pandas UDF

2021-03-09 Thread GitBox
eddyxu edited a comment on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-794844245 Hi, @HyukjinKwon I did some benchmarks using the following code ```python df = self.spark.range(1, 10 ** 8, numPartitions=32) df = df.cache()

[GitHub] [spark] imback82 commented on a change in pull request #31652: [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache

2021-03-09 Thread GitBox
imback82 commented on a change in pull request #31652: URL: https://github.com/apache/spark/pull/31652#discussion_r591029660 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ## @@ -596,19 +546,102 @@ object ViewHelper { (collectTemp

[GitHub] [spark] cloud-fan commented on a change in pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-03-09 Thread GitBox
cloud-fan commented on a change in pull request #31646: URL: https://github.com/apache/spark/pull/31646#discussion_r591029192 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala ## @@ -108,6 +108,52 @@ class FiltersSuite extends SparkFunSu

[GitHub] [spark] cloud-fan commented on a change in pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-03-09 Thread GitBox
cloud-fan commented on a change in pull request #31646: URL: https://github.com/apache/spark/pull/31646#discussion_r591028614 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ## @@ -775,14 +791,25 @@ private[client] class Shim_v0_13 extends