[GitHub] [spark] AmplabJenkins commented on pull request #31462: [SPARK-34347][SQL] CatalogImpl.uncacheTable should invalidate in cascade for temp views

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31462: URL: https://github.com/apache/spark/pull/31462#issuecomment-773111769 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39443/ -

[GitHub] [spark] AmplabJenkins commented on pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31413: URL: https://github.com/apache/spark/pull/31413#issuecomment-773111768 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39445/ -

[GitHub] [spark] AmplabJenkins commented on pull request #31467: [SPARK-33212][FOLLOW-UP][BUILD] Uses provided properties for Hadoop client dependencies in root pom

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31467: URL: https://github.com/apache/spark/pull/31467#issuecomment-773111772 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39442/ -

[GitHub] [spark] AmplabJenkins commented on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773111773 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39441/ -

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773111773 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39441/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31467: [SPARK-33212][FOLLOW-UP][BUILD] Uses provided properties for Hadoop client dependencies in root pom

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31467: URL: https://github.com/apache/spark/pull/31467#issuecomment-773111772 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39442/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31413: URL: https://github.com/apache/spark/pull/31413#issuecomment-773111768 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39445/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31462: [SPARK-34347][SQL] CatalogImpl.uncacheTable should invalidate in cascade for temp views

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31462: URL: https://github.com/apache/spark/pull/31462#issuecomment-773111769 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39443/

[GitHub] [spark] SparkQA commented on pull request #31464: [SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

2021-02-04 Thread GitBox
SparkQA commented on pull request #31464: URL: https://github.com/apache/spark/pull/31464#issuecomment-773112085 **[Test build #134861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134861/testReport)** for PR 31464 at commit [`e766209`](https://github.com

[GitHub] [spark] zhengruifeng commented on pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has single partition

2021-02-04 Thread GitBox
zhengruifeng commented on pull request #31468: URL: https://github.com/apache/spark/pull/31468#issuecomment-773112265 > We may have more operators that add shuffle in the doExecute method instead of the planner @cloud-fan shuffle is only directly added in the `doExecute` method of

[GitHub] [spark] zhengruifeng commented on pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has single partition

2021-02-04 Thread GitBox
zhengruifeng commented on pull request #31468: URL: https://github.com/apache/spark/pull/31468#issuecomment-773112511 related to https://github.com/apache/spark/pull/31409 This is an automated message from the Apache Git Serv

[GitHub] [spark] AngersZhuuuu commented on pull request #31179: [SPARK-34113][SQL] Use metric data update metadata statistic's size and rowCount

2021-02-04 Thread GitBox
AngersZhuuuu commented on pull request #31179: URL: https://github.com/apache/spark/pull/31179#issuecomment-773113604 Gentle ping @cloud-fan @viirya

[GitHub] [spark] SparkQA commented on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-04 Thread GitBox
SparkQA commented on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773114140 **[Test build #134854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134854/testReport)** for PR 31448 at commit [`f688748`](https://github.co

[GitHub] [spark] AmplabJenkins commented on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773114893 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134854/ -

[GitHub] [spark] SparkQA removed a comment on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-04 Thread GitBox
SparkQA removed a comment on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773036893 **[Test build #134854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134854/testReport)** for PR 31448 at commit [`f688748`](https://gi

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773114893 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134854/ -

[GitHub] [spark] SparkQA commented on pull request #31394: [SPARK-34291][ML] LSH hashDistance optimization

2021-02-04 Thread GitBox
SparkQA commented on pull request #31394: URL: https://github.com/apache/spark/pull/31394#issuecomment-773118982 **[Test build #134862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134862/testReport)** for PR 31394 at commit [`f8b29dd`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has single partition

2021-02-04 Thread GitBox
SparkQA commented on pull request #31468: URL: https://github.com/apache/spark/pull/31468#issuecomment-773118983 **[Test build #134860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134860/testReport)** for PR 31468 at commit [`6d199ff`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #31466: [WIP][SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-04 Thread GitBox
SparkQA commented on pull request #31466: URL: https://github.com/apache/spark/pull/31466#issuecomment-773122521 **[Test build #134852 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134852/testReport)** for PR 31466 at commit [`5087499`](https://github.co

[GitHub] [spark] SparkQA removed a comment on pull request #31466: [WIP][SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-04 Thread GitBox
SparkQA removed a comment on pull request #31466: URL: https://github.com/apache/spark/pull/31466#issuecomment-773016444 **[Test build #134852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134852/testReport)** for PR 31466 at commit [`5087499`](https://gi

[GitHub] [spark] SparkQA commented on pull request #31460: [SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may cause perf regr

2021-02-04 Thread GitBox
SparkQA commented on pull request #31460: URL: https://github.com/apache/spark/pull/31460#issuecomment-773124520 **[Test build #134856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134856/testReport)** for PR 31460 at commit [`58a532e`](https://github.co

[GitHub] [spark] SparkQA commented on pull request #31467: [SPARK-33212][FOLLOW-UP][BUILD] Uses provided properties for Hadoop client dependencies in root pom

2021-02-04 Thread GitBox
SparkQA commented on pull request #31467: URL: https://github.com/apache/spark/pull/31467#issuecomment-773124991 **[Test build #134855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134855/testReport)** for PR 31467 at commit [`03eec64`](https://github.co

[GitHub] [spark] SparkQA removed a comment on pull request #31467: [SPARK-33212][FOLLOW-UP][BUILD] Uses provided properties for Hadoop client dependencies in root pom

2021-02-04 Thread GitBox
SparkQA removed a comment on pull request #31467: URL: https://github.com/apache/spark/pull/31467#issuecomment-773056148 **[Test build #134855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134855/testReport)** for PR 31467 at commit [`03eec64`](https://gi

[GitHub] [spark] SparkQA removed a comment on pull request #31460: [SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may cause p

2021-02-04 Thread GitBox
SparkQA removed a comment on pull request #31460: URL: https://github.com/apache/spark/pull/31460#issuecomment-773056155 **[Test build #134856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134856/testReport)** for PR 31460 at commit [`58a532e`](https://gi

[GitHub] [spark] cloud-fan commented on pull request #31460: [SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may cause perf re

2021-02-04 Thread GitBox
cloud-fan commented on pull request #31460: URL: https://github.com/apache/spark/pull/31460#issuecomment-773127405 Hi @dongjoon-hyun, do you have more concerns about this fix?

[GitHub] [spark] cloud-fan commented on pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has single partition

2021-02-04 Thread GitBox
cloud-fan commented on pull request #31468: URL: https://github.com/apache/spark/pull/31468#issuecomment-773128349 cc @viirya @maropu

[GitHub] [spark] SparkQA commented on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-04 Thread GitBox
SparkQA commented on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773128492 **[Test build #134863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134863/testReport)** for PR 31448 at commit [`ea11c3c`](https://github.com

[GitHub] [spark] HyukjinKwon commented on pull request #31467: [SPARK-33212][FOLLOW-UP][BUILD] Uses provided properties for Hadoop client dependencies in root pom

2021-02-04 Thread GitBox
HyukjinKwon commented on pull request #31467: URL: https://github.com/apache/spark/pull/31467#issuecomment-773128556 Merged to master.

[GitHub] [spark] MaxGekk commented on a change in pull request #31460: [SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may cau

2021-02-04 Thread GitBox
MaxGekk commented on a change in pull request #31460: URL: https://github.com/apache/spark/pull/31460#discussion_r570032069 ## File path: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ## @@ -450,13 +450,22 @@ private[spark] object SparkHadoopUtil {

[GitHub] [spark] HyukjinKwon closed pull request #31467: [SPARK-33212][FOLLOW-UP][BUILD] Uses provided properties for Hadoop client dependencies in root pom

2021-02-04 Thread GitBox
HyukjinKwon closed pull request #31467: URL: https://github.com/apache/spark/pull/31467

[GitHub] [spark] cloud-fan commented on pull request #31179: [SPARK-34113][SQL] Use metric data update metadata statistic's size and rowCount

2021-02-04 Thread GitBox
cloud-fan commented on pull request #31179: URL: https://github.com/apache/spark/pull/31179#issuecomment-773131159 how big is the overhead? I had an impression that auto stats update is very expensive and not many people are using it...

[GitHub] [spark] zhengruifeng opened a new pull request #31469: [MINOR][ML] Param Validation should throw IllegalArgumentException

2021-02-04 Thread GitBox
zhengruifeng opened a new pull request #31469: URL: https://github.com/apache/spark/pull/31469 ### What changes were proposed in this pull request? Param validation throws `IllegalArgumentException` ### Why are the changes needed? Param Validation should throw `IllegalArgumen

[GitHub] [spark] zhengruifeng commented on pull request #31469: [MINOR][ML] Param Validation should throw IllegalArgumentException

2021-02-04 Thread GitBox
zhengruifeng commented on pull request #31469: URL: https://github.com/apache/spark/pull/31469#issuecomment-773133202 ping @huaxingao @srowen

[GitHub] [spark] zhengruifeng opened a new pull request #29185: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-04 Thread GitBox
zhengruifeng opened a new pull request #29185: URL: https://github.com/apache/spark/pull/29185 ### What changes were proposed in this pull request? avoid unnecessary shuffle if possible ### Why are the changes needed? In `combineByKeyWithClassTag`, there is a check to avoid unne

[GitHub] [spark] zhengruifeng commented on pull request #29185: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-04 Thread GitBox
zhengruifeng commented on pull request #29185: URL: https://github.com/apache/spark/pull/29185#issuecomment-773135060 retest this please

[GitHub] [spark] SparkQA commented on pull request #31464: [SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

2021-02-04 Thread GitBox
SparkQA commented on pull request #31464: URL: https://github.com/apache/spark/pull/31464#issuecomment-773137525 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39448/ -

[GitHub] [spark] beliefer commented on pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-04 Thread GitBox
beliefer commented on pull request #31466: URL: https://github.com/apache/spark/pull/31466#issuecomment-773140157 cc @cloud-fan @wangyum @maropu

[GitHub] [spark] SparkQA commented on pull request #31394: [SPARK-34291][ML] LSH hashDistance optimization

2021-02-04 Thread GitBox
SparkQA commented on pull request #31394: URL: https://github.com/apache/spark/pull/31394#issuecomment-773140843 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39449/ -

[GitHub] [spark] AngersZhuuuu commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2021-02-04 Thread GitBox
AngersZhuuuu commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-773141579 Gentle ping @maropu @cloud-fan Sorry for my late reply. I have now changed it to use JSON format, and it works well. Since there are two functions `StructToJson` and `JsonTo

[GitHub] [spark] AngersZhuuuu edited a comment on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2021-02-04 Thread GitBox
AngersZhuuuu edited a comment on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-773141579 Gentle ping @maropu @cloud-fan Sorry for my late reply. I have now changed it to use JSON format, and it works well. Since there are two functions `StructToJson` and

[GitHub] [spark] AngersZhuuuu commented on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause

2021-02-04 Thread GitBox
AngersZhuuuu commented on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-773142558 ping @HyukjinKwon @cloud-fan @viirya Can you take a look?

[GitHub] [spark] AngersZhuuuu edited a comment on pull request #29087: [SPARK-28227][SQL] Support projection, aggregate/window functions, and lateral view in the TRANSFORM clause

2021-02-04 Thread GitBox
AngersZhuuuu edited a comment on pull request #29087: URL: https://github.com/apache/spark/pull/29087#issuecomment-773142558 ping @HyukjinKwon @cloud-fan @viirya Can you take a look at this PR's current change?

[GitHub] [spark] AmplabJenkins commented on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-773146054 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39446/ -

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31467: [SPARK-33212][FOLLOW-UP][BUILD] Uses provided properties for Hadoop client dependencies in root pom

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31467: URL: https://github.com/apache/spark/pull/31467#issuecomment-773146051 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134855/ -

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31460: [SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may c

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31460: URL: https://github.com/apache/spark/pull/31460#issuecomment-773146057 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134856/ -

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31258: [SPARK-34168] [SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31258: URL: https://github.com/apache/spark/pull/31258#issuecomment-773146054 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39446/

[GitHub] [spark] AmplabJenkins commented on pull request #31467: [SPARK-33212][FOLLOW-UP][BUILD] Uses provided properties for Hadoop client dependencies in root pom

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31467: URL: https://github.com/apache/spark/pull/31467#issuecomment-773146051 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134855/ -

[GitHub] [spark] AmplabJenkins commented on pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31466: URL: https://github.com/apache/spark/pull/31466#issuecomment-773146059 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134852/ -

[GitHub] [spark] AmplabJenkins commented on pull request #31460: [SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may cause per

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31460: URL: https://github.com/apache/spark/pull/31460#issuecomment-773146057 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134856/ -

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31466: URL: https://github.com/apache/spark/pull/31466#issuecomment-773146059 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134852/ -

[GitHub] [spark] SparkQA commented on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-04 Thread GitBox
SparkQA commented on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773147942 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39450/ -

[GitHub] [spark] SparkQA commented on pull request #31469: [MINOR][ML] Param Validation should throw IllegalArgumentException

2021-02-04 Thread GitBox
SparkQA commented on pull request #31469: URL: https://github.com/apache/spark/pull/31469#issuecomment-773148826 **[Test build #134864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134864/testReport)** for PR 31469 at commit [`40a9a53`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has single partition

2021-02-04 Thread GitBox
SparkQA commented on pull request #31468: URL: https://github.com/apache/spark/pull/31468#issuecomment-773148878 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39447/ -

[GitHub] [spark] SparkQA commented on pull request #30957: [SPARK-31937][SQL] Support processing ArrayType/MapType/StructType data using no-serde mode script transform

2021-02-04 Thread GitBox
SparkQA commented on pull request #30957: URL: https://github.com/apache/spark/pull/30957#issuecomment-773150220 **[Test build #134865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134865/testReport)** for PR 30957 at commit [`b631b70`](https://github.com

[GitHub] [spark] AngersZhuuuu commented on pull request #31179: [SPARK-34113][SQL] Use metric data update metadata statistic's size and rowCount

2021-02-04 Thread GitBox
AngersZhuuuu commented on pull request #31179: URL: https://github.com/apache/spark/pull/31179#issuecomment-773151680 > how big is the overhead? I had an impression that auto stats update is very expensive and not many people are using it... In the original way: 1. We just update si

[GitHub] [spark] Ngone51 opened a new pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join

2021-02-04 Thread GitBox
Ngone51 opened a new pull request #31470: URL: https://github.com/apache/spark/pull/31470 ### What changes were proposed in this pull request? This PR introduces a new analysis rule `DeduplicateRelations`, which deduplicates any duplicate relations in a plan first and the

[GitHub] [spark] Ngone51 commented on pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join

2021-02-04 Thread GitBox
Ngone51 commented on pull request #31470: URL: https://github.com/apache/spark/pull/31470#issuecomment-773151888 cc @cloud-fan

[GitHub] [spark] SparkQA commented on pull request #29185: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-04 Thread GitBox
SparkQA commented on pull request #29185: URL: https://github.com/apache/spark/pull/29185#issuecomment-773152387 **[Test build #134866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134866/testReport)** for PR 29185 at commit [`cb1148c`](https://github.com

[GitHub] [spark] AmplabJenkins commented on pull request #31464: [SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31464: URL: https://github.com/apache/spark/pull/31464#issuecomment-773155309 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39448/ -

[GitHub] [spark] SparkQA commented on pull request #31464: [SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

2021-02-04 Thread GitBox
SparkQA commented on pull request #31464: URL: https://github.com/apache/spark/pull/31464#issuecomment-773155274 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39448/ ---

[GitHub] [spark] SparkQA commented on pull request #31394: [SPARK-34291][ML] LSH hashDistance optimization

2021-02-04 Thread GitBox
SparkQA commented on pull request #31394: URL: https://github.com/apache/spark/pull/31394#issuecomment-773155354 **[Test build #134862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134862/testReport)** for PR 31394 at commit [`f8b29dd`](https://github.co

[GitHub] [spark] SparkQA commented on pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join

2021-02-04 Thread GitBox
SparkQA commented on pull request #31470: URL: https://github.com/apache/spark/pull/31470#issuecomment-773155684 **[Test build #134867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134867/testReport)** for PR 31470 at commit [`c09ab12`](https://github.com

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31464: [SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31464: URL: https://github.com/apache/spark/pull/31464#issuecomment-773155309 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39448/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31394: [SPARK-34291][ML] LSH hashDistance optimization

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31394: URL: https://github.com/apache/spark/pull/31394#issuecomment-773155810 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134862/ -

[GitHub] [spark] AmplabJenkins commented on pull request #31394: [SPARK-34291][ML] LSH hashDistance optimization

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31394: URL: https://github.com/apache/spark/pull/31394#issuecomment-773155810 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134862/ -

[GitHub] [spark] SparkQA removed a comment on pull request #31394: [SPARK-34291][ML] LSH hashDistance optimization

2021-02-04 Thread GitBox
SparkQA removed a comment on pull request #31394: URL: https://github.com/apache/spark/pull/31394#issuecomment-773118982 **[Test build #134862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134862/testReport)** for PR 31394 at commit [`f8b29dd`](https://gi

[GitHub] [spark] SparkQA commented on pull request #31394: [SPARK-34291][ML] LSH hashDistance optimization

2021-02-04 Thread GitBox
SparkQA commented on pull request #31394: URL: https://github.com/apache/spark/pull/31394#issuecomment-773157278 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39449/ ---

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31394: [SPARK-34291][ML] LSH hashDistance optimization

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31394: URL: https://github.com/apache/spark/pull/31394#issuecomment-773157804 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39449/

[GitHub] [spark] AmplabJenkins commented on pull request #31394: [SPARK-34291][ML] LSH hashDistance optimization

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31394: URL: https://github.com/apache/spark/pull/31394#issuecomment-773157804 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39449/ -

[GitHub] [spark] ulysses-you opened a new pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-04 Thread GitBox
ulysses-you opened a new pull request #31471: URL: https://github.com/apache/spark/pull/31471 ### What changes were proposed in this pull request? Add some info logs around the commit job. ### Why are the changes needed? The commit job is a heavy operation and we have see

[GitHub] [spark] SparkQA commented on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-04 Thread GitBox
SparkQA commented on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773166476 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39450/ ---

[GitHub] [spark] AmplabJenkins commented on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773185539 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39450/ -

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31448: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`.

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31448: URL: https://github.com/apache/spark/pull/31448#issuecomment-773185539 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39450/

[GitHub] [spark] SparkQA commented on pull request #29185: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-04 Thread GitBox
SparkQA commented on pull request #29185: URL: https://github.com/apache/spark/pull/29185#issuecomment-773186615 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39454/ -

[GitHub] [spark] LuciferYang commented on a change in pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-04 Thread GitBox
LuciferYang commented on a change in pull request #31471: URL: https://github.com/apache/spark/pull/31471#discussion_r570096278 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ## @@ -217,8 +217,11 @@ object FileFormatWrit

[GitHub] [spark] SparkQA commented on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-04 Thread GitBox
SparkQA commented on pull request #31471: URL: https://github.com/apache/spark/pull/31471#issuecomment-773187923 **[Test build #134868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134868/testReport)** for PR 31471 at commit [`def94f1`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #31469: [MINOR][ML] Param Validation should throw IllegalArgumentException

2021-02-04 Thread GitBox
SparkQA commented on pull request #31469: URL: https://github.com/apache/spark/pull/31469#issuecomment-773188484 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39452/ -

[GitHub] [spark] SparkQA commented on pull request #31469: [MINOR][ML] Param Validation should throw IllegalArgumentException

2021-02-04 Thread GitBox
SparkQA commented on pull request #31469: URL: https://github.com/apache/spark/pull/31469#issuecomment-773188631 **[Test build #134864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134864/testReport)** for PR 31469 at commit [`40a9a53`](https://github.co

[GitHub] [spark] SparkQA removed a comment on pull request #31469: [MINOR][ML] Param Validation should throw IllegalArgumentException

2021-02-04 Thread GitBox
SparkQA removed a comment on pull request #31469: URL: https://github.com/apache/spark/pull/31469#issuecomment-773148826 **[Test build #134864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134864/testReport)** for PR 31469 at commit [`40a9a53`](https://gi

[GitHub] [spark] AmplabJenkins commented on pull request #31469: [MINOR][ML] Param Validation should throw IllegalArgumentException

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31469: URL: https://github.com/apache/spark/pull/31469#issuecomment-773189276 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134864/ -

[GitHub] [spark] SparkQA commented on pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join

2021-02-04 Thread GitBox
SparkQA commented on pull request #31470: URL: https://github.com/apache/spark/pull/31470#issuecomment-773189883 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39451/ -

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31469: [MINOR][ML] Param Validation should throw IllegalArgumentException

2021-02-04 Thread GitBox
AmplabJenkins removed a comment on pull request #31469: URL: https://github.com/apache/spark/pull/31469#issuecomment-773189276 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134864/ -

[GitHub] [spark] SparkQA commented on pull request #29185: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-04 Thread GitBox
SparkQA commented on pull request #29185: URL: https://github.com/apache/spark/pull/29185#issuecomment-773195188 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39454/ ---

[GitHub] [spark] SparkQA commented on pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-04 Thread GitBox
SparkQA commented on pull request #31413: URL: https://github.com/apache/spark/pull/31413#issuecomment-773196020 **[Test build #134857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134857/testReport)** for PR 31413 at commit [`066d5a4`](https://github.co

[GitHub] [spark] SparkQA removed a comment on pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-04 Thread GitBox
SparkQA removed a comment on pull request #31413: URL: https://github.com/apache/spark/pull/31413#issuecomment-773056185 **[Test build #134857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134857/testReport)** for PR 31413 at commit [`066d5a4`](https://gi

[GitHub] [spark] zhengruifeng opened a new pull request #31472: [SPARK-34356][ML] OVR transform fix potential column conflict

2021-02-04 Thread GitBox
zhengruifeng opened a new pull request #31472: URL: https://github.com/apache/spark/pull/31472 ### What changes were proposed in this pull request? 1, clear predictionCol & probabilityCol, use tmp rawPred col, to avoid potential column conflict; 2, use array instead of map, to keep in

[GitHub] [spark] SparkQA commented on pull request #31464: [SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

2021-02-04 Thread GitBox
SparkQA commented on pull request #31464: URL: https://github.com/apache/spark/pull/31464#issuecomment-773198283 **[Test build #134861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134861/testReport)** for PR 31464 at commit [`e766209`](https://github.co

[GitHub] [spark] zhengruifeng commented on a change in pull request #31472: [SPARK-34356][ML] OVR transform fix potential column conflict

2021-02-04 Thread GitBox
zhengruifeng commented on a change in pull request #31472: URL: https://github.com/apache/spark/pull/31472#discussion_r570108063 ## File path: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ## @@ -223,6 +223,13 @@ class OneVsRestSuite extends MLTe

[GitHub] [spark] SparkQA removed a comment on pull request #31464: [SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

2021-02-04 Thread GitBox
SparkQA removed a comment on pull request #31464: URL: https://github.com/apache/spark/pull/31464#issuecomment-773112085 **[Test build #134861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134861/testReport)** for PR 31464 at commit [`e766209`](https://gi

[GitHub] [spark] SparkQA commented on pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has single partition

2021-02-04 Thread GitBox
SparkQA commented on pull request #31468: URL: https://github.com/apache/spark/pull/31468#issuecomment-773199615 **[Test build #134860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134860/testReport)** for PR 31468 at commit [`6d199ff`](https://github.co

[GitHub] [spark] SparkQA removed a comment on pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has single partition

2021-02-04 Thread GitBox
SparkQA removed a comment on pull request #31468: URL: https://github.com/apache/spark/pull/31468#issuecomment-773118983 **[Test build #134860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134860/testReport)** for PR 31468 at commit [`6d199ff`](https://gi

[GitHub] [spark] SparkQA commented on pull request #31470: [SPARK-34354][SQL] Fix failure when apply CostBasedJoinReorder on self-join

2021-02-04 Thread GitBox
SparkQA commented on pull request #31470: URL: https://github.com/apache/spark/pull/31470#issuecomment-773201590 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39451/ ---

[GitHub] [spark] zhengruifeng commented on pull request #31472: [SPARK-34356][ML] OVR transform fix potential column conflict

2021-02-04 Thread GitBox
zhengruifeng commented on pull request #31472: URL: https://github.com/apache/spark/pull/31472#issuecomment-773203630 ``` scala> val df = spark.read.format("libsvm").load("/d0/Dev/Opensource/spark/data/mllib/sample_multiclass_classification_data.txt").withColumn("probability", lit(0.0))

[GitHub] [spark] zhengruifeng edited a comment on pull request #31472: [SPARK-34356][ML] OVR transform fix potential column conflict

2021-02-04 Thread GitBox
zhengruifeng edited a comment on pull request #31472: URL: https://github.com/apache/spark/pull/31472#issuecomment-773203630 in 3.0.1 and master ``` scala> val df = spark.read.format("libsvm").load("/d0/Dev/Opensource/spark/data/mllib/sample_multiclass_classification_data.txt").withCo

[GitHub] [spark] SparkQA commented on pull request #31468: [SPARK-34353][SQL] CollectLimitExec avoid shuffle if input rdd has single partition

2021-02-04 Thread GitBox
SparkQA commented on pull request #31468: URL: https://github.com/apache/spark/pull/31468#issuecomment-773207218 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39447/ ---

[GitHub] [spark] gaborgsomogyi commented on a change in pull request #31384: [SPARK-31816][SQL][DOCS] Added high level description about JDBC connection providers for users/developers

2021-02-04 Thread GitBox
gaborgsomogyi commented on a change in pull request #31384: URL: https://github.com/apache/spark/pull/31384#discussion_r570119198 ## File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/README.md ## @@ -0,0 +1,81 @@ +--- +license: | + Licensed to the Apache Software Fo

[GitHub] [spark] SparkQA commented on pull request #31469: [MINOR][ML] Param Validation should throw IllegalArgumentException

2021-02-04 Thread GitBox
SparkQA commented on pull request #31469: URL: https://github.com/apache/spark/pull/31469#issuecomment-773209423 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39452/ ---

[GitHub] [spark] AmplabJenkins commented on pull request #31469: [MINOR][ML] Param Validation should throw IllegalArgumentException

2021-02-04 Thread GitBox
AmplabJenkins commented on pull request #31469: URL: https://github.com/apache/spark/pull/31469#issuecomment-773222380 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39452/ -
