[GitHub] [spark] maropu commented on pull request #31677: [SPARK-34565][SQL] Collapse Window nodes with Project between them

2021-06-14 Thread GitBox
maropu commented on pull request #31677: URL: https://github.com/apache/spark/pull/31677#issuecomment-859231193 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries

[GitHub] [spark] SparkQA commented on pull request #28032: [SPARK-31264][SQL] Repartition by dynamic partition columns before insert partition table

2021-06-14 Thread GitBox
SparkQA commented on pull request #28032: URL: https://github.com/apache/spark/pull/28032#issuecomment-859924616

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #32750: [SPARK-34512][BUILD][SQL] Upgrade built-in Hive to 2.3.9

2021-06-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #32750: URL: https://github.com/apache/spark/pull/32750#issuecomment-859240913 Merged to master for Apache Spark 3.2.0. Thank you so much, @wangyum , @srowen and @sunchao !

[GitHub] [spark] SparkQA commented on pull request #32878: [SPARK-35719][SQL] Support casting of timestamp to timestamp without time zone type

2021-06-14 Thread GitBox
SparkQA commented on pull request #32878: URL: https://github.com/apache/spark/pull/32878#issuecomment-859387984

[GitHub] [spark] github-actions[bot] commented on pull request #31684: [SPARK-34571][SQL][CORE] Provide a more convenient way to deprecate/remove/alternate configs

2021-06-14 Thread GitBox
github-actions[bot] commented on pull request #31684: URL: https://github.com/apache/spark/pull/31684#issuecomment-859964483 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue

[GitHub] [spark] ulysses-you commented on pull request #32883: [SPARK-35725][SQL] Support repartition expand partitions in AQE

2021-06-14 Thread GitBox
ulysses-you commented on pull request #32883: URL: https://github.com/apache/spark/pull/32883#issuecomment-859440312

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32886: [SPARK-35478][PYTHON] Enable disallow_untyped_defs mypy check for pyspark.pandas.window.

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32886: URL: https://github.com/apache/spark/pull/32886#issuecomment-859923038

[GitHub] [spark] HyukjinKwon commented on pull request #32865: [SPARK-35701][SQL] Use copy-on-write semantics for SQLConf registered configurations.

2021-06-14 Thread GitBox
HyukjinKwon commented on pull request #32865: URL: https://github.com/apache/spark/pull/32865#issuecomment-859229525 BTW, seems like the GA tests were not triggered for some reasons ..

[GitHub] [spark] gengliangwang commented on pull request #32878: [SPARK-35719][SQL] Support casting of timestamp to timestamp without time zone type

2021-06-14 Thread GitBox
gengliangwang commented on pull request #32878: URL: https://github.com/apache/spark/pull/32878#issuecomment-859597469

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28032: [SPARK-31264][SQL] Repartition by dynamic partition columns before insert partition table

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #28032: URL: https://github.com/apache/spark/pull/28032#issuecomment-859927429

[GitHub] [spark] cloud-fan commented on pull request #32873: [SPARK-35718][SQL] Support casting of Date to timestamp without time zone type

2021-06-14 Thread GitBox
cloud-fan commented on pull request #32873: URL: https://github.com/apache/spark/pull/32873#issuecomment-859283953 thanks, merging to master!

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32470: [SPARK-35712][SQL] Simplify ResolveAggregateFunctions

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-859276195

[GitHub] [spark] Yikun commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps and make `isnull` method data-type-based

2021-06-14 Thread GitBox
Yikun commented on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-859239524 Our goal is `make isnull method data-type-based`, but we should introduce DecimalOps first. Actually, there are 2 tasks in this PR, so I updated the PR title. If

[GitHub] [spark] SparkQA commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark

2021-06-14 Thread GitBox
SparkQA commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-859321357

[GitHub] [spark] SparkQA commented on pull request #32845: [SPARK-35691][CORE] addFile/addJar/addDirectory should put CanonicalFile

2021-06-14 Thread GitBox
SparkQA commented on pull request #32845: URL: https://github.com/apache/spark/pull/32845#issuecomment-859219112

[GitHub] [spark] SparkQA removed a comment on pull request #32858: [SPARK-35706][SQL] Consider making the ':' in STRUCT data type definition optional

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32858: URL: https://github.com/apache/spark/pull/32858#issuecomment-859524331 **[Test build #139698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139698/testReport)** for PR 32858 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #32881: [SPARK-33298][CORE] Decouple file naming from FileCommitProtocol

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32881: URL: https://github.com/apache/spark/pull/32881#issuecomment-859517395 **[Test build #139693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139693/testReport)** for PR 32881 at commit

[GitHub] [spark] SparkQA commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-14 Thread GitBox
SparkQA commented on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-859217801

[GitHub] [spark] sigmod opened a new pull request #32882: [WIP][SPARK-35724][SQL] Support traversal pruning in extendedResolutionRules and postHocResolutionRules

2021-06-14 Thread GitBox
sigmod opened a new pull request #32882: URL: https://github.com/apache/spark/pull/32882 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] SparkQA commented on pull request #32810: [SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line.

2021-06-14 Thread GitBox
SparkQA commented on pull request #32810: URL: https://github.com/apache/spark/pull/32810#issuecomment-859945339

[GitHub] [spark] cloud-fan commented on pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown

2021-06-14 Thread GitBox
cloud-fan commented on pull request #32807: URL: https://github.com/apache/spark/pull/32807#issuecomment-859756215 ping @maropu

[GitHub] [spark] dbtsai closed pull request #32777: [SPARK-35640][SQL] Refactor Parquet vectorized reader to remove duplicated code paths

2021-06-14 Thread GitBox
dbtsai closed pull request #32777: URL: https://github.com/apache/spark/pull/32777

[GitHub] [spark] dongjoon-hyun commented on pull request #32874: [SPARK-35699][Kubernetes] Improve error message when creating k8s pod failed.

2021-06-14 Thread GitBox
dongjoon-hyun commented on pull request #32874: URL: https://github.com/apache/spark/pull/32874#issuecomment-859238036

[GitHub] [spark] mridulm commented on pull request #32754: [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes

2021-06-14 Thread GitBox
mridulm commented on pull request #32754: URL: https://github.com/apache/spark/pull/32754#issuecomment-859237050 Thanks for the details @venkata91. Looks good to me, will leave it open for a couple of days for review by others as well.

[GitHub] [spark] AmplabJenkins commented on pull request #32610: [SPARK-35460][K8S] verify the content of`spark.kubernetes.executor.podNamePrefix` before post it to k8s api-server

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32610: URL: https://github.com/apache/spark/pull/32610#issuecomment-859222879

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when running multiple Hive version related tests

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-859318241

[GitHub] [spark] dongjoon-hyun commented on pull request #32879: [SPARK-35694][INFRA][FollowUp] Increase the default JVM stack size of SBT/Maven

2021-06-14 Thread GitBox
dongjoon-hyun commented on pull request #32879: URL: https://github.com/apache/spark/pull/32879#issuecomment-859642808 +1, LGTM.

[GitHub] [spark] AmplabJenkins commented on pull request #32881: [SPARK-33298][CORE] Decouple file naming from FileCommitProtocol

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32881: URL: https://github.com/apache/spark/pull/32881#issuecomment-859538479

[GitHub] [spark] SparkQA commented on pull request #31677: [SPARK-34565][SQL] Collapse Window nodes with Project between them

2021-06-14 Thread GitBox
SparkQA commented on pull request #31677: URL: https://github.com/apache/spark/pull/31677#issuecomment-859238226

[GitHub] [spark] xinrong-databricks commented on pull request #32859: [SPARK-35708][PYTHON][TEST] Add BaseTest for DataTypeOps

2021-06-14 Thread GitBox
xinrong-databricks commented on pull request #32859: URL: https://github.com/apache/spark/pull/32859#issuecomment-859715361

[GitHub] [spark] cloud-fan commented on pull request #32470: [SPARK-35712][SQL] Simplify ResolveAggregateFunctions

2021-06-14 Thread GitBox
cloud-fan commented on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-859763809 It's a bit hacky to restore an `UnresolvedAttribute` from an `AttributeReference`, I introduced `TempResolvedColumn` and updated the PR description accordingly. Please take

[GitHub] [spark] sunchao opened a new pull request #32887: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client

2021-06-14 Thread GitBox
sunchao opened a new pull request #32887: URL: https://github.com/apache/spark/pull/32887 ### What changes were proposed in this pull request? Instantiate a new Hive client through `Hive.getWithoutRegisterFns(conf, false)` instead of `Hive.get(conf)`, if `Hive` version

[GitHub] [spark] yaooqinn commented on a change in pull request #32610: [SPARK-35460][K8S] verify the content of`spark.kubernetes.executor.podNamePrefix` before post it to k8s api-server

2021-06-14 Thread GitBox
yaooqinn commented on a change in pull request #32610: URL: https://github.com/apache/spark/pull/32610#discussion_r649659243 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala ## @@ -29,13 +29,16 @@

[GitHub] [spark] yaooqinn commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when running multiple Hive version related tests

2021-06-14 Thread GitBox
yaooqinn commented on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-859200528 +1 for Yuming's opinion

[GitHub] [spark] wangyum commented on a change in pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-06-14 Thread GitBox
wangyum commented on a change in pull request #32410: URL: https://github.com/apache/spark/pull/32410#discussion_r650064530 ## File path: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java ## @@ -141,7 +141,8 @@ public void open(Map

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32878: [WIP][SPARK-35719][SQL] Support casting of timestamp to timestamp without time zone type

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32878: URL: https://github.com/apache/spark/pull/32878#issuecomment-859959704

[GitHub] [spark] ueshin commented on pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.

2021-06-14 Thread GitBox
ueshin commented on pull request #32871: URL: https://github.com/apache/spark/pull/32871#issuecomment-859211559

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32828: URL: https://github.com/apache/spark/pull/32828#issuecomment-859333938

[GitHub] [spark] gengliangwang commented on a change in pull request #32878: [SPARK-35719][SQL] Support casting of timestamp to timestamp without time zone type

2021-06-14 Thread GitBox
gengliangwang commented on a change in pull request #32878: URL: https://github.com/apache/spark/pull/32878#discussion_r649842166 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -513,6 +514,8 @@ abstract class CastBase

[GitHub] [spark] maropu commented on a change in pull request #32858: [SPARK-35706][SQL]Consider making the ':' in STRUCT data type definition optional

2021-06-14 Thread GitBox
maropu commented on a change in pull request #32858: URL: https://github.com/apache/spark/pull/32858#discussion_r649659422 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -930,7 +930,7 @@ complexColTypeList ;

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32875: [WIP][SPARK-35703] Remove HashClusteredDistribution and relax constraint for bucket join

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-859358106 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44209/

[GitHub] [spark] maropu commented on pull request #32870: [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions

2021-06-14 Thread GitBox
maropu commented on pull request #32870: URL: https://github.com/apache/spark/pull/32870#issuecomment-859339503 Thank you. Merged to master.

[GitHub] [spark] dongjoon-hyun commented on pull request #32885: [SPARK-35742][SQL] Expression.semanticEquals should be symmetrical

2021-06-14 Thread GitBox
dongjoon-hyun commented on pull request #32885: URL: https://github.com/apache/spark/pull/32885#issuecomment-859785610 BTW, could you make CI happy?

[GitHub] [spark] SparkQA removed a comment on pull request #32873: [SPARK-35718][SQL] Support casting of Date to timestamp without time zone type

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32873: URL: https://github.com/apache/spark/pull/32873#issuecomment-859237310 **[Test build #139672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139672/testReport)** for PR 32873 at commit

[GitHub] [spark] dongjoon-hyun closed pull request #32717: [SPARK-35396][SQL][TESTS][FOLLOWUP] Add a UT to check if a user-defined cachedBatch is completely released

2021-06-14 Thread GitBox
dongjoon-hyun closed pull request #32717: URL: https://github.com/apache/spark/pull/32717

[GitHub] [spark] AmplabJenkins commented on pull request #32868: [SPARK-35714][CORE] Bug fix for deadlock during the executor shutdown

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32868: URL: https://github.com/apache/spark/pull/32868#issuecomment-859474336

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32061: [WIP][SPARK-32833][SQL] JDBC V2 Datasource aggregate push down

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32061: URL: https://github.com/apache/spark/pull/32061#issuecomment-852709012 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43714/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32873: [SPARK-35718][SQL] Support casting of Date to timestamp without time zone type

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32873: URL: https://github.com/apache/spark/pull/32873#issuecomment-859239133

[GitHub] [spark] beliefer commented on pull request #32852: [SPARK-35283][SQL] Support query some DDL with CTES

2021-06-14 Thread GitBox
beliefer commented on pull request #32852: URL: https://github.com/apache/spark/pull/32852#issuecomment-859212585

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32849: [SPARK-35704][SQL] Add fields to `DayTimeIntervalType`

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32849: URL: https://github.com/apache/spark/pull/32849#issuecomment-859925040 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44231/

[GitHub] [spark] AmplabJenkins commented on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps and make `isnull` method data-type-based

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-859284671

[GitHub] [spark] SparkQA commented on pull request #32867: [SPARK-XXXX][PYTHON] path level discover for python unittests

2021-06-14 Thread GitBox
SparkQA commented on pull request #32867: URL: https://github.com/apache/spark/pull/32867#issuecomment-859217722

[GitHub] [spark] xinrong-databricks commented on a change in pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.

2021-06-14 Thread GitBox
xinrong-databricks commented on a change in pull request #32871: URL: https://github.com/apache/spark/pull/32871#discussion_r649659322 ## File path: python/pyspark/pandas/namespace.py ## @@ -2480,6 +2531,7 @@ def isna(obj): isnull = isna +@no_type_check Review comment:

[GitHub] [spark] dongjoon-hyun closed pull request #32750: [SPARK-34512][BUILD][SQL] Upgrade built-in Hive to 2.3.9

2021-06-14 Thread GitBox
dongjoon-hyun closed pull request #32750: URL: https://github.com/apache/spark/pull/32750

[GitHub] [spark] Yikun commented on a change in pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps

2021-06-14 Thread GitBox
Yikun commented on a change in pull request #32821: URL: https://github.com/apache/spark/pull/32821#discussion_r649663634 ## File path: python/pyspark/pandas/tests/data_type_ops/test_decimal_ops.py ## @@ -0,0 +1,65 @@ +# +# Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32822: [SPARK-35678][ML] add a common softmax function

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32822: URL: https://github.com/apache/spark/pull/32822#issuecomment-859236794

[GitHub] [spark] HeartSaVioR commented on pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value

2021-06-14 Thread GitBox
HeartSaVioR commented on pull request #32828: URL: https://github.com/apache/spark/pull/32828#issuecomment-859280232

[GitHub] [spark] dongjoon-hyun commented on pull request #32750: [SPARK-34512][BUILD][SQL] Upgrade built-in Hive to 2.3.9

2021-06-14 Thread GitBox
dongjoon-hyun commented on pull request #32750: URL: https://github.com/apache/spark/pull/32750#issuecomment-859240913 Merged to master for Apache Spark 3.2.0. Thank you so much, @wangyum and @sunchao !

[GitHub] [spark] cloud-fan commented on a change in pull request #32863: [SPARK-35652][SQL] joinWith on two table generated from same one

2021-06-14 Thread GitBox
cloud-fan commented on a change in pull request #32863: URL: https://github.com/apache/spark/pull/32863#discussion_r649707269 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ## @@ -1161,6 +1161,28 @@ class Dataset[T] private[sql]( throw new

[GitHub] [spark] c21 opened a new pull request #32881: [SPARK-33298][CORE] Decouple file naming from FileCommitProtocol

2021-06-14 Thread GitBox
c21 opened a new pull request #32881: URL: https://github.com/apache/spark/pull/32881 ### What changes were proposed in this pull request? This PR is to propose to decouple file naming functionality from `FileCommitProtocol`. Currently `FileCommitProtocol` mainly does two

[GitHub] [spark] SparkQA removed a comment on pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps and make `isnull` method data-type-based

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32821: URL: https://github.com/apache/spark/pull/32821#issuecomment-859239324 **[Test build #139678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139678/testReport)** for PR 32821 at commit

[GitHub] [spark] SparkQA commented on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when running multiple Hive version related tests

2021-06-14 Thread GitBox
SparkQA commented on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-859311620

[GitHub] [spark] maropu commented on a change in pull request #32659: [SPARK-22639][SQL] Support aggregate cbo stats estimation if the group by clause involves substring

2021-06-14 Thread GitBox
maropu commented on a change in pull request #32659: URL: https://github.com/apache/spark/pull/32659#discussion_r649650245 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala ## @@ -80,6 +80,54 @@ object

[GitHub] [spark] LuciferYang edited a comment on pull request #32693: [SPARK-35556][SQL][TESTS] Avoid log NoSuchMethodError when running multiple Hive version related tests

2021-06-14 Thread GitBox
LuciferYang edited a comment on pull request #32693: URL: https://github.com/apache/spark/pull/32693#issuecomment-859364872

[GitHub] [spark] SparkQA commented on pull request #32873: [SPARK-35718][SQL] Support casting of Date to timestamp without time zone type

2021-06-14 Thread GitBox
SparkQA commented on pull request #32873: URL: https://github.com/apache/spark/pull/32873#issuecomment-859237310

[GitHub] [spark] SparkQA removed a comment on pull request #32835: [SPARK-35591][PYTHON][DOCS] Rename "Koalas" to "pandas API on Spark" in the documents

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32835: URL: https://github.com/apache/spark/pull/32835#issuecomment-859241995

[GitHub] [spark] viirya commented on a change in pull request #32885: [SPARK-35742][SQL] Expression.semanticEquals should be symmetrical

2021-06-14 Thread GitBox
viirya commented on a change in pull request #32885: URL: https://github.com/apache/spark/pull/32885#discussion_r650152680 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala ## @@ -278,11 +278,6 @@ case class

[GitHub] [spark] xinrong-databricks commented on pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.

2021-06-14 Thread GitBox
xinrong-databricks commented on pull request #32871: URL: https://github.com/apache/spark/pull/32871#issuecomment-859231008 LGTM, thanks!

[GitHub] [spark] dongjoon-hyun commented on pull request #32870: [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions

2021-06-14 Thread GitBox
dongjoon-hyun commented on pull request #32870: URL: https://github.com/apache/spark/pull/32870#issuecomment-859221852 Could you rebase to the master branch? The linter failure was fixed on the master branch.

[GitHub] [spark] venkata91 commented on pull request #32754: [SPARK-35613][CORE][SQL] Cache commonly occurring strings in SQLMetrics, JSONProtocol and AccumulatorV2 classes

2021-06-14 Thread GitBox
venkata91 commented on pull request #32754: URL: https://github.com/apache/spark/pull/32754#issuecomment-859237856 Sounds good. Thanks.

[GitHub] [spark] SparkQA commented on pull request #32868: [SPARK-35714][CORE] Bug fix for deadlock during the executor shutdown

2021-06-14 Thread GitBox
SparkQA commented on pull request #32868: URL: https://github.com/apache/spark/pull/32868#issuecomment-859924495

[GitHub] [spark] yaooqinn commented on pull request #32837: [SPARK-35692][K8S] Use AtomicInteger for executor id generating

2021-06-14 Thread GitBox
yaooqinn commented on pull request #32837: URL: https://github.com/apache/spark/pull/32837#issuecomment-859198377 thanks @dongjoon-hyun

[GitHub] [spark] srowen commented on pull request #32868: [SPARK-35714][CORE] Bug fix for deadlock during the executor shutdown

2021-06-14 Thread GitBox
srowen commented on pull request #32868: URL: https://github.com/apache/spark/pull/32868#issuecomment-859690220

[GitHub] [spark] Yikun commented on pull request #32877: wait until something does get queued.

2021-06-14 Thread GitBox
Yikun commented on pull request #32877: URL: https://github.com/apache/spark/pull/32877#issuecomment-859332533

[GitHub] [spark] AmplabJenkins commented on pull request #32878: [WIP][SPARK-35719][SQL] Support casting of timestamp to timestamp without time zone type

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32878: URL: https://github.com/apache/spark/pull/32878#issuecomment-859959704

[GitHub] [spark] cloud-fan opened a new pull request #32885: [SPARK-35742][SQL] Expression.semanticEquals should be symmetrical

2021-06-14 Thread GitBox
cloud-fan opened a new pull request #32885: URL: https://github.com/apache/spark/pull/32885 ### What changes were proposed in this pull request? Currently, there are some expressions that override `semanticEquals`, which makes it not symmetrical. Ideally, expressions should

[GitHub] [spark] xinrong-databricks commented on a change in pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps and make `isnull` method data-type-based

2021-06-14 Thread GitBox
xinrong-databricks commented on a change in pull request #32821: URL: https://github.com/apache/spark/pull/32821#discussion_r650119142 ## File path: python/pyspark/pandas/data_type_ops/base.py ## @@ -206,3 +209,10 @@ def restore(self, col: pd.Series) -> pd.Series: def

[GitHub] [spark] HyukjinKwon commented on pull request #32835: [SPARK-35591][PYTHON][DOCS] Rename "Koalas" to "pandas API on Spark" in the documents

2021-06-14 Thread GitBox
HyukjinKwon commented on pull request #32835: URL: https://github.com/apache/spark/pull/32835#issuecomment-859522057 Merged to master.

[GitHub] [spark] AmplabJenkins commented on pull request #32872: SPARK-35639 make hasCoalescedPartition return true if something was a…

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32872: URL: https://github.com/apache/spark/pull/32872#issuecomment-859223115 Can one of the admins verify this patch?

[GitHub] [spark] SparkQA removed a comment on pull request #32875: [WIP][SPARK-35703] Remove HashClusteredDistribution and relax constraint for bucket join

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-859241708 **[Test build #139680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139680/testReport)** for PR 32875 at commit

[GitHub] [spark] sarutak commented on a change in pull request #32845: [SPARK-35691][CORE] addFile/addJar/addDirectory should put CanonicalFile

2021-06-14 Thread GitBox
sarutak commented on a change in pull request #32845: URL: https://github.com/apache/spark/pull/32845#discussion_r649860579 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -1285,6 +1285,30 @@ class SparkContextSuite extends SparkFunSuite with

[GitHub] [spark] AmplabJenkins commented on pull request #32865: [SPARK-35701][SQL] Use copy-on-write semantics for SQLConf registered configurations.

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32865: URL: https://github.com/apache/spark/pull/32865#issuecomment-859503461

[GitHub] [spark] cloud-fan commented on pull request #32863: [SPARK-35652][SQL] joinWith on two table generated from same one

2021-06-14 Thread GitBox
cloud-fan commented on pull request #32863: URL: https://github.com/apache/spark/pull/32863#issuecomment-859551282

[GitHub] [spark] cloud-fan closed pull request #32863: [SPARK-35652][SQL] joinWith on two table generated from same one

2021-06-14 Thread GitBox
cloud-fan closed pull request #32863: URL: https://github.com/apache/spark/pull/32863

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32871: [SPARK-35475][PYTHON] Fix disallow_untyped_defs mypy checks.

2021-06-14 Thread GitBox
dongjoon-hyun commented on a change in pull request #32871: URL: https://github.com/apache/spark/pull/32871#discussion_r649665382 ## File path: python/pyspark/pandas/utils.py ## @@ -600,7 +600,7 @@ def column_labels_level(column_labels: List[Tuple]) -> int: return

[GitHub] [spark] ekoifman commented on pull request #32872: SPARK-35639 make hasCoalescedPartition return true if something was a…

2021-06-14 Thread GitBox
ekoifman commented on pull request #32872: URL: https://github.com/apache/spark/pull/32872#issuecomment-859223950 FYI @cloud-fan @ulysses-you as requested in https://github.com/apache/spark/pull/32776#discussion_r648852993

[GitHub] [spark] viirya commented on pull request #32870: [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions

2021-06-14 Thread GitBox
viirya commented on pull request #32870: URL: https://github.com/apache/spark/pull/32870#issuecomment-859223043

[GitHub] [spark] yikf opened a new pull request #32876: [SPARK-35722][CORE] wait until something does get queued

2021-06-14 Thread GitBox
yikf opened a new pull request #32876: URL: https://github.com/apache/spark/pull/32876 ### What changes were proposed in this pull request? Currently, when `ContextCleaner` cleans up and the queue is empty, we continue the loop after the wait times out. It is an ineffective loop because the queue

[GitHub] [spark] steveloughran commented on pull request #30135: [SPARK-29250][BUILD] Upgrade to Hadoop 3.3.1

2021-06-14 Thread GitBox
steveloughran commented on pull request #30135: URL: https://github.com/apache/spark/pull/30135#issuecomment-859463610 > For the regression, don't know the full context behind the original change but seems like a good thing to do, although a boolean flag returned might be less disruptive

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31980: [SPARK-34807][SQL] Transpose Window nodes with Project between them

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #31980: URL: https://github.com/apache/spark/pull/31980#issuecomment-859600283 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44215/

[GitHub] [spark] SparkQA commented on pull request #32552: [SPARK-34819][SQL] MapType supports comparable semantics

2021-06-14 Thread GitBox
SparkQA commented on pull request #32552: URL: https://github.com/apache/spark/pull/32552#issuecomment-859550039 **[Test build #139700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139700/testReport)** for PR 32552 at commit

[GitHub] [spark] yikf commented on pull request #32876: [SPARK-35722][CORE] wait until something does get queued

2021-06-14 Thread GitBox
yikf commented on pull request #32876: URL: https://github.com/apache/spark/pull/32876#issuecomment-859291267 Code conflict, reopen in

[GitHub] [spark] SparkQA removed a comment on pull request #32874: [SPARK-35699][K8S] Improve error message when creating k8s pod failed.

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32874: URL: https://github.com/apache/spark/pull/32874#issuecomment-859239174

[GitHub] [spark] ulysses-you opened a new pull request #32883: [SPARK-35725][SQL] Support repartition expand partitions in AQE

2021-06-14 Thread GitBox
ulysses-you opened a new pull request #32883: URL: https://github.com/apache/spark/pull/32883 ### What changes were proposed in this pull request? * Add a new rule `ExpandShufflePartitions` in AQE `queryStageOptimizerRules` * Add a new config

[GitHub] [spark] pingsutw commented on pull request #32874: [SPARK-35699][K8S] Improve error message when creating k8s pod failed.

2021-06-14 Thread GitBox
pingsutw commented on pull request #32874: URL: https://github.com/apache/spark/pull/32874#issuecomment-859453460 @dongjoon-hyun Thanks for your review. Sorry for some irrelevant changes. I use `dev/scalafmt` to format the code. Maybe there is something wrong in that script.

[GitHub] [spark] ueshin commented on a change in pull request #32847: [SPARK-35616][PYTHON] Make `astype` method data-type-based

2021-06-14 Thread GitBox
ueshin commented on a change in pull request #32847: URL: https://github.com/apache/spark/pull/32847#discussion_r650219350 ## File path: python/pyspark/pandas/data_type_ops/binary_ops.py ## @@ -53,3 +57,34 @@ def radd(self, left, right) -> Union["Series", "Index"]:

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32847: [SPARK-35616][PYTHON] Make `astype` method data-type-based

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32847: URL: https://github.com/apache/spark/pull/32847#issuecomment-859944403

[GitHub] [spark] dbtsai commented on pull request #32777: [SPARK-35640][SQL] Refactor Parquet vectorized reader to remove duplicated code paths

2021-06-14 Thread GitBox
dbtsai commented on pull request #32777: URL: https://github.com/apache/spark/pull/32777#issuecomment-859282611

[GitHub] [spark] MaxGekk commented on a change in pull request #32878: [SPARK-35719][SQL] Support casting of timestamp to timestamp without time zone type

2021-06-14 Thread GitBox
MaxGekk commented on a change in pull request #32878: URL: https://github.com/apache/spark/pull/32878#discussion_r649823568 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -513,6 +514,8 @@ abstract class CastBase extends

[GitHub] [spark] xkrogen commented on pull request #32810: [SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line.

2021-06-14 Thread GitBox
xkrogen commented on pull request #32810: URL: https://github.com/apache/spark/pull/32810#issuecomment-859941249 I realized that the existing logic in my PR, which was copied from the `ApplicationMaster`/driver, wouldn't properly handle `local` paths which used the `GATEWAY_ROOT_PATH` /
