[GitHub] [spark] dgd-contributor commented on a change in pull request #32839: [SPARK-35679][SQL][WIP] instantToMicros overflow

2021-06-09 Thread GitBox
dgd-contributor commented on a change in pull request #32839: URL: https://github.com/apache/spark/pull/32839#discussion_r648088247 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -385,8 +385,10 @@ object DateTimeUtils {

[GitHub] [spark] cloud-fan commented on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.

2021-06-09 Thread GitBox
cloud-fan commented on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-857500433 yea, let's just allow calling `AggregatingAccumulator.merge` on executor side -- This is an automated message from the Apache Git Service. To respond to the message, please l

[GitHub] [spark] SparkQA commented on pull request #32834: [SPARK-35690][SS] Stream-stream join keys should be reordered properly

2021-06-09 Thread GitBox
SparkQA commented on pull request #32834: URL: https://github.com/apache/spark/pull/32834#issuecomment-857500074 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44067/

[GitHub] [spark] wangyum commented on a change in pull request #32781: [SPARK-35650][SQL] Enhance `RepartitionByExpression` to make it coalesce partitions efficiently by AQE

2021-06-09 Thread GitBox
wangyum commented on a change in pull request #32781: URL: https://github.com/apache/spark/pull/32781#discussion_r648086922 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ## @@ -86,11 +86,15 @@ case object ENSURE_REQUIRE

[GitHub] [spark] SparkQA commented on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries

2021-06-09 Thread GitBox
SparkQA commented on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-857499023 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44068/

[GitHub] [spark] beliefer commented on pull request #32763: [SPARK-35058][SQL] Group exception messages in hive/client

2021-06-09 Thread GitBox
beliefer commented on pull request #32763: URL: https://github.com/apache/spark/pull/32763#issuecomment-857498892 @allisonwang-db @cloud-fan Thanks for review.

[GitHub] [spark] JkSelf commented on a change in pull request #32781: [SPARK-35650][SQL] Enhance `RepartitionByExpression` to make it coalesce partitions efficiently by AQE

2021-06-09 Thread GitBox
JkSelf commented on a change in pull request #32781: URL: https://github.com/apache/spark/pull/32781#discussion_r648083641 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ## @@ -86,11 +86,15 @@ case object ENSURE_REQUIREM

[GitHub] [spark] SparkQA commented on pull request #32764: [SPARK-35390][SQL] Handle type coercion when resolving V2 functions

2021-06-09 Thread GitBox
SparkQA commented on pull request #32764: URL: https://github.com/apache/spark/pull/32764#issuecomment-857496840 **[Test build #139561 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139561/testReport)** for PR 32764 at commit [`8d45492`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.

2021-06-09 Thread GitBox
SparkQA commented on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-857496811 **[Test build #139560 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139560/testReport)** for PR 32786 at commit [`748f2ef`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #32838: [SPARK-35694][INFRA] Increase the default JVM stack size of SBT/Maven

2021-06-09 Thread GitBox
SparkQA commented on pull request #32838: URL: https://github.com/apache/spark/pull/32838#issuecomment-857496550 **[Test build #139558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139558/testReport)** for PR 32838 at commit [`773c61a`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #32836: [SPARK-35693][SS][TEST] Add plan check for stream-stream join unit test

2021-06-09 Thread GitBox
SparkQA commented on pull request #32836: URL: https://github.com/apache/spark/pull/32836#issuecomment-857496598 **[Test build #139559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139559/testReport)** for PR 32836 at commit [`534459b`](https://github.com

[GitHub] [spark] JkSelf commented on a change in pull request #32781: [SPARK-35650][SQL] Enhance `RepartitionByExpression` to make it coalesce partitions efficiently by AQE

2021-06-09 Thread GitBox
JkSelf commented on a change in pull request #32781: URL: https://github.com/apache/spark/pull/32781#discussion_r648082801 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -1782,4 +1782,36 @@ class AdaptiveQueryEx

[GitHub] [spark] AmplabJenkins commented on pull request #32839: [SPARK-35679][SQL][WIP] instantToMicros overflow

2021-06-09 Thread GitBox
AmplabJenkins commented on pull request #32839: URL: https://github.com/apache/spark/pull/32839#issuecomment-857496298 Can one of the admins verify this patch?

[GitHub] [spark] cloud-fan commented on a change in pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown

2021-06-09 Thread GitBox
cloud-fan commented on a change in pull request #32807: URL: https://github.com/apache/spark/pull/32807#discussion_r648082097 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ## @@ -699,20 +699,25 @@ abstract class Pusha

[GitHub] [spark] sarutak commented on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.

2021-06-09 Thread GitBox
sarutak commented on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-857495522 @cloud-fan @hvanhovell So, finally, we don't need to have a subclass and executors are allowed to call `merge` right? If so, I'll fix this issue by using `SQLConf.get` in `wit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32835: [SPARK-35591][PYTHON][DOCS] Rename "Koalas" to "pandas API on Spark" in the documents

2021-06-09 Thread GitBox
AmplabJenkins removed a comment on pull request #32835: URL: https://github.com/apache/spark/pull/32835#issuecomment-857494781 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139550/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value

2021-06-09 Thread GitBox
AmplabJenkins removed a comment on pull request #32828: URL: https://github.com/apache/spark/pull/32828#issuecomment-857494785 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44065/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32800: [SPARK-35661][SQL] Allow deserialized off-heap memory entry

2021-06-09 Thread GitBox
AmplabJenkins removed a comment on pull request #32800: URL: https://github.com/apache/spark/pull/32800#issuecomment-857494776 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44060/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark

2021-06-09 Thread GitBox
AmplabJenkins removed a comment on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-857494773 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139527/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32738: [SPARK-35474] Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.

2021-06-09 Thread GitBox
AmplabJenkins removed a comment on pull request #32738: URL: https://github.com/apache/spark/pull/32738#issuecomment-857494778 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44063/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32830: [SPARK-32975][K8S][FOLLOWUP] Avoid None.get exception

2021-06-09 Thread GitBox
AmplabJenkins removed a comment on pull request #32830: URL: https://github.com/apache/spark/pull/32830#issuecomment-857467717

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-06-09 Thread GitBox
AmplabJenkins removed a comment on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-857494775 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44066/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32804: [SPARK-26867][YARN] Spark Support of YARN Placement Constraint

2021-06-09 Thread GitBox
AmplabJenkins removed a comment on pull request #32804: URL: https://github.com/apache/spark/pull/32804#issuecomment-857494783 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139553/

[GitHub] [spark] AmplabJenkins commented on pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value

2021-06-09 Thread GitBox
AmplabJenkins commented on pull request #32828: URL: https://github.com/apache/spark/pull/32828#issuecomment-857494785 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44065/

[GitHub] [spark] AmplabJenkins commented on pull request #32830: [SPARK-32975][K8S][FOLLOWUP] Avoid None.get exception

2021-06-09 Thread GitBox
AmplabJenkins commented on pull request #32830: URL: https://github.com/apache/spark/pull/32830#issuecomment-857494774 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139557/

[GitHub] [spark] AmplabJenkins commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-06-09 Thread GitBox
AmplabJenkins commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-857494775 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44066/

[GitHub] [spark] AmplabJenkins commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark

2021-06-09 Thread GitBox
AmplabJenkins commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-857494773 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139527/

[GitHub] [spark] AmplabJenkins commented on pull request #32738: [SPARK-35474] Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.

2021-06-09 Thread GitBox
AmplabJenkins commented on pull request #32738: URL: https://github.com/apache/spark/pull/32738#issuecomment-857494778 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44063/

[GitHub] [spark] AmplabJenkins commented on pull request #32800: [SPARK-35661][SQL] Allow deserialized off-heap memory entry

2021-06-09 Thread GitBox
AmplabJenkins commented on pull request #32800: URL: https://github.com/apache/spark/pull/32800#issuecomment-857494776 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44060/

[GitHub] [spark] AmplabJenkins commented on pull request #32835: [SPARK-35591][PYTHON][DOCS] Rename "Koalas" to "pandas API on Spark" in the documents

2021-06-09 Thread GitBox
AmplabJenkins commented on pull request #32835: URL: https://github.com/apache/spark/pull/32835#issuecomment-857494781 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139550/

[GitHub] [spark] cloud-fan commented on a change in pull request #32807: [SPARK-35669][SQL] Fix special char in CSV header with filter pushdown

2021-06-09 Thread GitBox
cloud-fan commented on a change in pull request #32807: URL: https://github.com/apache/spark/pull/32807#discussion_r648080461 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala ## @@ -35,8 +35,9 @@ sealed abstract class Filter { /** * L

[GitHub] [spark] cloud-fan closed pull request #32763: [SPARK-35058][SQL] Group exception messages in hive/client

2021-06-09 Thread GitBox
cloud-fan closed pull request #32763: URL: https://github.com/apache/spark/pull/32763

[GitHub] [spark] cloud-fan commented on pull request #32763: [SPARK-35058][SQL] Group exception messages in hive/client

2021-06-09 Thread GitBox
cloud-fan commented on pull request #32763: URL: https://github.com/apache/spark/pull/32763#issuecomment-857493645 thanks, merging to master!

[GitHub] [spark] gengliangwang commented on a change in pull request #32839: [SPARK-35679][SQL][WIP] instantToMicros overflow

2021-06-09 Thread GitBox
gengliangwang commented on a change in pull request #32839: URL: https://github.com/apache/spark/pull/32839#discussion_r648076780 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -385,8 +385,10 @@ object DateTimeUtils {

[GitHub] [spark] SparkQA commented on pull request #32830: [SPARK-32975][K8S][FOLLOWUP] Avoid None.get exception

2021-06-09 Thread GitBox
SparkQA commented on pull request #32830: URL: https://github.com/apache/spark/pull/32830#issuecomment-857489845 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44070/

[GitHub] [spark] cloud-fan commented on a change in pull request #32781: [SPARK-35650][SQL] Enhance `RepartitionByExpression` to make it coalesce partitions efficiently by AQE

2021-06-09 Thread GitBox
cloud-fan commented on a change in pull request #32781: URL: https://github.com/apache/spark/pull/32781#discussion_r648074672 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala ## @@ -92,6 +92,11 @@ case object REPARTITION e

[GitHub] [spark] cloud-fan commented on a change in pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.

2021-06-09 Thread GitBox
cloud-fan commented on a change in pull request #32786: URL: https://github.com/apache/spark/pull/32786#discussion_r648073935 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/CollectMetricsExec.scala ## @@ -33,8 +35,65 @@ case class CollectMetricsExec(

[GitHub] [spark] cloud-fan commented on a change in pull request #32777: [SPARK-35640][SQL] Refactor Parquet vectorized reader to remove duplicated code paths

2021-06-09 Thread GitBox
cloud-fan commented on a change in pull request #32777: URL: https://github.com/apache/spark/pull/32777#discussion_r648071740 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterFactory.java ## @@ -0,0 +1,980 @@ +/* + * L

[GitHub] [spark] cloud-fan commented on a change in pull request #32675: [SPARK-35531][SQL] Can not insert into hive bucket table if create table with upper case schema

2021-06-09 Thread GitBox
cloud-fan commented on a change in pull request #32675: URL: https://github.com/apache/spark/pull/32675#discussion_r648071228 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ## @@ -1092,14 +1092,23 @@ private[hive] object HiveClientI

[GitHub] [spark] hvanhovell commented on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.

2021-06-09 Thread GitBox
hvanhovell commented on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-857485838 @sarutak the `AggregatingAccumulator` is not used anywhere else. It is fine to make that merge method work on the executors as well. As for the SQLConf, we actually have a r

[GitHub] [spark] c21 commented on pull request #32836: [SPARK-35693][SS][TEST] Add plan check for stream-stream join unit test

2021-06-09 Thread GitBox
c21 commented on pull request #32836: URL: https://github.com/apache/spark/pull/32836#issuecomment-857484761 Rebased to latest master.

[GitHub] [spark] sarutak edited a comment on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.

2021-06-09 Thread GitBox
sarutak edited a comment on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-857479722 I considered the following comments from @cloud-fan and @hvanhovell . https://github.com/apache/spark/pull/32786#discussion_r647073606 https://github.com/apache/s

[GitHub] [spark] Ngone51 commented on a change in pull request #32766: [SPARK-35627][CORE] Decommission executors in batches to not overload network bandwidth

2021-06-09 Thread GitBox
Ngone51 commented on a change in pull request #32766: URL: https://github.com/apache/spark/pull/32766#discussion_r648068021 ## File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ## @@ -519,10 +558,7 @@ class CoarseGrainedSched

[GitHub] [spark] sarutak commented on a change in pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.

2021-06-09 Thread GitBox
sarutak commented on a change in pull request #32786: URL: https://github.com/apache/spark/pull/32786#discussion_r648063958 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/CollectMetricsExec.scala ## @@ -33,8 +35,65 @@ case class CollectMetricsExec( c

[GitHub] [spark] gengliangwang commented on a change in pull request #32838: [SPARK-35694][INFRA] Increase the default JVM stack size of SBT/Maven

2021-06-09 Thread GitBox
gengliangwang commented on a change in pull request #32838: URL: https://github.com/apache/spark/pull/32838#discussion_r648066926 ## File path: build/sbt ## @@ -53,7 +53,7 @@ realpath () { declare -r noshare_opts="-Dsbt.global.base=project/.sbtboot -Dsbt.boot.directory=projec

[GitHub] [spark] cloud-fan commented on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries

2021-06-09 Thread GitBox
cloud-fan commented on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-857481346 retest this please

[GitHub] [spark] SparkQA removed a comment on pull request #32835: [SPARK-35591][PYTHON][DOCS] Rename "Koalas" to "pandas API on Spark" in the documents

2021-06-09 Thread GitBox
SparkQA removed a comment on pull request #32835: URL: https://github.com/apache/spark/pull/32835#issuecomment-857456784 **[Test build #139550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139550/testReport)** for PR 32835 at commit [`4f4ace5`](https://gi

[GitHub] [spark] SparkQA commented on pull request #32835: [SPARK-35591][PYTHON][DOCS] Rename "Koalas" to "pandas API on Spark" in the documents

2021-06-09 Thread GitBox
SparkQA commented on pull request #32835: URL: https://github.com/apache/spark/pull/32835#issuecomment-857480547 **[Test build #139550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139550/testReport)** for PR 32835 at commit [`4f4ace5`](https://github.co

[GitHub] [spark] SparkQA commented on pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value

2021-06-09 Thread GitBox
SparkQA commented on pull request #32828: URL: https://github.com/apache/spark/pull/32828#issuecomment-857480598 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44065/

[GitHub] [spark] sarutak commented on pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.

2021-06-09 Thread GitBox
sarutak commented on pull request #32786: URL: https://github.com/apache/spark/pull/32786#issuecomment-857479722 I considered the following comments from @cloud-fan and @hvanhovell . https://github.com/apache/spark/pull/32786#discussion_r647073606 https://github.com/apache/spark/pu

[GitHub] [spark] gengliangwang commented on a change in pull request #32838: [SPARK-35694][INFRA] Increase the default JVM stack size of SBT/Maven

2021-06-09 Thread GitBox
gengliangwang commented on a change in pull request #32838: URL: https://github.com/apache/spark/pull/32838#discussion_r648062670 ## File path: build/sbt ## @@ -53,7 +53,7 @@ realpath () { declare -r noshare_opts="-Dsbt.global.base=project/.sbtboot -Dsbt.boot.directory=projec

[GitHub] [spark] wangyum commented on a change in pull request #32781: [SPARK-35650][SQL] Enhance `RepartitionByExpression` to make it coalesce partitions efficiently by AQE

2021-06-09 Thread GitBox
wangyum commented on a change in pull request #32781: URL: https://github.com/apache/spark/pull/32781#discussion_r648063132 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ## @@ -712,7 +712,9 @@ abstract class SparkStrategies extends

[GitHub] [spark] SparkQA commented on pull request #32800: [SPARK-35661][SQL] Allow deserialized off-heap memory entry

2021-06-09 Thread GitBox
SparkQA commented on pull request #32800: URL: https://github.com/apache/spark/pull/32800#issuecomment-857478591 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44060/

[GitHub] [spark] cloud-fan commented on a change in pull request #32786: [SPARK-35296][SQL] Allow Dataset.observe to work even if CollectMetricsExec in a task handles multiple partitions.

2021-06-09 Thread GitBox
cloud-fan commented on a change in pull request #32786: URL: https://github.com/apache/spark/pull/32786#discussion_r648062679 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/CollectMetricsExec.scala ## @@ -33,8 +35,65 @@ case class CollectMetricsExec(

[GitHub] [spark] yaooqinn edited a comment on pull request #32830: [SPARK-32975][K8S][FOLLOWUP] Avoid None.get exception

2021-06-09 Thread GitBox
yaooqinn edited a comment on pull request #32830: URL: https://github.com/apache/spark/pull/32830#issuecomment-857473012 > You are right. Then, shall we move the check from `val driverPod` to `start` method? `ExecutorPodsAllocator` creation and `start` invocation is different? moved

[GitHub] [spark] Ngone51 commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-06-09 Thread GitBox
Ngone51 commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-857475592 @tgravescs what's your opinion on the `StateStoreTaskLocation` proposal?

[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-06-09 Thread GitBox
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-857475508 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44066/

[GitHub] [spark] SparkQA removed a comment on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark

2021-06-09 Thread GitBox
SparkQA removed a comment on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-857342571 **[Test build #139527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139527/testReport)** for PR 32473 at commit [`d6e320c`](https://gi

[GitHub] [spark] SparkQA removed a comment on pull request #32804: [SPARK-26867][YARN] Spark Support of YARN Placement Constraint

2021-06-09 Thread GitBox
SparkQA removed a comment on pull request #32804: URL: https://github.com/apache/spark/pull/32804#issuecomment-857456767 **[Test build #139553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139553/testReport)** for PR 32804 at commit [`2cd9e69`](https://gi

[GitHub] [spark] Ngone51 commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-06-09 Thread GitBox
Ngone51 commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-857475125 > Isn't it the mapping still executor id <-> statestore? Executor id could be changed due to executor lost. More robust mapping, e.g. for our use-case, might be PVC id <-> state

[GitHub] [spark] SparkQA commented on pull request #32804: [SPARK-26867][YARN] Spark Support of YARN Placement Constraint

2021-06-09 Thread GitBox
SparkQA commented on pull request #32804: URL: https://github.com/apache/spark/pull/32804#issuecomment-857474980 **[Test build #139553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139553/testReport)** for PR 32804 at commit [`2cd9e69`](https://github.co

[GitHub] [spark] SparkQA commented on pull request #32738: [SPARK-35474] Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.

2021-06-09 Thread GitBox
SparkQA commented on pull request #32738: URL: https://github.com/apache/spark/pull/32738#issuecomment-857474049 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44063/

[GitHub] [spark] yaooqinn commented on pull request #32830: [SPARK-32975][K8S][FOLLOWUP] Avoid None.get exception

2021-06-09 Thread GitBox
yaooqinn commented on pull request #32830: URL: https://github.com/apache/spark/pull/32830#issuecomment-857473012 > You are right. Then, shall we move the check from `val driverPod` to `start` method? `ExecutorPodsAllocator` creation and `start` invocation is different? moved.

[GitHub] [spark] SparkQA commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark

2021-06-09 Thread GitBox
SparkQA commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-857472139 **[Test build #139527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139527/testReport)** for PR 32473 at commit [`d6e320c`](https://github.co

[GitHub] [spark] SparkQA removed a comment on pull request #32830: [SPARK-32975][K8S][FOLLOWUP] Avoid None.get exception

2021-06-09 Thread GitBox
SparkQA removed a comment on pull request #32830: URL: https://github.com/apache/spark/pull/32830#issuecomment-857466952 **[Test build #139557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139557/testReport)** for PR 32830 at commit [`18067df`](https://gi

[GitHub] [spark] gengliangwang commented on a change in pull request #32838: [SPARK-35694][INFRA] Increase the default JVM stack size of SBT

2021-06-09 Thread GitBox
gengliangwang commented on a change in pull request #32838: URL: https://github.com/apache/spark/pull/32838#discussion_r648055346 ## File path: build/sbt ## @@ -53,7 +53,7 @@ realpath () { declare -r noshare_opts="-Dsbt.global.base=project/.sbtboot -Dsbt.boot.directory=projec

[GitHub] [spark] SparkQA commented on pull request #32830: [SPARK-32975][K8S][FOLLOWUP] Avoid None.get exception

2021-06-09 Thread GitBox
SparkQA commented on pull request #32830: URL: https://github.com/apache/spark/pull/32830#issuecomment-857470890 **[Test build #139557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139557/testReport)** for PR 32830 at commit [`18067df`](https://github.co

[GitHub] [spark] HyukjinKwon commented on pull request #32806: [SPARK-35668][INFRA] Use "concurrency" syntax on Github Actions workflow

2021-06-09 Thread GitBox
HyukjinKwon commented on pull request #32806: URL: https://github.com/apache/spark/pull/32806#issuecomment-857471002 This seems to be causing all PR builds to be canceled for some reason: ![Screen Shot 2021-06-09 at 4 48 38 PM](https://user-images.githubusercontent.com/6477701/121314748-a4

[GitHub] [spark] dgd-contributor opened a new pull request #32839: [SPARK-35679][SQL] instantToMicros overflow

2021-06-09 Thread GitBox
dgd-contributor opened a new pull request #32839: URL: https://github.com/apache/spark/pull/32839 ### Why are the changes needed? When Long.MinValue is cast to an instant, `secs` is floored in the function microsToInstant, which causes an overflow when it is multiplied by MICROS_PER_SECOND. def mi
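The overflow described in this PR can be reproduced with a small Scala sketch. It restates the floor-then-multiply conversion described above, using a locally defined `MicrosPerSecond` constant (an assumption standing in for Spark's own constant), and adds a hypothetical overflow-avoiding inverse that borrows one second for negative instants. It is an illustration only, not the change actually made in #32839.

```scala
import java.time.Instant
import java.util.concurrent.TimeUnit.NANOSECONDS

object InstantOverflowSketch {
  // Assumed local constant; restated here rather than imported from Spark.
  val MicrosPerSecond: Long = 1000000L

  // micros -> Instant: floor the seconds so the sub-second part is non-negative.
  // The unchecked multiply may wrap, but the subtraction still yields the correct
  // remainder because two's-complement arithmetic is consistent modulo 2^64.
  def microsToInstant(micros: Long): Instant = {
    val secs = Math.floorDiv(micros, MicrosPerSecond)
    val mos = micros - secs * MicrosPerSecond
    Instant.ofEpochSecond(secs, mos * 1000L)
  }

  // Naive inverse: multiplyExact on the floored seconds overflows near
  // Long.MinValue even though the final sum would fit in a Long.
  def instantToMicrosNaive(instant: Instant): Long = {
    val us = Math.multiplyExact(instant.getEpochSecond, MicrosPerSecond)
    Math.addExact(us, NANOSECONDS.toMicros(instant.getNano.toLong))
  }

  // Hypothetical fix: for negative seconds with a positive sub-second part,
  // multiply (secs + 1) instead and add back (micros - MicrosPerSecond).
  // Both forms equal secs * MicrosPerSecond + micros mathematically.
  def instantToMicrosSafe(instant: Instant): Long = {
    val secs = instant.getEpochSecond
    val micros = NANOSECONDS.toMicros(instant.getNano.toLong)
    if (secs < 0 && micros > 0) {
      val us = Math.multiplyExact(secs + 1, MicrosPerSecond)
      Math.addExact(us, micros - MicrosPerSecond)
    } else {
      val us = Math.multiplyExact(secs, MicrosPerSecond)
      Math.addExact(us, micros)
    }
  }

  def main(args: Array[String]): Unit = {
    val instant = microsToInstant(Long.MinValue)
    // naive: a Failure wrapping an ArithmeticException
    println(s"naive: ${scala.util.Try(instantToMicrosNaive(instant))}")
    // prints: safe: -9223372036854775808
    println(s"safe: ${instantToMicrosSafe(instant)}")
  }
}
```

Borrowing a second keeps the intermediate product one second closer to zero, so it stays in range whenever the final result does.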

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32838: [SPARK-35694][INFRA] Increase the default JVM stack size of SBT

2021-06-09 Thread GitBox
HyukjinKwon commented on a change in pull request #32838: URL: https://github.com/apache/spark/pull/32838#discussion_r648051287 ## File path: build/sbt ## @@ -53,7 +53,7 @@ realpath () { declare -r noshare_opts="-Dsbt.global.base=project/.sbtboot -Dsbt.boot.directory=project/

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32838: [SPARK-35694][INFRA] Increase the default JVM stack size of SBT

2021-06-09 Thread GitBox
HyukjinKwon commented on a change in pull request #32838: URL: https://github.com/apache/spark/pull/32838#discussion_r648051719 ## File path: build/sbt ## @@ -53,7 +53,7 @@ realpath () { declare -r noshare_opts="-Dsbt.global.base=project/.sbtboot -Dsbt.boot.directory=project/

[GitHub] [spark] AmplabJenkins commented on pull request #32830: [SPARK-32975][K8S][FOLLOWUP] Avoid None.get exception

2021-06-09 Thread GitBox
AmplabJenkins commented on pull request #32830: URL: https://github.com/apache/spark/pull/32830#issuecomment-857467717 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139551/

[GitHub] [spark] SparkQA commented on pull request #32830: [SPARK-32975][K8S][FOLLOWUP] Avoid None.get exception

2021-06-09 Thread GitBox
SparkQA commented on pull request #32830: URL: https://github.com/apache/spark/pull/32830#issuecomment-857467418 **[Test build #139551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139551/testReport)** for PR 32830 at commit [`5a04e81`](https://github.co

[GitHub] [spark] SparkQA removed a comment on pull request #32830: [SPARK-32975][K8S][FOLLOWUP] Avoid None.get exception

2021-06-09 Thread GitBox
SparkQA removed a comment on pull request #32830: URL: https://github.com/apache/spark/pull/32830#issuecomment-857456683 **[Test build #139551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139551/testReport)** for PR 32830 at commit [`5a04e81`](https://gi

[GitHub] [spark] sunchao commented on a change in pull request #32764: [SPARK-35390][SQL] Handle type coercion when resolving V2 functions

2021-06-09 Thread GitBox
sunchao commented on a change in pull request #32764: URL: https://github.com/apache/spark/pull/32764#discussion_r648050362 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -2189,7 +2194,7 @@ class Analyzer(override val cata

[GitHub] [spark] SparkQA commented on pull request #32830: [SPARK-32975][K8S][FOLLOWUP] Avoid None.get exception

2021-06-09 Thread GitBox
SparkQA commented on pull request #32830: URL: https://github.com/apache/spark/pull/32830#issuecomment-857466952 **[Test build #139557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139557/testReport)** for PR 32830 at commit [`18067df`](https://github.com

[GitHub] [spark] SparkQA removed a comment on pull request #32837: [SPARK-35692][K8S] Use AtomicInteger for executor id generating

2021-06-09 Thread GitBox
SparkQA removed a comment on pull request #32837: URL: https://github.com/apache/spark/pull/32837#issuecomment-857456615 **[Test build #139548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139548/testReport)** for PR 32837 at commit [`e78d754`](https://gi

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32837: [SPARK-35692][K8S] Use AtomicInteger for executor id generating

2021-06-09 Thread GitBox
AmplabJenkins removed a comment on pull request #32837: URL: https://github.com/apache/spark/pull/32837#issuecomment-857464562 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139548/

[GitHub] [spark] AmplabJenkins commented on pull request #32837: [SPARK-35692][K8S] Use AtomicInteger for executor id generating

2021-06-09 Thread GitBox
AmplabJenkins commented on pull request #32837: URL: https://github.com/apache/spark/pull/32837#issuecomment-857464562 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139548/

[GitHub] [spark] SparkQA commented on pull request #32837: [SPARK-35692][K8S] Use AtomicInteger for executor id generating

2021-06-09 Thread GitBox
SparkQA commented on pull request #32837: URL: https://github.com/apache/spark/pull/32837#issuecomment-857464513 **[Test build #139548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139548/testReport)** for PR 32837 at commit [`e78d754`](https://github.co
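The title of #32837 names the technique, but the archive truncates the description. As a hypothetical sketch only (the object and method names are invented and this is not the Spark Kubernetes allocator code), an `AtomicInteger` counter hands out unique executor IDs even when several threads request them concurrently, because `incrementAndGet` is a single atomic read-modify-write:

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger

object ExecutorIdSketch {
  // Hypothetical shared counter; a plain var with += 1 could hand out duplicates under contention.
  private val nextExecutorId = new AtomicInteger(0)

  // incrementAndGet atomically bumps the counter and returns the new value.
  def newExecutorId(): String = nextExecutorId.incrementAndGet().toString

  def main(args: Array[String]): Unit = {
    val ids = new ConcurrentLinkedQueue[String]()
    // Eight threads each request 1000 IDs at the same time.
    val threads = (1 to 8).map { _ =>
      new Thread(() => (1 to 1000).foreach(_ => ids.add(newExecutorId())))
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    val distinct = new java.util.HashSet[String](ids).size()
    // prints: generated 8000, distinct 8000
    println(s"generated ${ids.size()}, distinct $distinct")
  }
}
```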

[GitHub] [spark] SparkQA commented on pull request #32838: [SPARK-35694][INFRA] Increase the default JVM stack size of SBT

2021-06-09 Thread GitBox
SparkQA commented on pull request #32838: URL: https://github.com/apache/spark/pull/32838#issuecomment-857464414 **[Test build #139556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139556/testReport)** for PR 32838 at commit [`f997c91`](https://github.com

[GitHub] [spark] viirya commented on pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value

2021-06-09 Thread GitBox
viirya commented on pull request #32828: URL: https://github.com/apache/spark/pull/32828#issuecomment-857464204 I'll add some tests later.

[GitHub] [spark] viirya commented on pull request #32767: [SPARK-35628][SS] RocksDBFileManager - load checkpoint from DFS

2021-06-09 Thread GitBox
viirya commented on pull request #32767: URL: https://github.com/apache/spark/pull/32767#issuecomment-857463805 Thank you @xuanyuanking. I'll find some time to review this.

[GitHub] [spark] gengliangwang opened a new pull request #32838: [SPARK-35694][INFRA] Increase the default JVM stack size of SBT

2021-06-09 Thread GitBox
gengliangwang opened a new pull request #32838: URL: https://github.com/apache/spark/pull/32838 ### What changes were proposed in this pull request? The Jenkins SBT build keeps failing with a stack overflow error: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13

[GitHub] [spark] c21 edited a comment on pull request #32834: [SPARK-35690][SS] Stream-stream join keys should be reordered properly

2021-06-09 Thread GitBox
c21 edited a comment on pull request #32834: URL: https://github.com/apache/spark/pull/32834#issuecomment-857461894 @HeartSaVioR - got it. State store compatibility is a valid concern; thanks for the explanation. Closing this for now per the above discussion.

[GitHub] [spark] c21 closed pull request #32834: [SPARK-35690][SS] Stream-stream join keys should be reordered properly

2021-06-09 Thread GitBox
c21 closed pull request #32834: URL: https://github.com/apache/spark/pull/32834

[GitHub] [spark] c21 commented on pull request #32834: [SPARK-35690][SS] Stream-stream join keys should be reordered properly

2021-06-09 Thread GitBox
c21 commented on pull request #32834: URL: https://github.com/apache/spark/pull/32834#issuecomment-857461894 @HeartSaVioR - got it. State store compatibility is a valid concern; thanks for the explanation.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries

2021-06-09 Thread GitBox
AmplabJenkins removed a comment on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-857460918 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139528/

[GitHub] [spark] HeartSaVioR edited a comment on pull request #32834: [SPARK-35690][SS] Stream-stream join keys should be reordered properly

2021-06-09 Thread GitBox
HeartSaVioR edited a comment on pull request #32834: URL: https://github.com/apache/spark/pull/32834#issuecomment-857460454 No, I'm referring to the output of reorderKey, which will be used for partitioning and for comparison with the key in the state store. Both partitioning and comparison should be the same
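To make the compatibility argument concrete, here is a toy Scala model. Every name in it is hypothetical and it is not Spark's state store API: the "state store" is a map keyed by the reordered join key, and the partition is a hash of that key. If the order chosen by reorderKey changes between runs (for example across Spark versions), a restarted query builds a different key for the same row, so lookups against previously written state miss and rows may be routed to a different partition:

```scala
object StateKeyOrderSketch {
  // Toy stand-in for a state store: ordered join-key values -> buffered rows.
  type StateStore = Map[Seq[Any], List[String]]

  // Hypothetical reorder step: extracts the key columns in the given order.
  def reorderKey(row: Map[String, Any], keyOrder: Seq[String]): Seq[Any] =
    keyOrder.map(row)

  // Toy hash partitioner: the partition depends on the ordered key's hash.
  def partitionOf(key: Seq[Any], numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    val row = Map[String, Any]("a" -> 1, "b" -> "x")
    // State written by an earlier run that ordered the key as (a, b).
    val oldKey = reorderKey(row, Seq("a", "b"))
    val store: StateStore = Map(oldKey -> List("left-row-1"))
    // A restarted run that now orders the key as (b, a) builds a different key.
    val newKey = reorderKey(row, Seq("b", "a"))
    println(s"lookup with old order: ${store.get(oldKey)}")
    println(s"lookup with new order: ${store.get(newKey)}")
    // The two partitions may differ, depending on the hash values.
    println(s"partitions: ${partitionOf(oldKey, 200)} vs ${partitionOf(newKey, 200)}")
  }
}
```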

[GitHub] [spark] AmplabJenkins commented on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries

2021-06-09 Thread GitBox
AmplabJenkins commented on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-857460918 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139528/

[GitHub] [spark] HeartSaVioR commented on pull request #32834: [SPARK-35690][SS] Stream-stream join keys should be reordered properly

2021-06-09 Thread GitBox
HeartSaVioR commented on pull request #32834: URL: https://github.com/apache/spark/pull/32834#issuecomment-857460454 My point is all about state store compatibility. If there's any chance for the output of reorder to change for various reasons, such as different Spark versions, table metad

[GitHub] [spark] SparkQA removed a comment on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries

2021-06-09 Thread GitBox
SparkQA removed a comment on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-857343892 **[Test build #139528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139528/testReport)** for PR 32303 at commit [`8f19ab7`](https://gi

[GitHub] [spark] SparkQA commented on pull request #32303: [SPARK-34382][SQL] Support LATERAL subqueries

2021-06-09 Thread GitBox
SparkQA commented on pull request #32303: URL: https://github.com/apache/spark/pull/32303#issuecomment-857459416 **[Test build #139528 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139528/testReport)** for PR 32303 at commit [`8f19ab7`](https://github.co

[GitHub] [spark] c21 commented on pull request #32836: [SPARK-35693][SS][TEST] Add plan check for stream-stream join unit test

2021-06-09 Thread GitBox
c21 commented on pull request #32836: URL: https://github.com/apache/spark/pull/32836#issuecomment-857457557 Closing and reopening the PR to trigger the unit tests on GitHub Actions again.

[GitHub] [spark] c21 closed pull request #32836: [SPARK-35693][SS][TEST] Add plan check for stream-stream join unit test

2021-06-09 Thread GitBox
c21 closed pull request #32836: URL: https://github.com/apache/spark/pull/32836

[GitHub] [spark] SparkQA commented on pull request #32767: [SPARK-35628][SS] RocksDBFileManager - load checkpoint from DFS

2021-06-09 Thread GitBox
SparkQA commented on pull request #32767: URL: https://github.com/apache/spark/pull/32767#issuecomment-857456948 **[Test build #139555 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139555/testReport)** for PR 32767 at commit [`10d11b3`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #32815: [SPARK-35675][SQL] EnsureRequirements remove shuffle should respect PartitioningCollection

2021-06-09 Thread GitBox
SparkQA commented on pull request #32815: URL: https://github.com/apache/spark/pull/32815#issuecomment-857456869 **[Test build #139552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139552/testReport)** for PR 32815 at commit [`7a906a5`](https://github.com
