[GitHub] [spark] c21 commented on pull request #32881: [SPARK-33298][CORE] Decouple file naming from FileCommitProtocol

2021-06-14 Thread GitBox
c21 commented on pull request #32881: URL: https://github.com/apache/spark/pull/32881#issuecomment-859364748 cc @cloud-fan could you help take a look when you have time? Will craft more unit tests if we have consensus on overall design, thanks. -- This is an automated message from the

[GitHub] [spark] AmplabJenkins commented on pull request #32875: [WIP][SPARK-35703] Remove HashClusteredDistribution and relax constraint for bucket join

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-859358106 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44209/

[GitHub] [spark] SparkQA removed a comment on pull request #31677: [SPARK-34565][SQL] Collapse Window nodes with Project between them

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #31677: URL: https://github.com/apache/spark/pull/31677#issuecomment-859238226 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For

[GitHub] [spark] AmplabJenkins commented on pull request #31980: [SPARK-34807][SQL] Transpose Window nodes with Project between them

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #31980: URL: https://github.com/apache/spark/pull/31980#issuecomment-859600283 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44215/

[GitHub] [spark] mridulm commented on pull request #32730: [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs

2021-06-14 Thread GitBox
mridulm commented on pull request #32730: URL: https://github.com/apache/spark/pull/32730#issuecomment-859254260 Late LGTM, apologies for the delay @dongjoon-hyun Thanks for reviewing @viirya !

[GitHub] [spark] SparkQA removed a comment on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-859321357 **[Test build #139689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139689/testReport)** for PR 32473 at commit [`82e1e8e`](https://gi

[GitHub] [spark] SparkQA removed a comment on pull request #32822: [SPARK-35678][ML] add a common softmax function

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32822: URL: https://github.com/apache/spark/pull/32822#issuecomment-859199867 **[Test build #139666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139666/testReport)** for PR 32822 at commit [`ff59ef3`](https://gi

[GitHub] [spark] viirya commented on a change in pull request #32850: [SPARK-34920][CORE][SQL] Add error classes with SQLSTATE

2021-06-14 Thread GitBox
viirya commented on a change in pull request #32850: URL: https://github.com/apache/spark/pull/32850#discussion_r650282569 ## File path: core/src/main/resources/error/README.md ## @@ -0,0 +1,79 @@ +# Guidelines + +To throw a standardized exception, developers should use an erro

[GitHub] [spark] AmplabJenkins commented on pull request #32849: [SPARK-35704][SQL] Add fields to `DayTimeIntervalType`

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32849: URL: https://github.com/apache/spark/pull/32849#issuecomment-859925040 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44231/

[GitHub] [spark] SparkQA commented on pull request #31980: [SPARK-34807][SQL] Transpose Window nodes with Project between them

2021-06-14 Thread GitBox
SparkQA commented on pull request #31980: URL: https://github.com/apache/spark/pull/31980#issuecomment-859321840

[GitHub] [spark] AmplabJenkins commented on pull request #32885: [SPARK-35742][SQL] Expression.semanticEquals should be symmetrical

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32885: URL: https://github.com/apache/spark/pull/32885#issuecomment-859927179

[GitHub] [spark] tgravescs commented on a change in pull request #32810: [SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line.

2021-06-14 Thread GitBox
tgravescs commented on a change in pull request #32810: URL: https://github.com/apache/spark/pull/32810#discussion_r650029785 ## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ## @@ -1464,6 +1464,14 @@ private object Client extends L

[GitHub] [spark] SparkQA removed a comment on pull request #32470: [SPARK-35712][SQL] Simplify ResolveAggregateFunctions

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32470: URL: https://github.com/apache/spark/pull/32470#issuecomment-859311626

[GitHub] [spark] ekoifman commented on a change in pull request #32872: SPARK-35639 make hasCoalescedPartition return true if something was a…

2021-06-14 Thread GitBox
ekoifman commented on a change in pull request #32872: URL: https://github.com/apache/spark/pull/32872#discussion_r649665188 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -1567,7 +1567,7 @@ class AdaptiveQueryE

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32870: [SPARK-35439][SQL][FOLLOWUP] ExpressionContainmentOrdering should not sort unrelated expressions

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32870: URL: https://github.com/apache/spark/pull/32870#issuecomment-859195282

[GitHub] [spark] SparkQA commented on pull request #32858: [SPARK-35706][SQL] Consider making the ':' in STRUCT data type definition optional

2021-06-14 Thread GitBox
SparkQA commented on pull request #32858: URL: https://github.com/apache/spark/pull/32858#issuecomment-859524331

[GitHub] [spark] AmplabJenkins commented on pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32828: URL: https://github.com/apache/spark/pull/32828#issuecomment-859333938

[GitHub] [spark] AmplabJenkins commented on pull request #32877: wait until something does get queued.

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32877: URL: https://github.com/apache/spark/pull/32877#issuecomment-859330438 Can one of the admins verify this patch?

[GitHub] [spark] SparkQA commented on pull request #32862: [SPARK-35695][SQL] Collect observed metrics from cached and adaptive execution sub-trees

2021-06-14 Thread GitBox
SparkQA commented on pull request #32862: URL: https://github.com/apache/spark/pull/32862#issuecomment-859296937

[GitHub] [spark] MaxGekk closed pull request #32849: [SPARK-35704][SQL] Add fields to `DayTimeIntervalType`

2021-06-14 Thread GitBox
MaxGekk closed pull request #32849: URL: https://github.com/apache/spark/pull/32849

[GitHub] [spark] AmplabJenkins commented on pull request #31677: [SPARK-34565][SQL] Collapse Window nodes with Project between them

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #31677: URL: https://github.com/apache/spark/pull/31677#issuecomment-859239766

[GitHub] [spark] pingsutw commented on pull request #32845: [SPARK-35691][CORE] addFile/addJar/addDirectory should put CanonicalFile

2021-06-14 Thread GitBox
pingsutw commented on pull request #32845: URL: https://github.com/apache/spark/pull/32845#issuecomment-859469433 @sarutak Thanks for your review. I've updated the tests. The CI failure seems unrelated to this PR.

[GitHub] [spark] SparkQA removed a comment on pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32828: URL: https://github.com/apache/spark/pull/32828#issuecomment-859324551

[GitHub] [spark] otterc commented on a change in pull request #32140: [WIP][SPARK-32922][SHUFFLE][CORE] Adds support for executors to fetch local and remote merged shuffle data

2021-06-14 Thread GitBox
otterc commented on a change in pull request #32140: URL: https://github.com/apache/spark/pull/32140#discussion_r648845832 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -1124,4 +1392,298 @@ object ShuffleBlockFetcherIterator

[GitHub] [spark] SparkQA commented on pull request #32885: [SPARK-35742][SQL] Expression.semanticEquals should be symmetrical

2021-06-14 Thread GitBox
SparkQA commented on pull request #32885: URL: https://github.com/apache/spark/pull/32885#issuecomment-859924503

[GitHub] [spark] SparkQA commented on pull request #32879: [SPARK-35694][INFRA][FollowUp] Increase the default JVM stack size of SBT/Maven

2021-06-14 Thread GitBox
SparkQA commented on pull request #32879: URL: https://github.com/apache/spark/pull/32879#issuecomment-859521680

[GitHub] [spark] viirya commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-06-14 Thread GitBox
viirya commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-859926068 @mridulm @tgravescs Yeah, I will update the doc. Thanks for the discussion!

[GitHub] [spark] SparkQA commented on pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value

2021-06-14 Thread GitBox
SparkQA commented on pull request #32828: URL: https://github.com/apache/spark/pull/32828#issuecomment-859324551

[GitHub] [spark] AmplabJenkins commented on pull request #32887: [SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32887: URL: https://github.com/apache/spark/pull/32887#issuecomment-859961665 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139710/

[GitHub] [spark] cloud-fan commented on pull request #32885: [SPARK-35742][SQL] Expression.semanticEquals should be symmetrical

2021-06-14 Thread GitBox
cloud-fan commented on pull request #32885: URL: https://github.com/apache/spark/pull/32885#issuecomment-859719503 cc @viirya @maropu

[GitHub] [spark] SparkQA removed a comment on pull request #32879: [SPARK-35694][INFRA][FollowUp] Increase the default JVM stack size of SBT/Maven

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32879: URL: https://github.com/apache/spark/pull/32879#issuecomment-859521680 **[Test build #139695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139695/testReport)** for PR 32879 at commit [`42e8ad7`](https://gi

[GitHub] [spark] ueshin commented on pull request #32886: [SPARK-35478][PYTHON] Enable disallow_untyped_defs mypy check for pyspark.pandas.window.

2021-06-14 Thread GitBox
ueshin commented on pull request #32886: URL: https://github.com/apache/spark/pull/32886#issuecomment-859921852

[GitHub] [spark] Ngone51 commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-06-14 Thread GitBox
Ngone51 commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-859233951 sgtm.

[GitHub] [spark] LuciferYang edited a comment on pull request #32838: [SPARK-35694][INFRA] Increase the default JVM stack size of SBT/Maven

2021-06-14 Thread GitBox
LuciferYang edited a comment on pull request #32838: URL: https://github.com/apache/spark/pull/32838#issuecomment-859329225 @gengliangwang It seems that there will still be a `StackOverflowError` [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139688/console](https:/

[GitHub] [spark] sarutak commented on pull request #32892: [SPARK-35737][SQL] Parse day-time interval literals to tightest types

2021-06-14 Thread GitBox
sarutak commented on pull request #32892: URL: https://github.com/apache/spark/pull/32892#issuecomment-860265038 > Please open a sub-task in JIRA so that we don't forget about it if it is hard to do in this PR. Yes, I've noticed that we should support single field but single field is curre

[GitHub] [spark] dgd-contributor commented on pull request #32863: [SPARK-35652][SQL] joinWith on two table generated from same one

2021-06-14 Thread GitBox
dgd-contributor commented on pull request #32863: URL: https://github.com/apache/spark/pull/32863#issuecomment-860315143 > @dgd-contributor can you open a backport pr for 3.0? thanks! ok, OTW

[GitHub] [spark] MaxGekk commented on pull request #32892: [SPARK-35737][SQL] Parse day-time interval literals to tightest types

2021-06-14 Thread GitBox
MaxGekk commented on pull request #32892: URL: https://github.com/apache/spark/pull/32892#issuecomment-860261600

[GitHub] [spark] HyukjinKwon commented on pull request #31677: [SPARK-34565][SQL] Collapse Window nodes with Project between them

2021-06-14 Thread GitBox
HyukjinKwon commented on pull request #31677: URL: https://github.com/apache/spark/pull/31677#issuecomment-860132059 retest this please

[GitHub] [spark] SparkQA removed a comment on pull request #32883: [SPARK-35725][SQL] Support repartition expand partitions in AQE

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32883: URL: https://github.com/apache/spark/pull/32883#issuecomment-859992110 **[Test build #139726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139726/testReport)** for PR 32883 at commit [`1eef9fb`](https://gi

[GitHub] [spark] MaxGekk commented on pull request #32893: [SPARK-35736][SQL] Parse any day-time interval types in SQL

2021-06-14 Thread GitBox
MaxGekk commented on pull request #32893: URL: https://github.com/apache/spark/pull/32893#issuecomment-860270913 +1, LGTM. GA passed. Merging to master. https://user-images.githubusercontent.com/1580697/121821994-3ffac000-cca5-11eb-9ecb-d94d9a88258c.png Thank you, @sarutak.

[GitHub] [spark] dongjoon-hyun commented on pull request #32826: [SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-1

2021-06-14 Thread GitBox
dongjoon-hyun commented on pull request #32826: URL: https://github.com/apache/spark/pull/32826#issuecomment-860062911 @dchristle. I'm a big supporter of ZStandard and have no doubt that we need to upgrade ZSTD-JNI in the future. Your PR will definitely be a part of Apache Spark. - ht

[GitHub] [spark] MaxGekk closed pull request #32891: [SPARK-35734][SQL] Format day-time intervals using type fields

2021-06-14 Thread GitBox
MaxGekk closed pull request #32891: URL: https://github.com/apache/spark/pull/32891

[GitHub] [spark] AmplabJenkins commented on pull request #32897: [SPARK-35415][SQL] Change information to map type for SHOW TABLE EXTENDED command

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32897: URL: https://github.com/apache/spark/pull/32897#issuecomment-860216425

[GitHub] [spark] SparkQA removed a comment on pull request #32891: [SPARK-35734][SQL] Format day-time intervals using type fields

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32891: URL: https://github.com/apache/spark/pull/32891#issuecomment-860042714

[GitHub] [spark] srowen closed pull request #32868: [SPARK-35714][CORE] Bug fix for deadlock during the executor shutdown

2021-06-14 Thread GitBox
srowen closed pull request #32868: URL: https://github.com/apache/spark/pull/32868

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32889: [MINOR][K8S] Print the driver pod name instead of Some(name) if absent

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32889: URL: https://github.com/apache/spark/pull/32889#issuecomment-859992657

[GitHub] [spark] dgd-contributor opened a new pull request #32899: [SPARK-35652][SQL] joinWith on two table generated from same one

2021-06-14 Thread GitBox
dgd-contributor opened a new pull request #32899: URL: https://github.com/apache/spark/pull/32899 It seems that a Spark inner join performs a Cartesian join when self-joining with `joinWith`. To reproduce this issue: ``` val df = spark.range(0,3) df.joinWith(df, df("id") ===

[GitHub] [spark] github-actions[bot] closed pull request #31207: [SPARK-34136][PYTHON][SQL] Add support for complex literals in PySpark

2021-06-14 Thread GitBox
github-actions[bot] closed pull request #31207: URL: https://github.com/apache/spark/pull/31207

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32884: [SPARK-35738][PYTHON] Support 'y' properly in DataFrame with non-numeric columns with plots

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32884: URL: https://github.com/apache/spark/pull/32884#issuecomment-859979386 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44235/

[GitHub] [spark] sarutak commented on a change in pull request #32892: [SPARK-35737][SQL] Parse day-time interval literals to tightest types

2021-06-14 Thread GitBox
sarutak commented on a change in pull request #32892: URL: https://github.com/apache/spark/pull/32892#discussion_r650572710 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2357,9 +2357,14 @@ class AstBuilder extends SqlBas

[GitHub] [spark] github-actions[bot] closed pull request #31684: [SPARK-34571][SQL][CORE] Provide a more convenient way to deprecate/remove/alternate configs

2021-06-14 Thread GitBox
github-actions[bot] closed pull request #31684: URL: https://github.com/apache/spark/pull/31684

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32468: [SPARK-35335][SQL] Coalesce shuffle partition as much as possible for REPARTITION_BY_NONE

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32468: URL: https://github.com/apache/spark/pull/32468#issuecomment-859992018

[GitHub] [spark] HyukjinKwon commented on pull request #32892: [SPARK-35737][SQL] Parse day-time interval literals to tightest types

2021-06-14 Thread GitBox
HyukjinKwon commented on pull request #32892: URL: https://github.com/apache/spark/pull/32892#issuecomment-860131758 cc @MaxGekk @yaooqinn FYI

[GitHub] [spark] viirya closed pull request #32890: [SPARK-35689][SS][3.0] Add log warn when keyWithIndexToValue returns null value

2021-06-14 Thread GitBox
viirya closed pull request #32890: URL: https://github.com/apache/spark/pull/32890

[GitHub] [spark] SparkQA removed a comment on pull request #32468: [SPARK-35335][SQL] Coalesce shuffle partition as much as possible for REPARTITION_BY_NONE

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32468: URL: https://github.com/apache/spark/pull/32468#issuecomment-859980120 **[Test build #139722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139722/testReport)** for PR 32468 at commit [`0dd5f8a`](https://gi

[GitHub] [spark] HyukjinKwon commented on pull request #32893: [SPARK-35736][SQL] Parse any day-time interval types in SQL

2021-06-14 Thread GitBox
HyukjinKwon commented on pull request #32893: URL: https://github.com/apache/spark/pull/32893#issuecomment-860131750 cc @MaxGekk @yaooqinn FYI

[GitHub] [spark] HeartSaVioR closed pull request #32828: [SPARK-35689][SS] Add log warn when keyWithIndexToValue returns null value

2021-06-14 Thread GitBox
HeartSaVioR closed pull request #32828: URL: https://github.com/apache/spark/pull/32828

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32810: [SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line.

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32810: URL: https://github.com/apache/spark/pull/32810#issuecomment-859973860

[GitHub] [spark] vinodkc opened a new pull request #32894: [SPARK-35747][core] Avoid printing full Exception stack trace, if HBase service is not running in a secure cluster

2021-06-14 Thread GitBox
vinodkc opened a new pull request #32894: URL: https://github.com/apache/spark/pull/32894 ### What changes were proposed in this pull request? In a secure YARN cluster where the HBase service is down, even if the Spark application is not using HBase, during the application submit

[GitHub] [spark] Kimahriman commented on pull request #32448: [SPARK-35290][SQL] Use StructType merging for unionByName with null filling

2021-06-14 Thread GitBox
Kimahriman commented on pull request #32448: URL: https://github.com/apache/spark/pull/32448#issuecomment-860328810 So @viirya's comment made me realize that `StructType.merge` isn't quite the right solution since it immediately fails on exact type mismatch and can't handle similar types l

[GitHub] [spark] srowen commented on pull request #32813: [SPARK-34591][MLLIB][WIP] Disable decision tree pruning

2021-06-14 Thread GitBox
srowen commented on pull request #32813: URL: https://github.com/apache/spark/pull/32813#issuecomment-860237524 If you'd like to update the PR per the first part of https://github.com/apache/spark/pull/32813#issuecomment-857201726 I'd merge it. If you're busy I could reproduce that change

[GitHub] [spark] HyukjinKwon commented on pull request #32896: [SPARK-35748][SS][SQL] Fix StreamingJoinHelper to be able to handle day-time interval

2021-06-14 Thread GitBox
HyukjinKwon commented on pull request #32896: URL: https://github.com/apache/spark/pull/32896#issuecomment-860305067

[GitHub] [spark] SparkQA commented on pull request #32891: [SPARK-35734][SQL] Format day-time intervals using type fields

2021-06-14 Thread GitBox
SparkQA commented on pull request #32891: URL: https://github.com/apache/spark/pull/32891#issuecomment-860042714

[GitHub] [spark] wangyum commented on a change in pull request #28032: [SPARK-31264][SQL] Repartition by dynamic partition columns before insert partition table

2021-06-14 Thread GitBox
wangyum commented on a change in pull request #28032: URL: https://github.com/apache/spark/pull/28032#discussion_r650358297 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ## @@ -221,6 +221,46 @@ object DataSourceAnalys

[GitHub] [spark] viirya commented on a change in pull request #32895: [SPARK-35658][DOCS] Document Parquet encryption feature in Spark SQL

2021-06-14 Thread GitBox
viirya commented on a change in pull request #32895: URL: https://github.com/apache/spark/pull/32895#discussion_r650580240 ## File path: docs/sql-data-sources-parquet.md ## @@ -252,6 +252,51 @@ REFRESH TABLE my_table; +## Columnar Encryption + + +Since Spark 3.2, columnar

[GitHub] [spark] shahidki31 commented on a change in pull request #32659: [SPARK-22639][SQL] Support aggregate cbo stats estimation if the group by clause involves substring

2021-06-14 Thread GitBox
shahidki31 commented on a change in pull request #32659: URL: https://github.com/apache/spark/pull/32659#discussion_r650328411 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala ## @@ -80,6 +80,54 @@ obje

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32890: [SPARK-35689][SS][3.0] Add log warn when keyWithIndexToValue returns null value

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32890: URL: https://github.com/apache/spark/pull/32890#issuecomment-859998314

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32309: [SPARK-35203][SQL] Improve Repartition statistics estimation

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32309: URL: https://github.com/apache/spark/pull/32309#issuecomment-860078818

[GitHub] [spark] MaxGekk commented on a change in pull request #32896: [SPARK-35748][SS][SQL] Fix StreamingJoinHelper to be able to handle day-time interval

2021-06-14 Thread GitBox
MaxGekk commented on a change in pull request #32896: URL: https://github.com/apache/spark/pull/32896#discussion_r650696194 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/StreamingJoinHelper.scala ## @@ -266,6 +266,9 @@ object StreamingJoinHel

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #32826: [SPARK-35670][BUILD] Upgrade ZSTD-JNI to 1.5.0-1

2021-06-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #32826: URL: https://github.com/apache/spark/pull/32826#issuecomment-860062911

[GitHub] [spark] AmplabJenkins commented on pull request #32898: [SPARK-35720][SQL] support casting of String to timestamp without time zone type

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32898: URL: https://github.com/apache/spark/pull/32898#issuecomment-860255916 Can one of the admins verify this patch?

[GitHub] [spark] sarutak commented on a change in pull request #32893: [SPARK-35736][SQL] Parse any day-time interval types in SQL

2021-06-14 Thread GitBox
sarutak commented on a change in pull request #32893: URL: https://github.com/apache/spark/pull/32893#discussion_r650562825 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2514,8 +2514,19 @@ class AstBuilder extends SqlBas

[GitHub] [spark] AmplabJenkins commented on pull request #32889: [MINOR][K8S] Print the driver pod name instead of Some(name) if absent

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32889: URL: https://github.com/apache/spark/pull/32889#issuecomment-859992657 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For q

[GitHub] [spark] SparkQA commented on pull request #32892: [SPARK-35737][SQL] Parse day-time interval literals to tightest types

2021-06-14 Thread GitBox
SparkQA commented on pull request #32892: URL: https://github.com/apache/spark/pull/32892#issuecomment-860087912 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries

[GitHub] [spark] viirya commented on pull request #32890: [SPARK-35689][SS][3.0] Add log warn when keyWithIndexToValue returns null value

2021-06-14 Thread GitBox
viirya commented on pull request #32890: URL: https://github.com/apache/spark/pull/32890#issuecomment-860007751 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries

[GitHub] [spark] dongjoon-hyun commented on pull request #32889: [MINOR][K8S] Print the driver pod name instead of Some(name) if absent

2021-06-14 Thread GitBox
dongjoon-hyun commented on pull request #32889: URL: https://github.com/apache/spark/pull/32889#issuecomment-860244631 I resolved the conflicts and backported this to branch-3.1 too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

[GitHub] [spark] SparkQA commented on pull request #32468: [SPARK-35335][SQL] Coalesce shuffle partition as much as possible for REPARTITION_BY_NONE

2021-06-14 Thread GitBox
SparkQA commented on pull request #32468: URL: https://github.com/apache/spark/pull/32468#issuecomment-859980120 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries

[GitHub] [spark] gengliangwang commented on a change in pull request #32898: [SPARK-35720][SQL] Support casting of String to timestamp without time zone type

2021-06-14 Thread GitBox
gengliangwang commented on a change in pull request #32898: URL: https://github.com/apache/spark/pull/32898#discussion_r650691762 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala ## @@ -1295,8 +1295,14 @@ abstract class AnsiCa

[GitHub] [spark] AmplabJenkins commented on pull request #32891: [SPARK-35734][SQL] Format day-time intervals using type fields

2021-06-14 Thread GitBox
AmplabJenkins commented on pull request #32891: URL: https://github.com/apache/spark/pull/32891#issuecomment-860048932 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For q

[GitHub] [spark] SparkQA removed a comment on pull request #32868: [SPARK-35714][CORE] Bug fix for deadlock during the executor shutdown

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32868: URL: https://github.com/apache/spark/pull/32868#issuecomment-859924495 **[Test build #139712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139712/testReport)** for PR 32868 at commit [`3b38ae8`](https://gi

[GitHub] [spark] CBribiescas edited a comment on pull request #32813: [SPARK-34591][MLLIB][WIP] Disable decision tree pruning

2021-06-14 Thread GitBox
CBribiescas edited a comment on pull request #32813: URL: https://github.com/apache/spark/pull/32813#issuecomment-860372085 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] sumeetgajjar commented on pull request #32576: [SPARK-35429][CORE] Remove commons-httpclient due to EOL and CVEs

2021-06-14 Thread GitBox
sumeetgajjar commented on pull request #32576: URL: https://github.com/apache/spark/pull/32576#issuecomment-860254406 Thank you @sunchao for the update, Yes I can pick up this once again. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] SparkQA commented on pull request #32448: [SPARK-35290][SQL] Use StructType merging for unionByName with null filling

2021-06-14 Thread GitBox
SparkQA commented on pull request #32448: URL: https://github.com/apache/spark/pull/32448#issuecomment-860056419 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries

[GitHub] [spark] MaxGekk commented on pull request #32891: [SPARK-35734][SQL] Format day-time intervals using type fields

2021-06-14 Thread GitBox
MaxGekk commented on pull request #32891: URL: https://github.com/apache/spark/pull/32891#issuecomment-860093227 +1, LGTM. Merging to master. Thank you, @sarutak . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[GitHub] [spark] HyukjinKwon closed pull request #32884: [SPARK-35738][PYTHON] Support 'y' properly in DataFrame with non-numeric columns with plots

2021-06-14 Thread GitBox
HyukjinKwon closed pull request #32884: URL: https://github.com/apache/spark/pull/32884 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, pl

[GitHub] [spark] sunchao commented on a change in pull request #32895: [SPARK-35658][DOCS] Document Parquet encryption feature in Spark SQL

2021-06-14 Thread GitBox
sunchao commented on a change in pull request #32895: URL: https://github.com/apache/spark/pull/32895#discussion_r650587006 ## File path: docs/sql-data-sources-parquet.md ## @@ -432,4 +477,32 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession

[GitHub] [spark] viirya commented on pull request #32865: [SPARK-35701][SQL] Use copy-on-write semantics for SQLConf registered configurations.

2021-06-14 Thread GitBox
viirya commented on pull request #32865: URL: https://github.com/apache/spark/pull/32865#issuecomment-860079395 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries

[GitHub] [spark] SparkQA removed a comment on pull request #32810: [SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line.

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32810: URL: https://github.com/apache/spark/pull/32810#issuecomment-859945339 **[Test build #139719 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139719/testReport)** for PR 32810 at commit [`f780111`](https://gi

[GitHub] [spark] viirya closed pull request #32865: [SPARK-35701][SQL] Use copy-on-write semantics for SQLConf registered configurations.

2021-06-14 Thread GitBox
viirya closed pull request #32865: URL: https://github.com/apache/spark/pull/32865 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please

[GitHub] [spark] Kimahriman edited a comment on pull request #32448: [SPARK-35290][SQL] Use StructType merging for unionByName with null filling

2021-06-14 Thread GitBox
Kimahriman edited a comment on pull request #32448: URL: https://github.com/apache/spark/pull/32448#issuecomment-860328810 So @viirya's comment made me realize that `StructType.merge` isn't quite the right solution since it immediately fails on exact type mismatch and can't handle similar

[GitHub] [spark] Kimahriman commented on a change in pull request #32448: [SPARK-35290][SQL] Use StructType merging for unionByName with null filling

2021-06-14 Thread GitBox
Kimahriman commented on a change in pull request #32448: URL: https://github.com/apache/spark/pull/32448#discussion_r650522493 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/types/StructTypeSuite.scala ## @@ -150,95 +150,36 @@ class StructTypeSuite extends Spar

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32896: [SPARK-35748][SS][SQL] Fix StreamingJoinHelper to be able to handle day-time interval

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32896: URL: https://github.com/apache/spark/pull/32896#issuecomment-860191096 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [spark] dgd-contributor commented on pull request #32899: [SPARK-35652][SQL] joinWith on two table generated from same one

2021-06-14 Thread GitBox
dgd-contributor commented on pull request #32899: URL: https://github.com/apache/spark/pull/32899#issuecomment-860337987 @cloud-fan this is backport pr for https://github.com/apache/spark/pull/32863. Can you review it -- This is an automated message from the Apache Git Service. To respon

[GitHub] [spark] SparkQA removed a comment on pull request #32894: [SPARK-35747][core] Avoid printing full Exception stack trace, if HBase service is not running in a secure cluster

2021-06-14 Thread GitBox
SparkQA removed a comment on pull request #32894: URL: https://github.com/apache/spark/pull/32894#issuecomment-860146674 **[Test build #139747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139747/testReport)** for PR 32894 at commit [`2e00a54`](https://gi

[GitHub] [spark] MaxGekk closed pull request #32878: [SPARK-35719][SQL] Support type conversion between timestamp and timestamp without time zone type

2021-06-14 Thread GitBox
MaxGekk closed pull request #32878: URL: https://github.com/apache/spark/pull/32878 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please

[GitHub] [spark] EnricoMi commented on a change in pull request #31905: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-06-14 Thread GitBox
EnricoMi commented on a change in pull request #31905: URL: https://github.com/apache/spark/pull/31905#discussion_r650684696 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Observation.scala ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28032: [SPARK-31264][SQL] Repartition before writing data source tables/directories

2021-06-14 Thread GitBox
HyukjinKwon commented on a change in pull request #28032: URL: https://github.com/apache/spark/pull/28032#discussion_r650449381 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ## @@ -221,6 +221,46 @@ object DataSourceAn

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32883: [SPARK-35725][SQL] Support repartition expand partitions in AQE

2021-06-14 Thread GitBox
AmplabJenkins removed a comment on pull request #32883: URL: https://github.com/apache/spark/pull/32883#issuecomment-859997756 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [spark] pingsutw commented on pull request #32886: [SPARK-35478][PYTHON] Enable disallow_untyped_defs mypy check for pyspark.pandas.window.

2021-06-14 Thread GitBox
pingsutw commented on pull request #32886: URL: https://github.com/apache/spark/pull/32886#issuecomment-860205878 @ueshin Thanks for the review. I've updated the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] dongjoon-hyun commented on pull request #32868: [SPARK-35714][CORE] Bug fix for deadlock during the executor shutdown

2021-06-14 Thread GitBox
dongjoon-hyun commented on pull request #32868: URL: https://github.com/apache/spark/pull/32868#issuecomment-86024 I'm +1 for backporting, @srowen . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to
