[GitHub] [spark] steveloughran commented on pull request #33276: [SPARK-36067][BUILD][TEST][YARN] YarnClusterSuite fails due to NoClassDefFoundError unless hadoop-3.2 profile is activated explicitly

2021-07-12 Thread GitBox
steveloughran commented on pull request #33276: URL: https://github.com/apache/spark/pull/33276#issuecomment-878342690 "(bcprov-jdk15on-1.60.jar and bcprov-jdk15-140.jar). They seem to contain overlapping classes so I'm not sure if it's a good thing to do." bouncy castle is

[GitHub] [spark] cloud-fan commented on a change in pull request #33212: [SPARK-35912][SQL] Fix nullability of `spark.read.json`

2021-07-12 Thread GitBox
cloud-fan commented on a change in pull request #33212: URL: https://github.com/apache/spark/pull/33212#discussion_r668068075 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ## @@ -418,6 +426,19 @@ class JacksonParser( }

[GitHub] [spark] AmplabJenkins commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878127962 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] ulysses-you closed pull request #33294: [SPARK-36085][SQL] Make broadcast query stage executionContext isolation from AQE

2021-07-12 Thread GitBox
ulysses-you closed pull request #33294: URL: https://github.com/apache/spark/pull/33294

[GitHub] [spark] viirya opened a new pull request #33305: [SPARK-35829][SQL][FOLLOWUP] Use subExprCode to avoid duplicate call of addNewFunction

2021-07-12 Thread GitBox
viirya opened a new pull request #33305: URL: https://github.com/apache/spark/pull/33305 ### What changes were proposed in this pull request? A followup of #32980. We should use `subExprCode` to avoid duplicate call of `addNewFunction`. ### Why are the changes

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33295: [SPARK-33679][SQL][DOCS][FOLLOWUP] Enable spark.sql.adaptive.enabled by default

2021-07-12 Thread GitBox
HyukjinKwon commented on a change in pull request #33295: URL: https://github.com/apache/spark/pull/33295#discussion_r667787214 ## File path: docs/sql-performance-tuning.md ## @@ -234,7 +234,7 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is

[GitHub] [spark] srowen commented on pull request #32455: [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4

2021-07-12 Thread GitBox
srowen commented on pull request #32455: URL: https://github.com/apache/spark/pull/32455#issuecomment-878212471

[GitHub] [spark] cloud-fan commented on a change in pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox
cloud-fan commented on a change in pull request #33258: URL: https://github.com/apache/spark/pull/33258#discussion_r668168549 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -200,6 +200,44 @@ case class Now()

[GitHub] [spark] dongjoon-hyun commented on pull request #32455: [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4

2021-07-12 Thread GitBox
dongjoon-hyun commented on pull request #32455: URL: https://github.com/apache/spark/pull/32455#issuecomment-878328858 +1 for reverting, too.

[GitHub] [spark] gengliangwang commented on pull request #32455: [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4

2021-07-12 Thread GitBox
gengliangwang commented on pull request #32455: URL: https://github.com/apache/spark/pull/32455#issuecomment-878330381 +1 for reverting. Thank you @maropu @cloud-fan

[GitHub] [spark] MaxGekk commented on pull request #33299: [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz

2021-07-12 Thread GitBox
MaxGekk commented on pull request #33299: URL: https://github.com/apache/spark/pull/33299#issuecomment-878318804 @gengliangwang Could you look at: ``` [info] - Check schemas for expression examples *** FAILED *** (609 milliseconds) [info] 364 did not equal 366 Expected 364

[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox
SparkQA commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878119735

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox
HyukjinKwon commented on a change in pull request #33293: URL: https://github.com/apache/spark/pull/33293#discussion_r667793021 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -237,114 +237,114 @@ object DateTimeUtils {

[GitHub] [spark] linhongliu-db edited a comment on pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-12 Thread GitBox
linhongliu-db edited a comment on pull request #32959: URL: https://github.com/apache/spark/pull/32959#issuecomment-878388362 cc @cloud-fan, comments are addressed and tests are passed

[GitHub] [spark] SparkQA commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox
SparkQA commented on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878159067

[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier

2021-07-12 Thread GitBox
cloud-fan commented on a change in pull request #33200: URL: https://github.com/apache/spark/pull/33200#discussion_r667995696 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala ## @@ -229,22 +228,13 @@ case class

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878041222

[GitHub] [spark] AmplabJenkins commented on pull request #33298: [SPARK-36087][SQL][WIP] An Impl of skew key detection and data inflation optimization

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #33298: URL: https://github.com/apache/spark/pull/33298#issuecomment-878197315

[GitHub] [spark] cloud-fan commented on pull request #33302: Revert "[SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4"

2021-07-12 Thread GitBox
cloud-fan commented on pull request #33302: URL: https://github.com/apache/spark/pull/33302#issuecomment-878434201 cc @maropu

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33038: [SPARK-35861][SS] Introduce "prefix match scan" feature on state store

2021-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #33038: URL: https://github.com/apache/spark/pull/33038#issuecomment-878123953

[GitHub] [spark] AmplabJenkins commented on pull request #33301: Update SSLOptions.scala

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #33301: URL: https://github.com/apache/spark/pull/33301#issuecomment-878238185 Can one of the admins verify this patch?

[GitHub] [spark] cloud-fan commented on pull request #33304: [SQL][MINOR] EquivalentExpressions.commonChildrenToRecurse should skip CodegenFallback

2021-07-12 Thread GitBox
cloud-fan commented on pull request #33304: URL: https://github.com/apache/spark/pull/33304#issuecomment-878498708 This is so minor that I opened this backport PR only because someone asked for it. I think the mistake was there since this piece of code was added. I didn't look into how

[GitHub] [spark] MaxGekk commented on a change in pull request #33299: [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz

2021-07-12 Thread GitBox
MaxGekk commented on a change in pull request #33299: URL: https://github.com/apache/spark/pull/33299#discussion_r668081297 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2284,6 +2284,128 @@ case class

[GitHub] [spark] cloud-fan edited a comment on pull request #33271: [SPARK-36056][SQL] Combine readBatch and readIntegers in VectorizedRleValuesReader

2021-07-12 Thread GitBox
cloud-fan edited a comment on pull request #33271: URL: https://github.com/apache/spark/pull/33271#issuecomment-878328280 thanks, merging to master/3.2 (we have a major refactor in parquet reader in 3.2, and it's better to keep master and 3.2 in sync for the parquet reader codebase, in

[GitHub] [spark] SparkQA commented on pull request #33116: [SPARK-35259][SHUFFLE] Rename ExternalBlockHandler Timer variables to remove incorrect millis suffix

2021-07-12 Thread GitBox
SparkQA commented on pull request #33116: URL: https://github.com/apache/spark/pull/33116#issuecomment-878535863 **[Test build #140933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140933/testReport)** for PR 33116 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33299: [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz

2021-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #33299: URL: https://github.com/apache/spark/pull/33299#issuecomment-878197305

[GitHub] [spark] AngersZhuuuu commented on pull request #33296: [SPARK-34402][SQL] Group exception about data format schema

2021-07-12 Thread GitBox
AngersZhuuuu commented on pull request #33296: URL: https://github.com/apache/spark/pull/33296#issuecomment-878166370

[GitHub] [spark] linar-jether commented on a change in pull request #29719: [SPARK-32846][SQL][PYTHON] Support createDataFrame from an RDD of pd.DataFrames

2021-07-12 Thread GitBox
linar-jether commented on a change in pull request #29719: URL: https://github.com/apache/spark/pull/29719#discussion_r667841596 ## File path: python/pyspark/sql/pandas/conversion.py ## @@ -297,8 +297,11 @@ class SparkConversionMixin(object): """ Min-in for the

[GitHub] [spark] AmplabJenkins commented on pull request #33282: [SPARK-36074][SQL] Add error class for StructType.findNestedField

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #33282: URL: https://github.com/apache/spark/pull/33282#issuecomment-878495180

[GitHub] [spark] MaxGekk closed pull request #33299: [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz

2021-07-12 Thread GitBox
MaxGekk closed pull request #33299: URL: https://github.com/apache/spark/pull/33299

[GitHub] [spark] Yikun commented on a change in pull request #33270: [SPARK-35956][K8S] Support auto assigning labels to decommissioning pods

2021-07-12 Thread GitBox
Yikun commented on a change in pull request #33270: URL: https://github.com/apache/spark/pull/33270#discussion_r667992561 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ## @@ -284,6 +284,24 @@ private[spark] object

[GitHub] [spark] dgd-contributor commented on a change in pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox
dgd-contributor commented on a change in pull request #33293: URL: https://github.com/apache/spark/pull/33293#discussion_r667814463 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -237,114 +237,114 @@ object DateTimeUtils

[GitHub] [spark] cloud-fan commented on pull request #33287: [SPARK-36081][SPARK-36066][SQL] Fix the compatibility breaking issue related to cast and UTF8String

2021-07-12 Thread GitBox
cloud-fan commented on pull request #33287: URL: https://github.com/apache/spark/pull/33287#issuecomment-878335670 `\b` means backspace, which is a control character that moves the cursor one character back in the console but doesn't delete it. I don't think we should trim it as

[GitHub] [spark] AmplabJenkins commented on pull request #33038: [SPARK-35861][SS] Introduce "prefix match scan" feature on state store

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #33038: URL: https://github.com/apache/spark/pull/33038#issuecomment-878123953

[GitHub] [spark] yaooqinn commented on a change in pull request #32949: [SPARK-35749][SPARK-35773][SQL] Parse unit list interval literals as tightest year-month/day-time interval types

2021-07-12 Thread GitBox
yaooqinn commented on a change in pull request #32949: URL: https://github.com/apache/spark/pull/32949#discussion_r668044474 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkMetadataOperationSuite.scala ## @@ -355,6 +356,7 @@

[GitHub] [spark] SparkQA removed a comment on pull request #33298: [SPARK-36087][SQL][WIP] An Impl of skew key detection and data inflation optimization

2021-07-12 Thread GitBox
SparkQA removed a comment on pull request #33298: URL: https://github.com/apache/spark/pull/33298#issuecomment-878125294 **[Test build #140910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140910/testReport)** for PR 33298 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox
SparkQA removed a comment on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878120799

[GitHub] [spark] srowen commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox
srowen commented on pull request #33293: URL: https://github.com/apache/spark/pull/33293#issuecomment-878316576 I see, it looks like both of the new functions in master are completely within try-catch and return None in case of error. If that's accurate, I think this is OK.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33282: [SPARK-36074][SQL] Add error class for StructType.findNestedField

2021-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #33282: URL: https://github.com/apache/spark/pull/33282#issuecomment-878495180

[GitHub] [spark] cloud-fan commented on a change in pull request #33282: [SPARK-36074][SQL] Add error class for StructType.findNestedField

2021-07-12 Thread GitBox
cloud-fan commented on a change in pull request #33282: URL: https://github.com/apache/spark/pull/33282#discussion_r668059581 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala ## @@ -1349,9 +1349,13 @@ private[spark] object

[GitHub] [spark] SparkQA removed a comment on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox
SparkQA removed a comment on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878159067 **[Test build #140912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140912/testReport)** for PR 33297 at commit

[GitHub] [spark] HyukjinKwon closed pull request #33295: [SPARK-33679][SQL][DOCS][FOLLOWUP] Enable spark.sql.adaptive.enabled by default

2021-07-12 Thread GitBox
HyukjinKwon closed pull request #33295: URL: https://github.com/apache/spark/pull/33295

[GitHub] [spark] srowen commented on a change in pull request #33301: Update SSLOptions.scala

2021-07-12 Thread GitBox
srowen commented on a change in pull request #33301: URL: https://github.com/apache/spark/pull/33301#discussion_r667915510 ## File path: core/src/main/scala/org/apache/spark/SSLOptions.scala ## @@ -78,6 +78,12 @@ private[spark] case class SSLOptions(

[GitHub] [spark] AmplabJenkins commented on pull request #33299: [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #33299: URL: https://github.com/apache/spark/pull/33299#issuecomment-878197305

[GitHub] [spark] xkrogen commented on pull request #33116: [SPARK-35259][SHUFFLE] Rename ExternalBlockHandler Timer variables to remove incorrect millis suffix

2021-07-12 Thread GitBox
xkrogen commented on pull request #33116: URL: https://github.com/apache/spark/pull/33116#issuecomment-878521045 Just put up a new diff which implements a custom `Timer` subclass, `TimerWithMillisecondSnapshots`, which acts the same as a normal `Timer` and stores nanoseconds internally,

[GitHub] [spark] yeshengm commented on a change in pull request #24595: [SPARK-20774][SPARK-27036][SQL] Cancel the running broadcast execution on BroadcastTimeout

2021-07-12 Thread GitBox
yeshengm commented on a change in pull request #24595: URL: https://github.com/apache/spark/pull/24595#discussion_r667815796 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ## @@ -67,68 +70,74 @@ case class

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878122608

[GitHub] [spark] cloud-fan commented on pull request #32455: [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4

2021-07-12 Thread GitBox
cloud-fan commented on pull request #32455: URL: https://github.com/apache/spark/pull/32455#issuecomment-878310301

[GitHub] [spark] venkata91 opened a new pull request #33303: [SPARK-32920][CORE][SHUFFLE][FOLLOW-UP] Fix to run push-based shuffle tests in DAGSchedulerSuite in ad-hoc manner

2021-07-12 Thread GitBox
venkata91 opened a new pull request #33303: URL: https://github.com/apache/spark/pull/33303 ### What changes were proposed in this pull request? Currently when the push-based shuffle tests are run in an ad-hoc manner through IDE, `spark.testing` is not set to true therefore

[GitHub] [spark] AmplabJenkins commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878122608

[GitHub] [spark] Ngone51 commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-12 Thread GitBox
Ngone51 commented on a change in pull request #32401: URL: https://github.com/apache/spark/pull/32401#discussion_r668027402 ## File path: core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java ## @@ -0,0 +1,66 @@ +package

[GitHub] [spark] gengliangwang commented on pull request #32951: [SPARK-33603][SQL] Grouping exception messages in execution/command

2021-07-12 Thread GitBox
gengliangwang commented on pull request #32951: URL: https://github.com/apache/spark/pull/32951#issuecomment-878459584 Merging to master/3.2

[GitHub] [spark] SparkQA removed a comment on pull request #33296: [SPARK-34402][SQL] Group exception about data format schema

2021-07-12 Thread GitBox
SparkQA removed a comment on pull request #33296: URL: https://github.com/apache/spark/pull/33296#issuecomment-878077970

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #32959: URL: https://github.com/apache/spark/pull/32959#issuecomment-878197304

[GitHub] [spark] AmplabJenkins commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-878382949

[GitHub] [spark] MaxGekk opened a new pull request #33300: [SPARK-36089][SQL][DOCS] Update the SQL migration guide about encoding auto-detection of CSV files

2021-07-12 Thread GitBox
MaxGekk opened a new pull request #33300: URL: https://github.com/apache/spark/pull/33300 ### What changes were proposed in this pull request? In the PR, I propose to update the SQL migration guide, in particular the section about the migration from Spark 2.4 to 3.0. New item informs

[GitHub] [spark] skhandrikagmail opened a new pull request #33301: Update SSLOptions.scala

2021-07-12 Thread GitBox
skhandrikagmail opened a new pull request #33301: URL: https://github.com/apache/spark/pull/33301 passing needClientAuth to sslContextFactory would help enable mTLS authentication for Jetty through x509 certificates. ### What changes were proposed in this pull request?

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33298: [SPARK-36087][SQL][WIP] An Impl of skew key detection and data inflation optimization

2021-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #33298: URL: https://github.com/apache/spark/pull/33298#issuecomment-878197315

[GitHub] [spark] SparkQA commented on pull request #33282: [SPARK-36074][SQL] Add error class for StructType.findNestedField

2021-07-12 Thread GitBox
SparkQA commented on pull request #33282: URL: https://github.com/apache/spark/pull/33282#issuecomment-878425935

[GitHub] [spark] cloud-fan commented on a change in pull request #32949: [SPARK-35749][SPARK-35773][SQL] Parse unit list interval literals as tightest year-month/day-time interval types

2021-07-12 Thread GitBox
cloud-fan commented on a change in pull request #32949: URL: https://github.com/apache/spark/pull/32949#discussion_r668040064 ## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSuite.scala ## @@ -171,20 +172,22 @@ class

[GitHub] [spark] viirya commented on a change in pull request #32980: [SPARK-35829][SQL] Clean up evaluates subexpressions and add more flexibility to evaluate particular subexpressoin

2021-07-12 Thread GitBox
viirya commented on a change in pull request #32980: URL: https://github.com/apache/spark/pull/32980#discussion_r668186910 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ## @@ -1174,10 +1249,12 @@ class

[GitHub] [spark] MaxGekk commented on a change in pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox
MaxGekk commented on a change in pull request #33297: URL: https://github.com/apache/spark/pull/33297#discussion_r667964651 ## File path: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ## @@ -595,6 +595,31 @@ class JsonFunctionsSuite extends QueryTest

[GitHub] [spark] SparkQA commented on pull request #33305: [SPARK-35829][SQL][FOLLOWUP] Use subExprCode to avoid duplicate call of addNewFunction

2021-07-12 Thread GitBox
SparkQA commented on pull request #33305: URL: https://github.com/apache/spark/pull/33305#issuecomment-878535619 **[Test build #140932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140932/testReport)** for PR 33305 at commit

[GitHub] [spark] cloud-fan closed pull request #33271: [SPARK-36056][SQL] Combine readBatch and readIntegers in VectorizedRleValuesReader

2021-07-12 Thread GitBox
cloud-fan closed pull request #33271: URL: https://github.com/apache/spark/pull/33271

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33291: [SPARK-35561][SQL] Remove leading zeros from empty static number type partition

2021-07-12 Thread GitBox
HyukjinKwon commented on a change in pull request #33291: URL: https://github.com/apache/spark/pull/33291#discussion_r667784669 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ## @@ -351,10 +351,24 @@ object

[GitHub] [spark] cloud-fan commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-07-12 Thread GitBox
cloud-fan commented on a change in pull request #32049: URL: https://github.com/apache/spark/pull/32049#discussion_r668104749 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java ## @@ -0,0 +1,44 @@ +/* + * Licensed to

[GitHub] [spark] HyukjinKwon closed pull request #33300: [SPARK-36089][SQL][DOCS] Update the SQL migration guide about encoding auto-detection of CSV files

2021-07-12 Thread GitBox
HyukjinKwon closed pull request #33300: URL: https://github.com/apache/spark/pull/33300

[GitHub] [spark] srowen commented on a change in pull request #32895: [SPARK-35658][DOCS] Document Parquet encryption feature in Spark SQL

2021-07-12 Thread GitBox
srowen commented on a change in pull request #32895: URL: https://github.com/apache/spark/pull/32895#discussion_r667918235 ## File path: docs/sql-data-sources-parquet.md ## @@ -252,6 +252,71 @@ REFRESH TABLE my_table; +## Columnar Encryption + + +Since Spark 3.2,

[GitHub] [spark] ulysses-you commented on pull request #33295: [SPARK-33679][SQL][DOCS][FOLLOWUP] Enable spark.sql.adaptive.enabled by default

2021-07-12 Thread GitBox
ulysses-you commented on pull request #33295: URL: https://github.com/apache/spark/pull/33295#issuecomment-878259476 thank you all

[GitHub] [spark] SparkQA commented on pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-12 Thread GitBox
SparkQA commented on pull request #32959: URL: https://github.com/apache/spark/pull/32959#issuecomment-878159455

[GitHub] [spark] SparkQA removed a comment on pull request #33299: [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz

2021-07-12 Thread GitBox
SparkQA removed a comment on pull request #33299: URL: https://github.com/apache/spark/pull/33299#issuecomment-878158988

[GitHub] [spark] AmplabJenkins commented on pull request #33297: [SPARK-36069] from_json's exception should contain field name, type and value

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #33297: URL: https://github.com/apache/spark/pull/33297#issuecomment-878234289

[GitHub] [spark] SparkQA removed a comment on pull request #33038: [SPARK-35861][SS] Introduce "prefix match scan" feature on state store

2021-07-12 Thread GitBox
SparkQA removed a comment on pull request #33038: URL: https://github.com/apache/spark/pull/33038#issuecomment-878119923

[GitHub] [spark] AmplabJenkins commented on pull request #33296: [SPARK-34402][SQL] Group exception about data format schema

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #33296: URL: https://github.com/apache/spark/pull/33296#issuecomment-878130202

[GitHub] [spark] viirya commented on pull request #32455: [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4

2021-07-12 Thread GitBox
viirya commented on pull request #32455: URL: https://github.com/apache/spark/pull/32455#issuecomment-878403541 +1 for reverting it. Thanks!

[GitHub] [spark] linhongliu-db commented on pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-12 Thread GitBox
linhongliu-db commented on pull request #32959: URL: https://github.com/apache/spark/pull/32959#issuecomment-878388362 cc @cloud-fan

[GitHub] [spark] otterc commented on pull request #33303: [SPARK-32920][CORE][SHUFFLE][FOLLOW-UP] Fix to run push-based shuffle tests in DAGSchedulerSuite in ad-hoc manner

2021-07-12 Thread GitBox
otterc commented on pull request #33303: URL: https://github.com/apache/spark/pull/33303#issuecomment-878472677 Thanks for the fix @venkata91. I tested the change as well and it works. Looks good to me.

[GitHub] [spark] HyukjinKwon commented on a change in pull request #24595: [SPARK-20774][SPARK-27036][SQL] Cancel the running broadcast execution on BroadcastTimeout

2021-07-12 Thread GitBox
HyukjinKwon commented on a change in pull request #24595: URL: https://github.com/apache/spark/pull/24595#discussion_r667820004 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ## @@ -67,68 +70,74 @@ case class

[GitHub] [spark] cloud-fan opened a new pull request #33302: Revert "[SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4"

2021-07-12 Thread GitBox
cloud-fan opened a new pull request #33302: URL: https://github.com/apache/spark/pull/33302 ### What changes were proposed in this pull request? This PR reverts https://github.com/apache/spark/pull/32455 and its followup https://github.com/apache/spark/pull/32536 , because

[GitHub] [spark] viirya commented on pull request #33038: [SPARK-35861][SS] Introduce "prefix match scan" feature on state store

2021-07-12 Thread GitBox
viirya commented on pull request #33038: URL: https://github.com/apache/spark/pull/33038#issuecomment-878404420 Thanks. Merging to master/3.2.

[GitHub] [spark] SparkQA commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-12 Thread GitBox
SparkQA commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-878382443

[GitHub] [spark] Peng-Lei commented on a change in pull request #33175: [SPARK-35973][SQL] DataSourceV2: Support SHOW CATALOGS

2021-07-12 Thread GitBox
Peng-Lei commented on a change in pull request #33175: URL: https://github.com/apache/spark/pull/33175#discussion_r667599786 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowCatalogsExec.scala ## @@ -0,0 +1,47 @@ +/* + * Licensed to the

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-878127962

[GitHub] [spark] venkata91 commented on pull request #33303: [SPARK-32920][CORE][SHUFFLE][FOLLOW-UP] Fix to run push-based shuffle tests in DAGSchedulerSuite in ad-hoc manner

2021-07-12 Thread GitBox
venkata91 commented on pull request #33303: URL: https://github.com/apache/spark/pull/33303#issuecomment-878466794 cc @otterc @mridulm

[GitHub] [spark] dgd-contributor commented on pull request #33293: [SPARK-36076][SQL][3.1] ArrayIndexOutOfBounds in Cast string to times…

2021-07-12 Thread GitBox
dgd-contributor commented on pull request #33293: URL: https://github.com/apache/spark/pull/33293#issuecomment-878310125 > Agree, let's emulate the change in master as closely as makes sense when just 'backporting' the fix part Yes, I've emulated the fix in master, or should I put

[GitHub] [spark] SparkQA removed a comment on pull request #33175: [SPARK-35973][SQL] DataSourceV2: Support SHOW CATALOGS

2021-07-12 Thread GitBox
SparkQA removed a comment on pull request #33175: URL: https://github.com/apache/spark/pull/33175#issuecomment-878043168

[GitHub] [spark] AmplabJenkins commented on pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #32959: URL: https://github.com/apache/spark/pull/32959#issuecomment-878197304

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-878382949

[GitHub] [spark] viirya closed pull request #33038: [SPARK-35861][SS] Introduce "prefix match scan" feature on state store

2021-07-12 Thread GitBox
viirya closed pull request #33038: URL: https://github.com/apache/spark/pull/33038

[GitHub] [spark] cloud-fan commented on a change in pull request #33142: [SPARK-35940][SQL] Refactor EquivalentExpressions to make it more efficient

2021-07-12 Thread GitBox
cloud-fan commented on a change in pull request #33142: URL: https://github.com/apache/spark/pull/33142#discussion_r668130184 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ## @@ -135,33 +125,47 @@ class

[GitHub] [spark] SparkQA commented on pull request #33302: Revert "[SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4"

2021-07-12 Thread GitBox
SparkQA commented on pull request #33302: URL: https://github.com/apache/spark/pull/33302#issuecomment-878457891

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33175: [SPARK-35973][SQL] DataSourceV2: Support SHOW CATALOGS

2021-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #33175: URL: https://github.com/apache/spark/pull/33175#issuecomment-878234284

[GitHub] [spark] AmplabJenkins commented on pull request #32951: [SPARK-33603][SQL] Grouping execution/command

2021-07-12 Thread GitBox
AmplabJenkins commented on pull request #32951: URL: https://github.com/apache/spark/pull/32951#issuecomment-878197307 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140901/

[GitHub] [spark] SparkQA commented on pull request #33304: [SQL][MINOR] EquivalentExpressions.commonChildrenToRecurse should skip CodegenFallback

2021-07-12 Thread GitBox
SparkQA commented on pull request #33304: URL: https://github.com/apache/spark/pull/33304#issuecomment-878468300

[GitHub] [spark] gengliangwang closed pull request #32951: [SPARK-33603][SQL] Grouping exception messages in execution/command

2021-07-12 Thread GitBox
gengliangwang closed pull request #32951: URL: https://github.com/apache/spark/pull/32951

[GitHub] [spark] SparkQA commented on pull request #33298: [SPARK-36087][SQL][WIP] An Impl of skew key detection and data inflation optimization

2021-07-12 Thread GitBox
SparkQA commented on pull request #33298: URL: https://github.com/apache/spark/pull/33298#issuecomment-878125294

[GitHub] [spark] gengliangwang commented on a change in pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox
gengliangwang commented on a change in pull request #33258: URL: https://github.com/apache/spark/pull/33258#discussion_r667833154 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -236,6 +274,8 @@ case class

[GitHub] [spark] attilapiros commented on a change in pull request #33261: [SPARK-35334][K8S] Make Spark more resilient to intermittent K8s flakiness

2021-07-12 Thread GitBox
attilapiros commented on a change in pull request #33261: URL: https://github.com/apache/spark/pull/33261#discussion_r668029007 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ## @@ -74,6 +77,11 @@

[GitHub] [spark] srowen commented on a change in pull request #33291: [SPARK-35561][SQL] Remove leading zeros from empty static number type partition

2021-07-12 Thread GitBox
srowen commented on a change in pull request #33291: URL: https://github.com/apache/spark/pull/33291#discussion_r667923793 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ## @@ -351,10 +351,24 @@ object

[GitHub] [spark] SparkQA removed a comment on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-12 Thread GitBox
SparkQA removed a comment on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-878119735
