[GitHub] [spark] cloud-fan closed pull request #37165: [SPARK-39699][SQL] Make CollapseProject smarter about collection creation expressions

2022-07-12 Thread GitBox
cloud-fan closed pull request #37165: [SPARK-39699][SQL] Make CollapseProject smarter about collection creation expressions URL: https://github.com/apache/spark/pull/37165 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use
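For context on the subject line: SPARK-39699 concerns adjacent projections where the lower one builds a collection (struct, array, or map) and the upper one only extracts pieces of it. The sketch below is a hedged, spark-shell-style illustration of that plan shape with hypothetical columns; it is not code from the PR.

```scala
// Hedged sketch: the lower Project creates a struct, the upper Project only
// reads one field back out. In principle GetStructField(CreateNamedStruct(a, b), "a")
// simplifies to the plain column `a`, so collapsing the two projections need
// not duplicate the struct-creation work.
import org.apache.spark.sql.functions._

val df = spark.range(10)
  .select(col("id").as("a"), (col("id") * 2).as("b"))
  .select(struct(col("a"), col("b")).as("s"))  // lower projection: build a struct
  .select(col("s").getField("a").as("a"))      // upper projection: extract a field

df.explain(true)
```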

[GitHub] [spark] cloud-fan commented on pull request #37165: [SPARK-39699][SQL] Make CollapseProject smarter about collection creation expressions

2022-07-12 Thread GitBox
cloud-fan commented on PR #37165: URL: https://github.com/apache/spark/pull/37165#issuecomment-1182794399 thanks for review, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] LuciferYang commented on a diff in pull request #37163: [SPARK-39750][SQL] Enable `spark.sql.cbo.enabled` by default

2022-07-12 Thread GitBox
LuciferYang commented on code in PR #37163: URL: https://github.com/apache/spark/pull/37163#discussion_r919652743 ## sql/core/src/test/scala/org/apache/spark/sql/InjectRuntimeFilterSuite.scala: ## @@ -209,11 +209,13 @@ class InjectRuntimeFilterSuite extends QueryTest with

[GitHub] [spark] cloud-fan commented on a diff in pull request #37074: [SPARK-39672][SQL][3.1] Fix removing project before filter with correlated subquery

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #37074: URL: https://github.com/apache/spark/pull/37074#discussion_r919644313 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -742,12 +742,18 @@ object ColumnPruning extends Rule[LogicalPlan] {

[GitHub] [spark] HyukjinKwon commented on pull request #37168: [SPARK-39756][PYTHON] Better error messages for missing pandas scalars

2022-07-12 Thread GitBox
HyukjinKwon commented on PR #37168: URL: https://github.com/apache/spark/pull/37168#issuecomment-1182759428 cc @ueshin FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] cloud-fan commented on a diff in pull request #37165: [SPARK-39699][SQL] Make CollapseProject smarter about collection creation expressions

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #37165: URL: https://github.com/apache/spark/pull/37165#discussion_r919620796 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1011,24 +1011,92 @@ object CollapseProject extends Rule[LogicalPlan]

[GitHub] [spark] cloud-fan commented on a diff in pull request #37165: [SPARK-39699][SQL] Make CollapseProject smarter about collection creation expressions

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #37165: URL: https://github.com/apache/spark/pull/37165#discussion_r919620700 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala: ## @@ -659,7 +659,7 @@ case class WithField(name: String, valExpr:

[GitHub] [spark] cloud-fan commented on a diff in pull request #37165: [SPARK-39699][SQL] Make CollapseProject smarter about collection creation expressions

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #37165: URL: https://github.com/apache/spark/pull/37165#discussion_r919620341 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1011,24 +1011,92 @@ object CollapseProject extends Rule[LogicalPlan]

[GitHub] [spark] manuzhang commented on pull request #37074: [SPARK-39672][SQL][3.1] Fix removing project before filter with correlated subquery

2022-07-12 Thread GitBox
manuzhang commented on PR #37074: URL: https://github.com/apache/spark/pull/37074#issuecomment-1182730611 @cloud-fan Please take another look. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] viirya commented on a diff in pull request #37165: [SPARK-39699][SQL] Make CollapseProject smarter about collection creation expressions

2022-07-12 Thread GitBox
viirya commented on code in PR #37165: URL: https://github.com/apache/spark/pull/37165#discussion_r919607195 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1011,24 +1011,92 @@ object CollapseProject extends Rule[LogicalPlan] with

[GitHub] [spark] viirya commented on a diff in pull request #37165: [SPARK-39699][SQL] Make CollapseProject smarter about collection creation expressions

2022-07-12 Thread GitBox
viirya commented on code in PR #37165: URL: https://github.com/apache/spark/pull/37165#discussion_r919605415 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1011,24 +1011,92 @@ object CollapseProject extends Rule[LogicalPlan] with

[GitHub] [spark] viirya commented on a diff in pull request #37165: [SPARK-39699][SQL] Make CollapseProject smarter about collection creation expressions

2022-07-12 Thread GitBox
viirya commented on code in PR #37165: URL: https://github.com/apache/spark/pull/37165#discussion_r919605135 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala: ## @@ -659,7 +659,7 @@ case class WithField(name: String, valExpr:

[GitHub] [spark] wangyum commented on pull request #37163: [SPARK-39750][SQL] Enable `spark.sql.cbo.enabled` by default

2022-07-12 Thread GitBox
wangyum commented on PR #37163: URL: https://github.com/apache/spark/pull/37163#issuecomment-1182708274 @viirya Here is an example. AQE can be converted to broadcast join, but the performance is worse than directly planned to broadcast join. ```scala import
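The quoted Scala snippet is truncated by the archive, so here is a rough, hedged sketch of the argument rather than wangyum's actual benchmark: with CBO enabled and statistics collected, the planner can choose a broadcast hash join up front, instead of starting from a sort-merge join and relying on AQE to convert it at runtime. Table names are hypothetical.

```scala
// Hedged sketch (spark-shell): collect statistics so the cost-based optimizer
// can see that `small` is below the broadcast threshold and plan a broadcast
// hash join statically.
spark.sql("SET spark.sql.cbo.enabled=true")
spark.sql("ANALYZE TABLE small COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE large COMPUTE STATISTICS")

spark.sql("""
  SELECT l.*, s.payload
  FROM   large l JOIN small s ON l.key = s.key
""").explain()   // expect BroadcastHashJoin rather than SortMergeJoin
```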

[GitHub] [spark] chenzhx opened a new pull request, #37169: [SPARK-38901][SQL] DS V2 supports push down misc functions

2022-07-12 Thread GitBox
chenzhx opened a new pull request, #37169: URL: https://github.com/apache/spark/pull/37169 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] HeartSaVioR closed pull request #37167: [SPARK-39748][SQL][FOLLOWUP] Add missing origin logical plan on DataFrame.checkpoint on building LogicalRDD

2022-07-12 Thread GitBox
HeartSaVioR closed pull request #37167: [SPARK-39748][SQL][FOLLOWUP] Add missing origin logical plan on DataFrame.checkpoint on building LogicalRDD URL: https://github.com/apache/spark/pull/37167 -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] HeartSaVioR commented on pull request #37167: [SPARK-39748][SQL][FOLLOWUP] Add missing origin logical plan on DataFrame.checkpoint on building LogicalRDD

2022-07-12 Thread GitBox
HeartSaVioR commented on PR #37167: URL: https://github.com/apache/spark/pull/37167#issuecomment-1182698788 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] xinrong-databricks opened a new pull request, #37168: [SPARK-39756][PYTHON] Better error messages for missing pandas scalars

2022-07-12 Thread GitBox
xinrong-databricks opened a new pull request, #37168: URL: https://github.com/apache/spark/pull/37168 ### What changes were proposed in this pull request? pandas scalars are not reimplemented in pandas API on Spark intentionally. Users may use pandas scalars in pandas API on Spark

[GitHub] [spark] beliefer commented on pull request #37116: [SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions

2022-07-12 Thread GitBox
beliefer commented on PR #37116: URL: https://github.com/apache/spark/pull/37116#issuecomment-1182694683 @cloud-fan @dongjoon-hyun @MaxGekk Thank you ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] wangyum commented on pull request #37163: [SPARK-39750][SQL] Enable `spark.sql.cbo.enabled` by default

2022-07-12 Thread GitBox
wangyum commented on PR #37163: URL: https://github.com/apache/spark/pull/37163#issuecomment-1182692655 > @wangyum do you have any numbers on the performance gain from this? Will update the benchmark results later. -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] LuciferYang commented on pull request #37115: [SPARK-39706][SQL] Set missing column with defaultValue as constant in `ParquetColumnVector`

2022-07-12 Thread GitBox
LuciferYang commented on PR #37115: URL: https://github.com/apache/spark/pull/37115#issuecomment-1182689346 thanks @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] LuciferYang commented on pull request #37102: [SPARK-39694][TESTS] Use `${projectName}/Test/runMain` to run benchmarks

2022-07-12 Thread GitBox
LuciferYang commented on PR #37102: URL: https://github.com/apache/spark/pull/37102#issuecomment-1182689203 thanks @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #37117: [SPARK-39714][PYTHON] Try to fix the mypy annotation tests

2022-07-12 Thread GitBox
HyukjinKwon closed pull request #37117: [SPARK-39714][PYTHON] Try to fix the mypy annotation tests URL: https://github.com/apache/spark/pull/37117 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] HyukjinKwon commented on pull request #37117: [SPARK-39714][PYTHON] Try to fix the mypy annotation tests

2022-07-12 Thread GitBox
HyukjinKwon commented on PR #37117: URL: https://github.com/apache/spark/pull/37117#issuecomment-1182638751 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] bzhaoopenstack commented on a diff in pull request #37117: [SPARK-39714][PYTHON] Try to fix the mypy annotation tests

2022-07-12 Thread GitBox
bzhaoopenstack commented on code in PR #37117: URL: https://github.com/apache/spark/pull/37117#discussion_r919545172 ## python/pyspark/ml/util.py: ## @@ -536,10 +536,8 @@ def __get_class(clazz: str) -> Type[RL]: """ parts = clazz.split(".") module =

[GitHub] [spark] viirya commented on pull request #37167: [SPARK-39748][SQL][FOLLOWUP] Add missing origin logical plan on DataFrame.checkpoint on building LogicalRDD

2022-07-12 Thread GitBox
viirya commented on PR #37167: URL: https://github.com/apache/spark/pull/37167#issuecomment-1182618509 lgtm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36893: [SPARK-39494][PYTHON] Support `createDataFrame` from a list of scalars when schema is not provided

2022-07-12 Thread GitBox
HyukjinKwon commented on code in PR #36893: URL: https://github.com/apache/spark/pull/36893#discussion_r919529370 ## python/pyspark/sql/session.py: ## @@ -1023,6 +1023,20 @@ def prepare(obj: Any) -> Any: if isinstance(data, RDD): rdd, struct =

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37161: [SPARK-39748][SQL][SS] Include the origin logical plan for LogicalRDD if it comes from DataFrame

2022-07-12 Thread GitBox
HeartSaVioR commented on code in PR #37161: URL: https://github.com/apache/spark/pull/37161#discussion_r919518440 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -705,6 +705,7 @@ class Dataset[T] private[sql]( LogicalRDD(

[GitHub] [spark] HeartSaVioR commented on pull request #37167: [SPARK-39748][SQL][FOLLOWUP] Add missing origin logical plan on DataFrame.checkpoint on building LogicalRDD

2022-07-12 Thread GitBox
HeartSaVioR commented on PR #37167: URL: https://github.com/apache/spark/pull/37167#issuecomment-1182603290 cc. @viirya -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] HeartSaVioR opened a new pull request, #37167: [SPARK-39748][SQL][FOLLOWUP] Add missing origin logical plan on DataFrame.checkpoint on building LogicalRDD

2022-07-12 Thread GitBox
HeartSaVioR opened a new pull request, #37167: URL: https://github.com/apache/spark/pull/37167 ### What changes were proposed in this pull request? This PR adds missing origin logical plan on building LogicalRDD in DataFrame.checkpoint, via review comment

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37161: [SPARK-39748][SQL][SS] Include the origin logical plan for LogicalRDD if it comes from DataFrame

2022-07-12 Thread GitBox
HeartSaVioR commented on code in PR #37161: URL: https://github.com/apache/spark/pull/37161#discussion_r919505305 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -705,6 +705,7 @@ class Dataset[T] private[sql]( LogicalRDD(

[GitHub] [spark] anleib commented on pull request #22878: [SPARK-25789][SQL] Support for Dataset of Avro

2022-07-12 Thread GitBox
anleib commented on PR #22878: URL: https://github.com/apache/spark/pull/22878#issuecomment-1182575154 Can this be re-opened? Is there someone who can take it through the finish line? Seems like this PR is really close. This is hugely useful to those of us who are working with Kafka +

[GitHub] [spark] mridulm commented on pull request #37052: [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager

2022-07-12 Thread GitBox
mridulm commented on PR #37052: URL: https://github.com/apache/spark/pull/37052#issuecomment-1182527170 Good question @dongjoon-hyun, I was not sure about 3.2 actually - I was assuming it wont be relevant there (since this change is relevant mostly in context of push based shuffle). But

[GitHub] [spark] gengliangwang commented on a diff in pull request #37160: [SPARK-39749][SQL] Always use plain string representation on casting Decimal to String

2022-07-12 Thread GitBox
gengliangwang commented on code in PR #37160: URL: https://github.com/apache/spark/pull/37160#discussion_r919429506 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala: ## @@ -1305,4 +1305,12 @@ abstract class CastSuiteBase extends
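For readers following only the subject line: "plain string representation" here means avoiding scientific notation when a small Decimal is cast to String. A hedged one-liner illustrating the kind of value affected; the before/after rendering in the comment is my reading of the title, not taken from the diff.

```scala
// Hedged sketch: java.math.BigDecimal.toString can render small values in
// scientific notation (e.g. "1.00E-8"); per the PR title, CAST(... AS STRING)
// should always use the plain form (e.g. "0.0000000100").
spark.sql("SELECT CAST(CAST(0.00000001 AS DECIMAL(18, 10)) AS STRING)").show(false)
```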

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37160: [SPARK-39749][SQL] Always use plain string representation on casting Decimal to String

2022-07-12 Thread GitBox
dongjoon-hyun commented on code in PR #37160: URL: https://github.com/apache/spark/pull/37160#discussion_r919426563 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala: ## @@ -1305,4 +1305,12 @@ abstract class CastSuiteBase extends


[GitHub] [spark] bersprockets commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-07-12 Thread GitBox
bersprockets commented on code in PR #36871: URL: https://github.com/apache/spark/pull/36871#discussion_r919399248 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -148,7 +148,28 @@ class CSVOptions( // A language tag in IETF BCP 47

[GitHub] [spark] dongjoon-hyun closed pull request #37056: [SPARK-39665][INFRA] Bump workflow versions in GitHub Actions

2022-07-12 Thread GitBox
dongjoon-hyun closed pull request #37056: [SPARK-39665][INFRA] Bump workflow versions in GitHub Actions URL: https://github.com/apache/spark/pull/37056 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] dongjoon-hyun commented on pull request #37056: [SPARK-39665][INFRA] Bump workflow versions in GitHub Actions

2022-07-12 Thread GitBox
dongjoon-hyun commented on PR #37056: URL: https://github.com/apache/spark/pull/37056#issuecomment-1181988831 Yes, setup-java is especially aggressive. So, @ArjunSharda , your approach initially didn't consider much about the GitHub Action's upgrade cadence and the risk on Apache Spark

[GitHub] [spark] manuzhang commented on pull request #37074: [SPARK-39672][SQL][3.1] Fix de-duplicating conflicting attributes when rewriting subquery

2022-07-12 Thread GitBox
manuzhang commented on PR #37074: URL: https://github.com/apache/spark/pull/37074#issuecomment-1181805060 `DeduplicateRelations` doesn't exist in 3.1. > Can you explain the rationale? I'm not sure about it. BTW, could you give an example for why the check is needed in the first

[GitHub] [spark] MaxGekk commented on a diff in pull request #37154: [SPARK-39744][SQL] Add the `REGEXP_INSTR` function

2022-07-12 Thread GitBox
MaxGekk commented on code in PR #37154: URL: https://github.com/apache/spark/pull/37154#discussion_r918754573 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -1043,3 +1043,89 @@ case class RegExpSubStr(left: Expression,
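For readers who just want the semantics of the new function: REGEXP_INSTR searches a string for a regular expression and returns the 1-based position of the first match, or 0 when there is no match. A hedged usage sketch of the two-argument form; see the PR for the full signature.

```scala
// Hedged sketch: position of the first match of the pattern.
spark.sql("SELECT regexp_instr('user@spark.apache.org', '@[^.]*')").show()
// expected: 5, the position of '@'; 0 would mean "no match"
```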

[GitHub] [spark] dongjoon-hyun commented on pull request #36207: [SPARK-38910][YARN] Clean spark staging before `unregister`

2022-07-12 Thread GitBox
dongjoon-hyun commented on PR #36207: URL: https://github.com/apache/spark/pull/36207#issuecomment-1181983884 This commit is reverted according to the above discussion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] ulysses-you commented on a diff in pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-07-12 Thread GitBox
ulysses-you commented on code in PR #37129: URL: https://github.com/apache/spark/pull/37129#discussion_r918870960 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushLocalTopKThroughOuterJoin.scala: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache
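A hedged sketch of the query shape this rule targets, with hypothetical tables: a global top-K (ORDER BY ... LIMIT) over an outer join whose ordering keys come from the preserved side. As I understand the rule, because a left outer join keeps every row of the left table, a per-partition top-K can be applied to the left side below the join without losing any row the final top-K needs.

```scala
// Hedged sketch: the ordering key t1.score comes from the preserved (left)
// side, so a local top-10 on t1 can be pushed below the join; the global
// TakeOrderedAndProject above still produces the final answer.
spark.sql("""
  SELECT t1.id, t1.score, t2.extra
  FROM   t1 LEFT JOIN t2 ON t1.id = t2.id
  ORDER BY t1.score DESC
  LIMIT 10
""").explain()
```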

[GitHub] [spark] cloud-fan commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r919012662 ## core/src/main/resources/error/error-classes.json: ## @@ -256,6 +256,18 @@ "Key does not exist. Use `try_element_at` to tolerate non-existent key and

[GitHub] [spark] dongjoon-hyun commented on pull request #36207: [SPARK-38910][YARN] Clean spark staging before `unregister`

2022-07-12 Thread GitBox
dongjoon-hyun commented on PR #36207: URL: https://github.com/apache/spark/pull/36207#issuecomment-1181981183 @AngersZh . According to the follow-up PR's content, I believe we had better revert this commit and start as non-followup PR freshly. -- This is an automated message from the

[GitHub] [spark] ulysses-you commented on a diff in pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-07-12 Thread GitBox
ulysses-you commented on code in PR #37129: URL: https://github.com/apache/spark/pull/37129#discussion_r918865962 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -230,6 +230,7 @@ abstract class Optimizer(catalogManager:

[GitHub] [spark] ulysses-you commented on a diff in pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-07-12 Thread GitBox
ulysses-you commented on code in PR #37129: URL: https://github.com/apache/spark/pull/37129#discussion_r918867434 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushLocalTopKThroughOuterJoin.scala: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache

[GitHub] [spark] dongjoon-hyun closed pull request #37115: [SPARK-39706][SQL] Set missing column with defaultValue as constant in `ParquetColumnVector`

2022-07-12 Thread GitBox
dongjoon-hyun closed pull request #37115: [SPARK-39706][SQL] Set missing column with defaultValue as constant in `ParquetColumnVector` URL: https://github.com/apache/spark/pull/37115 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] cloud-fan commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r919005101 ## core/src/main/resources/error/error-classes.json: ## @@ -256,6 +256,18 @@ "Key does not exist. Use `try_element_at` to tolerate non-existent key and

[GitHub] [spark] ArjunSharda commented on pull request #37056: [SPARK-39665][INFRA] Bump workflow versions in GitHub Actions

2022-07-12 Thread GitBox
ArjunSharda commented on PR #37056: URL: https://github.com/apache/spark/pull/37056#issuecomment-1181962737 > Do you think the other GitHub Actions are stabler? IMO, the situation is not different. There is no point to track one by one in case of all GitHub Action. I'd recommend to

[GitHub] [spark] cloud-fan commented on a diff in pull request #37113: [SPARK-39741][SQL] Support url encode/decode as built-in function and tidy up url-related functions

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #37113: URL: https://github.com/apache/spark/pull/37113#discussion_r919003373 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala: ## @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software Foundation
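For readers skimming the thread: the PR proposes url_encode and url_decode as built-in SQL functions with standard application/x-www-form-urlencoded semantics. A hedged usage sketch, assuming the function names from the PR title; the outputs in the comments are what URLEncoder/URLDecoder-style encoding would produce.

```scala
// Hedged sketch: function names follow the PR title.
spark.sql("SELECT url_encode('https://spark.apache.org/docs latest')").show(false)
// e.g. https%3A%2F%2Fspark.apache.org%2Fdocs+latest
spark.sql("SELECT url_decode('https%3A%2F%2Fspark.apache.org')").show(false)
// e.g. https://spark.apache.org
```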

[GitHub] [spark] ulysses-you commented on a diff in pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-07-12 Thread GitBox
ulysses-you commented on code in PR #37129: URL: https://github.com/apache/spark/pull/37129#discussion_r918864202 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushLocalTopKThroughOuterJoin.scala: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache

[GitHub] [spark] dongjoon-hyun closed pull request #37166: [SPARK-39754][CORE][SQL] Remove unused `import` or unnecessary `{}`

2022-07-12 Thread GitBox
dongjoon-hyun closed pull request #37166: [SPARK-39754][CORE][SQL] Remove unused `import` or unnecessary `{}` URL: https://github.com/apache/spark/pull/37166 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] LuciferYang commented on a diff in pull request #37113: [SPARK-39741][SQL] Support url encode/decode as built-in function and tidy up url-related functions

2022-07-12 Thread GitBox
LuciferYang commented on code in PR #37113: URL: https://github.com/apache/spark/pull/37113#discussion_r918916123 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala: ## @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dongjoon-hyun commented on pull request #37163: [SPARK-39750][SQL] Enable `spark.sql.cbo.enabled` by default

2022-07-12 Thread GitBox
dongjoon-hyun commented on PR #37163: URL: https://github.com/apache/spark/pull/37163#issuecomment-1181960925 cc @sunchao , @viirya , @huaxingao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] LuciferYang commented on a diff in pull request #37113: [SPARK-39741][SQL] Support url encode/decode as built-in function and tidy up url-related functions

2022-07-12 Thread GitBox
LuciferYang commented on code in PR #37113: URL: https://github.com/apache/spark/pull/37113#discussion_r918907788 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala: ## @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dongjoon-hyun commented on pull request #37056: [SPARK-39665][INFRA] Bump workflow versions in GitHub Actions

2022-07-12 Thread GitBox
dongjoon-hyun commented on PR #37056: URL: https://github.com/apache/spark/pull/37056#issuecomment-1181953060 Do you think the other GitHub Actions are stabler? IMO, the situation is not different. There is no point to track one by one in case of all GitHub Action. I'd recommend to close

[GitHub] [spark] dongjoon-hyun commented on pull request #36696: [SPARK-39312][SQL] Use parquet native In predicate for in filter push down

2022-07-12 Thread GitBox
dongjoon-hyun commented on PR #36696: URL: https://github.com/apache/spark/pull/36696#issuecomment-1182027384 Could you re-trigger once more, @huaxingao ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37117: [SPARK-39714][PYTHON] Try to fix the mypy annotation tests

2022-07-12 Thread GitBox
HyukjinKwon commented on code in PR #37117: URL: https://github.com/apache/spark/pull/37117#discussion_r918728918 ## python/pyspark/ml/util.py: ## @@ -536,10 +536,8 @@ def __get_class(clazz: str) -> Type[RL]: """ parts = clazz.split(".") module =

[GitHub] [spark] Yikf commented on a diff in pull request #37113: [SPARK-39741][SQL] Support url encode/decode as built-in function and tidy up url-related functions

2022-07-12 Thread GitBox
Yikf commented on code in PR #37113: URL: https://github.com/apache/spark/pull/37113#discussion_r918730197 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala: ## @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] dongjoon-hyun commented on pull request #37052: [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager

2022-07-12 Thread GitBox
dongjoon-hyun commented on PR #37052: URL: https://github.com/apache/spark/pull/37052#issuecomment-1182182128 BTW, do you think we can have this at `branch-3.2` ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] dongjoon-hyun closed pull request #37116: [SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions

2022-07-12 Thread GitBox
dongjoon-hyun closed pull request #37116: [SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions URL: https://github.com/apache/spark/pull/37116 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] HeartSaVioR closed pull request #37161: [SPARK-39748][SQL][SS] Include the origin logical plan for LogicalRDD if it comes from DataFrame

2022-07-12 Thread GitBox
HeartSaVioR closed pull request #37161: [SPARK-39748][SQL][SS] Include the origin logical plan for LogicalRDD if it comes from DataFrame URL: https://github.com/apache/spark/pull/37161 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] dtenedor commented on pull request #36960: [SPARK-39557][SQL] Support ARRAY, STRUCT, MAP types as DEFAULT values

2022-07-12 Thread GitBox
dtenedor commented on PR #36960: URL: https://github.com/apache/spark/pull/36960#issuecomment-1182186864 @gengliangwang a friendly ping since the tests are fixed now .__. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] HeartSaVioR commented on pull request #37161: [SPARK-39748][SQL][SS] Include the origin logical plan for LogicalRDD if it comes from DataFrame

2022-07-12 Thread GitBox
HeartSaVioR commented on PR #37161: URL: https://github.com/apache/spark/pull/37161#issuecomment-1181504913 The change is quite simple and straightforward so I just go with merging with 1 approval. That said, I'm open for post-reviewing. Thanks! Merging to master. -- This is an

[GitHub] [spark] cloud-fan commented on a diff in pull request #37154: [SPARK-39744][SQL] Add the `REGEXP_INSTR` function

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #37154: URL: https://github.com/apache/spark/pull/37154#discussion_r918724266 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -1043,3 +1043,89 @@ case class RegExpSubStr(left: Expression,

[GitHub] [spark] panbingkun commented on pull request #37166: [SPARK-39754][CORE][SQL] Fix import issues in Scala/Java

2022-07-12 Thread GitBox
panbingkun commented on PR #37166: URL: https://github.com/apache/spark/pull/37166#issuecomment-1181630909 ### inspect java code by: > Tool: "/Applications/IntelliJ IDEA CE.app/Contents/bin/inspect.sh" > Rule: Imports - Unused import ### inspect scala code by: > sbt "scalafix

[GitHub] [spark] cloud-fan commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r919137239 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -524,6 +525,10 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] panbingkun opened a new pull request, #37166: [SPARK-39754][CORE][SQL] Fix import issues in Scala/Java

2022-07-12 Thread GitBox
panbingkun opened a new pull request, #37166: URL: https://github.com/apache/spark/pull/37166 ### What changes were proposed in this pull request? Review code and found some issue about import in java & scala basecode, mainly focus on: > 1.unnecessary braces in single import >
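The two categories of cleanup the PR names are easy to show in a couple of lines; this is a hypothetical example, not a diff taken from the PR.

```scala
// 1. Unnecessary braces around a single import:
import scala.collection.mutable.{ArrayBuffer}   // before
import scala.collection.mutable.ArrayBuffer     // after

// 2. Unused import: if nothing in the file references Random, the import
//    is simply removed.
import scala.util.Random
```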

[GitHub] [spark] Yikun commented on a diff in pull request #37117: [SPARK-39714][PYTHON] Try to fix the mypy annotation tests

2022-07-12 Thread GitBox
Yikun commented on code in PR #37117: URL: https://github.com/apache/spark/pull/37117#discussion_r918757225 ## python/pyspark/ml/util.py: ## @@ -536,10 +536,8 @@ def __get_class(clazz: str) -> Type[RL]: """ parts = clazz.split(".") module =

[GitHub] [spark] LuciferYang commented on a diff in pull request #37113: [SPARK-39741][SQL] Support url encode/decode as built-in function and tidy up url-related functions

2022-07-12 Thread GitBox
LuciferYang commented on code in PR #37113: URL: https://github.com/apache/spark/pull/37113#discussion_r919054230 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala: ## @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on pull request #37074: [SPARK-39672][SQL][3.1] Fix de-duplicating conflicting attributes when rewriting subquery

2022-07-12 Thread GitBox
cloud-fan commented on PR #37074: URL: https://github.com/apache/spark/pull/37074#issuecomment-1181935798 After more thoughts, I think we should treat correlated subquery as a join in optimizer rules. So in this case, once we remove the `Project`, the plan becomes invalid, because the
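To make the rationale concrete, here is a hedged sketch (hypothetical tables) of the shape under discussion: a Filter whose condition contains a correlated subquery. The correlated predicate behaves like a join condition between the outer plan and the subquery, so the outer attributes it references must stay available even where a rule such as ColumnPruning would otherwise drop the Project that carries them.

```scala
// Hedged illustration: t2.a = t1.a is a correlated reference into the outer
// query; treating the subquery like a join means the outer side must keep
// producing t1.a, which constrains how aggressively projects can be removed.
spark.sql("""
  SELECT t1.a
  FROM   t1
  WHERE  t1.b > (SELECT max(t2.b) FROM t2 WHERE t2.a = t1.a)
""").explain(true)
```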

[GitHub] [spark] LuciferYang commented on a diff in pull request #37113: [SPARK-39741][SQL] Support url encode/decode as built-in function and tidy up url-related functions

2022-07-12 Thread GitBox
LuciferYang commented on code in PR #37113: URL: https://github.com/apache/spark/pull/37113#discussion_r918972758 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala: ## @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] viirya commented on a diff in pull request #37161: [SPARK-39748][SQL][SS] Include the origin logical plan for LogicalRDD if it comes from DataFrame

2022-07-12 Thread GitBox
viirya commented on code in PR #37161: URL: https://github.com/apache/spark/pull/37161#discussion_r919204026 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -705,6 +705,7 @@ class Dataset[T] private[sql]( LogicalRDD( logicalPlan.output,

[GitHub] [spark] dongjoon-hyun closed pull request #37102: [SPARK-39694][TESTS] Use `${projectName}/Test/runMain` to run benchmarks

2022-07-12 Thread GitBox
dongjoon-hyun closed pull request #37102: [SPARK-39694][TESTS] Use `${projectName}/Test/runMain` to run benchmarks URL: https://github.com/apache/spark/pull/37102 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] huaxingao commented on pull request #37080: [SPARK-35208][SQL][DOCS] Add docs for LATERAL subqueries

2022-07-12 Thread GitBox
huaxingao commented on PR #37080: URL: https://github.com/apache/spark/pull/37080#issuecomment-1181927091 Thanks @cloud-fan @allisonwang-db @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] Jonathancui123 commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-07-12 Thread GitBox
Jonathancui123 commented on code in PR #36871: URL: https://github.com/apache/spark/pull/36871#discussion_r919291362 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -148,7 +148,28 @@ class CSVOptions( // A language tag in IETF BCP 47

[GitHub] [spark] MaxGekk closed pull request #37154: [SPARK-39744][SQL] Add the `REGEXP_INSTR` function

2022-07-12 Thread GitBox
MaxGekk closed pull request #37154: [SPARK-39744][SQL] Add the `REGEXP_INSTR` function URL: https://github.com/apache/spark/pull/37154 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] gengliangwang commented on pull request #36960: [SPARK-39557][SQL] Support ARRAY, STRUCT, MAP types as DEFAULT values

2022-07-12 Thread GitBox
gengliangwang commented on PR #36960: URL: https://github.com/apache/spark/pull/36960#issuecomment-1182324960 Thanks, merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] MaxGekk commented on pull request #37154: [SPARK-39744][SQL] Add the `REGEXP_INSTR` function

2022-07-12 Thread GitBox
MaxGekk commented on PR #37154: URL: https://github.com/apache/spark/pull/37154#issuecomment-1181751673 Merging to master. Thank you, @cloud-fan for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] Yikf commented on a diff in pull request #37113: [SPARK-39741][SQL] Support url encode/decode as built-in function and tidy up url-related functions

2022-07-12 Thread GitBox
Yikf commented on code in PR #37113: URL: https://github.com/apache/spark/pull/37113#discussion_r919041742 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala: ## @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] MaxGekk commented on a diff in pull request #37154: [SPARK-39744][SQL] Add the `REGEXP_INSTR` function

2022-07-12 Thread GitBox
MaxGekk commented on code in PR #37154: URL: https://github.com/apache/spark/pull/37154#discussion_r918770177 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -1043,3 +1043,89 @@ case class RegExpSubStr(left: Expression,

[GitHub] [spark] sunchao commented on pull request #37163: [SPARK-39750][SQL] Enable `spark.sql.cbo.enabled` by default

2022-07-12 Thread GitBox
sunchao commented on PR #37163: URL: https://github.com/apache/spark/pull/37163#issuecomment-1181997866 @wangyum do you have any numbers on the performance gain from this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] EnricoMi commented on a diff in pull request #36150: [SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-07-12 Thread GitBox
EnricoMi commented on code in PR #36150: URL: https://github.com/apache/spark/pull/36150#discussion_r919026261 ## core/src/main/resources/error/error-classes.json: ## @@ -256,6 +256,18 @@ "Key does not exist. Use `try_element_at` to tolerate non-existent key and return

[GitHub] [spark] LuciferYang commented on a diff in pull request #37113: [SPARK-39741][SQL] Support url encode/decode as built-in function and tidy up url-related functions

2022-07-12 Thread GitBox
LuciferYang commented on code in PR #37113: URL: https://github.com/apache/spark/pull/37113#discussion_r918759285 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala: ## @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #37154: [SPARK-39744][SQL] Add the `REGEXP_INSTR` function

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #37154: URL: https://github.com/apache/spark/pull/37154#discussion_r918719734 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -1043,3 +1043,89 @@ case class RegExpSubStr(left: Expression,

[GitHub] [spark] cloud-fan commented on a diff in pull request #37104: [SPARK-39698][SQL] Use `TakeOrderedAndProject` if maxRows below the topKSortFallbackThreshold

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #37104: URL: https://github.com/apache/spark/pull/37104#discussion_r918716943 ## sql/core/src/test/resources/sql-tests/results/order-by-ordinal.sql.out: ## @@ -14,19 +14,6 @@ struct<> --- !query -select * from data order by 1 desc Review

[GitHub] [spark] cloud-fan commented on a diff in pull request #37104: [SPARK-39698][SQL] Use `TakeOrderedAndProject` if maxRows below the topKSortFallbackThreshold

2022-07-12 Thread GitBox
cloud-fan commented on code in PR #37104: URL: https://github.com/apache/spark/pull/37104#discussion_r918716212 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -130,8 +130,23 @@ abstract class SparkStrategies extends
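For context, spark.sql.execution.topKSortFallbackThreshold controls when a sort followed by a limit is planned as a single TakeOrderedAndProject (a top-K) instead of a full global sort; the PR extends this to plans whose maxRows is already known to be below the threshold. A hedged sketch of the classic shape, reusing the `data` table name from the quoted test file:

```scala
// Hedged sketch: with the limit below the threshold, the physical plan should
// contain TakeOrderedAndProject rather than Sort + GlobalLimit.
spark.sql("SET spark.sql.execution.topKSortFallbackThreshold=1000")
spark.sql("SELECT * FROM data ORDER BY 1 DESC LIMIT 100").explain()
```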
