Re: [PR] [SPARK-45943][SQL] Add a tag to mark resolved subquery plan for ResolveSubquery [spark]

2023-11-26 Thread via GitHub
wForget commented on PR #43867: URL: https://github.com/apache/spark/pull/43867#issuecomment-1826721672 > Good point. Why is `DetermineTableStats` a post-hoc resolution rule then... It was introduced in #17015, I don't see more discussion about it. I will try moving it into Resolutio

Re: [PR] [SPARK-46100][CORE][R][SQL][SS][CONNECT][ML][MLLIB][DSTREAM][GRAPHX][K8S][AVRO][EXAMPLES] Replace the remaining `(string|array).size` with `(string|array).length` [spark]

2023-11-26 Thread via GitHub
LuciferYang closed pull request #44014: [SPARK-46100][CORE][R][SQL][SS][CONNECT][ML][MLLIB][DSTREAM][GRAPHX][K8S][AVRO][EXAMPLES] Replace the remaining `(string|array).size` with `(string|array).length` URL: https://github.com/apache/spark/pull/44014 -- This is an automated message from the

Re: [PR] [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context [spark]

2023-11-26 Thread via GitHub
MaxGekk commented on PR #43695: URL: https://github.com/apache/spark/pull/43695#issuecomment-1826780514 Merging to master. Thank you, @cloud-fan and @beliefer for review.

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-26 Thread via GitHub
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405397072 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -175,6 +178,23 @@ abstract class OffsetWindowFunctionFrameBase(

Re: [PR] [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context [spark]

2023-11-26 Thread via GitHub
MaxGekk closed pull request #43695: [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context URL: https://github.com/apache/spark/pull/43695

[PR] [SPARK-39769][SQL][3.3.0] Correcting the misspelling Unevaluable -> Inevaluable [spark]

2023-11-26 Thread via GitHub
luogaiyu opened a new pull request, #44017: URL: https://github.com/apache/spark/pull/44017 ### What changes were proposed in this pull request? Correcting the misspelling Unvaluable -> Invaluable ### Why are the changes needed? Spelling mistake ### Does th

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-26 Thread via GitHub
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405407470 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +216,15 @@ class FrameLessOffsetWindowFunctionFrame( of

[PR] [SPARK-46106]If the hive table is a table, the outsourcing information will be displayed during ShowCreateTableCommand. [spark]

2023-11-26 Thread via GitHub
guixiaowen opened a new pull request, #44018: URL: https://github.com/apache/spark/pull/44018 … will be displayed during ShowCreateTableCommand. ### What changes were proposed in this pull request? If the hive table is a table, the outsourcing information will be displayed

Re: [PR] [SPARK-46069][SQL] Support unwrap timestamp type to date type [spark]

2023-11-26 Thread via GitHub
wangyum commented on code in PR #43982: URL: https://github.com/apache/spark/pull/43982#discussion_r1405415194 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -329,6 +334,40 @@ object UnwrapCastInBinaryComparison e

Re: [PR] [SPARK-46069][SQL] Support unwrap timestamp type to date type [spark]

2023-11-26 Thread via GitHub
wangyum commented on code in PR #43982: URL: https://github.com/apache/spark/pull/43982#discussion_r1405415267 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -329,6 +334,40 @@ object UnwrapCastInBinaryComparison e

Re: [PR] [SPARK-46069][SQL] Support unwrap timestamp type to date type [spark]

2023-11-26 Thread via GitHub
wangyum commented on code in PR #43982: URL: https://github.com/apache/spark/pull/43982#discussion_r1405415554 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -329,6 +334,40 @@ object UnwrapCastInBinaryComparison e

Re: [PR] [SPARK-45943][SQL] Add a tag to mark resolved subquery plan for ResolveSubquery [spark]

2023-11-26 Thread via GitHub
wForget commented on PR #43867: URL: https://github.com/apache/spark/pull/43867#issuecomment-1826808784 > > This may also affect other Rules. Since HiveTableRelation is not resolved, the Project.projectList of the parent plan will not be resolved. > > Good point. Why is `DetermineTabl

Re: [PR] [SPARK-45974][SQL] Add scan.filterAttributes non-empty judgment for RowLevelOperationRuntimeGroupFiltering [spark]

2023-11-26 Thread via GitHub
wForget commented on PR #43869: URL: https://github.com/apache/spark/pull/43869#issuecomment-1826809454 @cloud-fan Thanks for your review, can this be merged?

[PR] [SPARK-46106]If the hive table is a table, the outsourcing information will be displayed during ShowCreateTableCommand. #44018 [spark]

2023-11-26 Thread via GitHub
guixiaowen opened a new pull request, #44019: URL: https://github.com/apache/spark/pull/44019 … will be displayed during ShowCreateTableCommand. What changes were proposed in this pull request? If the hive table is a table, the outsourcing information will be displayed during ShowC

Re: [PR] [SPARK-46106]If the hive table is a table, the outsourcing information will be displayed during ShowCreateTableCommand. [spark]

2023-11-26 Thread via GitHub
guixiaowen closed pull request #44018: [SPARK-46106]If the hive table is a table, the outsourcing information will be displayed during ShowCreateTableCommand. URL: https://github.com/apache/spark/pull/44018

Re: [PR] [SPARK-45943][SQL] Add a tag to mark resolved subquery plan for ResolveSubquery [spark]

2023-11-26 Thread via GitHub
cloud-fan commented on PR #43867: URL: https://github.com/apache/spark/pull/43867#issuecomment-1826809747 I think we should only put a rule in posthoc resolution if it needs the entire plan to be resolved first. `DetermineTableStats` is not the case.

Re: [PR] [SPARK-45943][SQL] Add a tag to mark resolved subquery plan for ResolveSubquery [spark]

2023-11-26 Thread via GitHub
wForget commented on PR #43867: URL: https://github.com/apache/spark/pull/43867#issuecomment-1826812617 > I think we should only put a rule in posthoc resolution if it needs the entire plan to be resolved first. `DetermineTableStats` is not the case. Got it, I'll revert the changes in

Re: [PR] [SPARK-45974][SQL] Add scan.filterAttributes non-empty judgment for RowLevelOperationRuntimeGroupFiltering [spark]

2023-11-26 Thread via GitHub
cloud-fan commented on PR #43869: URL: https://github.com/apache/spark/pull/43869#issuecomment-1826813645 Thanks, merging to master/3.5!

Re: [PR] [SPARK-45974][SQL] Add scan.filterAttributes non-empty judgment for RowLevelOperationRuntimeGroupFiltering [spark]

2023-11-26 Thread via GitHub
cloud-fan closed pull request #43869: [SPARK-45974][SQL] Add scan.filterAttributes non-empty judgment for RowLevelOperationRuntimeGroupFiltering URL: https://github.com/apache/spark/pull/43869

[PR] [WIP][SQL] Restrict charsets in `encode()` [spark]

2023-11-26 Thread via GitHub
MaxGekk opened a new pull request, #44020: URL: https://github.com/apache/spark/pull/44020 ### What changes were proposed in this pull request? In the PR, I propose to restrict the supported charsets in the `encode()` functions by the list from [the doc](https://spark.apache.org/docs/lat

Re: [PR] [SPARK-46074][CONNECT][SCALA] Insufficient details in error message on UDF failure [spark]

2023-11-26 Thread via GitHub
nija-at commented on PR #43983: URL: https://github.com/apache/spark/pull/43983#issuecomment-1826877511 @HyukjinKwon - I've fixed all the failing tests. Can you take another look and merge if happy?

Re: [PR] [SPARK-42655][SQL] Incorrect ambiguous column reference error [spark]

2023-11-26 Thread via GitHub
bsikander commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1826881477 @shrprasa do you think this issue is similar to the issue that i just posted: https://stackoverflow.com/questions/77553257/select-behavior-different-between-pyspark-2-4-8-and-3-3-2

Re: [PR] [SPARK-46074][CONNECT][SCALA] Insufficient details in error message on UDF failure [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #43983: URL: https://github.com/apache/spark/pull/43983#issuecomment-1826942313 Merged to master

Re: [PR] [SPARK-46074][CONNECT][SCALA] Insufficient details in error message on UDF failure [spark]

2023-11-26 Thread via GitHub
HyukjinKwon closed pull request #43983: [SPARK-46074][CONNECT][SCALA] Insufficient details in error message on UDF failure URL: https://github.com/apache/spark/pull/43983

Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44012: URL: https://github.com/apache/spark/pull/44012#issuecomment-1826942796 Merged to master.

Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]

2023-11-26 Thread via GitHub
HyukjinKwon closed pull request #44012: [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation URL: https://github.com/apache/spark/pull/44012

Re: [PR] [WIP][SQL] Restrict charsets in `encode()` [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on code in PR #44020: URL: https://github.com/apache/spark/pull/44020#discussion_r1405513202 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4584,6 +4584,15 @@ object SQLConf { .checkValue(_ > 0, "The number of stack

[PR] [SPARK-46094][CONNECT] Add support for code profiling executors [spark]

2023-11-26 Thread via GitHub
parthchandra opened a new pull request, #44021: URL: https://github.com/apache/spark/pull/44021 ### What changes were proposed in this pull request? This adds support for the async profiler to Spark ### Why are the changes needed? Profiling of JVM applications on a cluste

Re: [PR] [SPARK-39769][SQL][3.3.0] Correcting the misspelling Unevaluable -> Inevaluable [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on code in PR #44017: URL: https://github.com/apache/spark/pull/44017#discussion_r1405513433 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala: ## @@ -42,7 +42,7 @@ object TimeTravelSpec { } val tsToEva

Re: [PR] [SPARK-46084][PS][FOLLOWUP] More refactoring by using `create_map` [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44015: URL: https://github.com/apache/spark/pull/44015#issuecomment-1826945445 Merged to master.

Re: [PR] [SPARK-46084][PS][FOLLOWUP] More refactoring by using `create_map` [spark]

2023-11-26 Thread via GitHub
HyukjinKwon closed pull request #44015: [SPARK-46084][PS][FOLLOWUP] More refactoring by using `create_map` URL: https://github.com/apache/spark/pull/44015

Re: [PR] [SPARK-46099][PS][DOCS] Refactor "Supported pandas API" generation script [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44010: URL: https://github.com/apache/spark/pull/44010#issuecomment-1826945706 Merged to master.

Re: [PR] [SPARK-46099][PS][DOCS] Refactor "Supported pandas API" generation script [spark]

2023-11-26 Thread via GitHub
HyukjinKwon closed pull request #44010: [SPARK-46099][PS][DOCS] Refactor "Supported pandas API" generation script URL: https://github.com/apache/spark/pull/44010

Re: [PR] [SPARK-46092][SQL] Don't push down Parquet row group filters that overflow [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on code in PR #44006: URL: https://github.com/apache/spark/pull/44006#discussion_r1405515061 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala: ## @@ -613,7 +613,10 @@ class ParquetFilters( value == null

Re: [PR] [SPARK-46092][SQL] Don't push down Parquet row group filters that overflow [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on code in PR #44006: URL: https://github.com/apache/spark/pull/44006#discussion_r1405515464 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala: ## @@ -613,7 +613,10 @@ class ParquetFilters( value == null

[PR] [MAJOR][SQL] XML data source keepInnerXmlAsRaw option [spark]

2023-11-26 Thread via GitHub
ufuksungu opened a new pull request, #44022: URL: https://github.com/apache/spark/pull/44022 ### What changes were proposed in this pull request? Built-in XML data source gives related value and schema of the inner or nested elements. However, additional operations should be made by d

Re: [PR] [SPARK-44850][CONNECT] Heartbeat in scala's Spark Connect [spark]

2023-11-26 Thread via GitHub
github-actions[bot] closed pull request #42538: [SPARK-44850][CONNECT] Heartbeat in scala's Spark Connect URL: https://github.com/apache/spark/pull/42538

Re: [PR] [SPARK-44830][CONNECT] Upgrade grpc to 1.57.2 [spark]

2023-11-26 Thread via GitHub
github-actions[bot] commented on PR #42514: URL: https://github.com/apache/spark/pull/42514#issuecomment-1826954843 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-44209] Expose amount of shuffle data available on the node [spark]

2023-11-26 Thread via GitHub
github-actions[bot] closed pull request #42071: [SPARK-44209] Expose amount of shuffle data available on the node URL: https://github.com/apache/spark/pull/42071

Re: [PR] [MAJOR][SQL] XML data source keepInnerXmlAsRaw option [spark]

2023-11-26 Thread via GitHub
ufuksungu commented on code in PR #44022: URL: https://github.com/apache/spark/pull/44022#discussion_r1405521084 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/xml/XmlSuite.scala: ## @@ -2115,4 +2218,1557 @@ class XmlSuite extends QueryTest with SharedSpa

[PR] [SPARK-46107][PYTHON][ML] Deprecate pyspark.keyword_only API [spark]

2023-11-26 Thread via GitHub
HyukjinKwon opened a new pull request, #44023: URL: https://github.com/apache/spark/pull/44023 ### What changes were proposed in this pull request? This PR deprecates `pyspark.keyword_only` API, remove the usage in our codebase. Note that this PR also removes the docstring that

Re: [PR] [SPARK-46107][PYTHON][ML] Deprecate pyspark.keyword_only API [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44023: URL: https://github.com/apache/spark/pull/44023#issuecomment-1826980150 cc @mengxr @WeichenXu123 FYI

Re: [PR] [MAJOR][SQL] keepInnerXmlAsRaw option for Built-in XML Data Source [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44022: URL: https://github.com/apache/spark/pull/44022#issuecomment-1826981062 This isn't a major .. let's file a JIRA.

Re: [PR] [SPARK-46094][CONNECT] Add support for code profiling executors [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44021: URL: https://github.com/apache/spark/pull/44021#issuecomment-1826981460 how do you use it? would be great if it contains the example, how to run, etc.

Re: [PR] [MAJOR][SQL] keepInnerXmlAsRaw option for Built-in XML Data Source [spark]

2023-11-26 Thread via GitHub
ufuksungu commented on PR #44022: URL: https://github.com/apache/spark/pull/44022#issuecomment-1826991330 @HyukjinKwon I've created the subtask (https://issues.apache.org/jira/browse/SPARK-46108) under "Built-in XML data source support (SPARK-44265)", named as "XML: keepInnerXmlAsRaw op

Re: [PR] [SPARK-46069][SQL] Support unwrap timestamp type to date type [spark]

2023-11-26 Thread via GitHub
wankunde commented on code in PR #43982: URL: https://github.com/apache/spark/pull/43982#discussion_r1405537575 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -329,6 +334,40 @@ object UnwrapCastInBinaryComparison

Re: [PR] [SPARK-46069][SQL] Support unwrap timestamp type to date type [spark]

2023-11-26 Thread via GitHub
wankunde commented on code in PR #43982: URL: https://github.com/apache/spark/pull/43982#discussion_r1405537995 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -329,6 +334,40 @@ object UnwrapCastInBinaryComparison

Re: [PR] [SPARK-46094][CONNECT] Add support for code profiling executors [spark]

2023-11-26 Thread via GitHub
parthchandra commented on PR #44021: URL: https://github.com/apache/spark/pull/44021#issuecomment-1827003193 > how do you use it? would be great if it contains the example, how to run, etc. There's a README - connector/profiler/README.md. I can add more details if you think this is n

Re: [PR] [SPARK-39769][SQL][3.3.0] Correcting the misspelling Unevaluable -> Inevaluable [spark]

2023-11-26 Thread via GitHub
yaooqinn commented on code in PR #44017: URL: https://github.com/apache/spark/pull/44017#discussion_r1405541880 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala: ## @@ -42,7 +42,7 @@ object TimeTravelSpec { } val tsToEval =

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-26 Thread via GitHub
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405543679 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +216,15 @@ class FrameLessOffsetWindowFunctionFrame( off

Re: [PR] [SPARK-46096][BUILD] Upgrade `sbt` to 1.9.7 [spark]

2023-11-26 Thread via GitHub
panbingkun commented on PR #44008: URL: https://github.com/apache/spark/pull/44008#issuecomment-1827018413 According to my investigation, some of the dependencies of `SBT` itself, when we set `Maven repo` as the default `local dependency`, if some jar files are corrupted, it will not automa

Re: [PR] [SPARK-46096][BUILD] Upgrade `sbt` to 1.9.7 [spark]

2023-11-26 Thread via GitHub
panbingkun commented on PR #44008: URL: https://github.com/apache/spark/pull/44008#issuecomment-1827023500 Several options that can be workaround 1. Proactively use a script with the Maven command to download some dependency jars, eg: https://github.com/apache/spark/pull/43079/files#d

Re: [PR] [SPARK-46090][SQL] Support plan fragment level SQL configs in AQE [spark]

2023-11-26 Thread via GitHub
ulysses-you commented on PR #44013: URL: https://github.com/apache/spark/pull/44013#issuecomment-1827025668 cc @cloud-fan @maryannxue @dongjoon-hyun if you have time to take a look at this idea, thank you.

Re: [PR] [SPARK-46107][PYTHON][ML] Deprecate pyspark.keyword_only API [spark]

2023-11-26 Thread via GitHub
WeichenXu123 commented on code in PR #44023: URL: https://github.com/apache/spark/pull/44023#discussion_r1405554054 ## python/pyspark/ml/classification.py: ## @@ -1275,7 +1263,6 @@ def __init__( ): ... -@keyword_only Review Comment: I don't think so ? It

[PR] [SPARK-46110][PYTHON] Use error classes in catalog, conf, connect, observation, pandas modules [spark]

2023-11-26 Thread via GitHub
HyukjinKwon opened a new pull request, #44024: URL: https://github.com/apache/spark/pull/44024 ### What changes were proposed in this pull request? This PR proposes to use error classes in catalog, conf, connect, observation, pandas modules. ### Why are the changes needed?

Re: [PR] [SPARK-46110][PYTHON] Use error classes in catalog, conf, connect, observation, pandas modules [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44024: URL: https://github.com/apache/spark/pull/44024#issuecomment-1827029668 cc @itholic

Re: [PR] [SPARK-39769][SQL][3.3.0] Correcting the misspelling Unevaluable -> Inevaluable [spark]

2023-11-26 Thread via GitHub
luogaiyu closed pull request #44017: [SPARK-39769][SQL][3.3.0] Correcting the misspelling Unevaluable -> Inevaluable URL: https://github.com/apache/spark/pull/44017

Re: [PR] [SPARK-46107][PYTHON][ML] Deprecate pyspark.keyword_only API [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on code in PR #44023: URL: https://github.com/apache/spark/pull/44023#discussion_r1405556038 ## python/pyspark/ml/classification.py: ## @@ -1275,7 +1263,6 @@ def __init__( ): ... -@keyword_only Review Comment: We should manually stor

[PR] [SPARK-45888][SS] Apply error class framework to State (Metadata) Data Source [spark]

2023-11-26 Thread via GitHub
HeartSaVioR opened a new pull request, #44025: URL: https://github.com/apache/spark/pull/44025 ### What changes were proposed in this pull request? This PR proposes to apply error class framework to the new data source, State (Metadata) Data Source. ### Why are the changes need

Re: [PR] [SPARK-45699][BUILD][CORE][SQL][SS][CONNECT][MLLIB][ML][DSTREAM][GRAPHX][K8S][UI] Fixing all compilation warnings related to widening conversions [spark]

2023-11-26 Thread via GitHub
hannahkamundson commented on PR #43890: URL: https://github.com/apache/spark/pull/43890#issuecomment-1827033765 @LuciferYang Updated! How do I merge it in?

Re: [PR] [SPARK-45699][BUILD][CORE][SQL][SS][CONNECT][MLLIB][ML][DSTREAM][GRAPHX][K8S][UI] Fixing all compilation warnings related to widening conversions [spark]

2023-11-26 Thread via GitHub
LuciferYang closed pull request #43890: [SPARK-45699][BUILD][CORE][SQL][SS][CONNECT][MLLIB][ML][DSTREAM][GRAPHX][K8S][UI] Fixing all compilation warnings related to widening conversions URL: https://github.com/apache/spark/pull/43890

Re: [PR] [SPARK-45699][BUILD][CORE][SQL][SS][CONNECT][MLLIB][ML][DSTREAM][GRAPHX][K8S][UI] Fixing all compilation warnings related to widening conversions [spark]

2023-11-26 Thread via GitHub
LuciferYang commented on PR #43890: URL: https://github.com/apache/spark/pull/43890#issuecomment-1827038608 Merged into master for Spark 4.0. Thanks @hannahkamundson and @tgravescs ~

Re: [PR] [SPARK-46034][CORE] SparkContext add file should also copy file to local root path [spark]

2023-11-26 Thread via GitHub
AngersZh commented on PR #43936: URL: https://github.com/apache/spark/pull/43936#issuecomment-1827039250 gentle ping @yaooqinn @cloud-fan @HyukjinKwon @tgravescs Could you take a look?

[PR] [SPARK-46111][DOCS][PYTHON] Add copyright to the PySpark official documentation. [spark]

2023-11-26 Thread via GitHub
itholic opened a new pull request, #44026: URL: https://github.com/apache/spark/pull/44026 ### What changes were proposed in this pull request? This PR proposes to add the Apache Spark Foundation copyright notice to the bottom of the PySpark official documentation. ### Why

Re: [PR] [SPARK-46034][CORE] SparkContext add file should also copy file to local root path [spark]

2023-11-26 Thread via GitHub
yaooqinn commented on PR #43936: URL: https://github.com/apache/spark/pull/43936#issuecomment-1827051972 Could you please update the PR description? It looks overdue and inaccurate for me to catch up with. Could you also provide the output for `LIST FILE`? I guess we shall use its ou

Re: [PR] [SPARK-46107][PYTHON][ML] Deprecate pyspark.keyword_only API [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44023: URL: https://github.com/apache/spark/pull/44023#issuecomment-1827067123 cc @zero323 too

Re: [PR] [SPARK-46111][DOCS][PYTHON] Add copyright to the PySpark official documentation. [spark]

2023-11-26 Thread via GitHub
itholic commented on code in PR #44026: URL: https://github.com/apache/spark/pull/44026#discussion_r1405579338 ## python/docs/source/_templates/spark_footer.html: ## @@ -0,0 +1,3 @@ + +{{copyright}} The Apache Software Foundation, Licensed under the https://www.apache.org/li

Re: [PR] [SPARK-45833][SS][DOCS] Document the new introduction of state data source [spark]

2023-11-26 Thread via GitHub
anishshri-db commented on code in PR #43920: URL: https://github.com/apache/spark/pull/43920#discussion_r1405581629 ## docs/structured-streaming-state-data-source.md: ## @@ -0,0 +1,248 @@ +--- Review Comment: Could we create a streaming specific sub-directory under `docs` ?

Re: [PR] [SPARK-46111][DOCS][PYTHON] Add copyright to the PySpark official documentation. [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44026: URL: https://github.com/apache/spark/pull/44026#issuecomment-1827072792 cc @srowen FYI

Re: [PR] [SPARK-46111][DOCS][PYTHON] Add copyright to the PySpark official documentation. [spark]

2023-11-26 Thread via GitHub
srowen commented on code in PR #44026: URL: https://github.com/apache/spark/pull/44026#discussion_r1405583515 ## python/docs/source/conf.py: ## @@ -124,7 +125,8 @@ # General information about the project. project = 'PySpark' -copyright = '' +# We have our custom "spark_foote

Re: [PR] [SPARK-46006][YARN] YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop [spark]

2023-11-26 Thread via GitHub
zhouyifan279 commented on code in PR #43906: URL: https://github.com/apache/spark/pull/43906#discussion_r1405584026 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala: ## @@ -384,19 +384,25 @@ private[yarn] class YarnAllocator( this.nu

Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]

2023-11-26 Thread via GitHub
Hisoka-X commented on code in PR #42398: URL: https://github.com/apache/spark/pull/42398#discussion_r1405586741 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -2214,6 +2214,30 @@ class AstBuilder extends DataTypeAstBuilder with SQLCo

Re: [PR] [SPARK-46111][DOCS][PYTHON] Add copyright to the PySpark official documentation. [spark]

2023-11-26 Thread via GitHub
itholic commented on code in PR #44026: URL: https://github.com/apache/spark/pull/44026#discussion_r1405586947 ## python/docs/source/conf.py: ## @@ -124,7 +125,8 @@ # General information about the project. project = 'PySpark' -copyright = '' +# We have our custom "spark_foot

[PR] [SPARK-46113][BUILD][DOCS] Update `pydata_sphinx_theme` version requirement to >=0.13 [spark]

2023-11-26 Thread via GitHub
itholic opened a new pull request, #44027: URL: https://github.com/apache/spark/pull/44027 ### What changes were proposed in this pull request? This PR proposes to update the version requirement for `pydata_sphinx_theme` in `requirements.txt` from `==0.13` to `>=0.13`. This change all

Re: [PR] [SPARK-46006][YARN] YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop [spark]

2023-11-26 Thread via GitHub
AngersZh commented on code in PR #43906: URL: https://github.com/apache/spark/pull/43906#discussion_r1405592624 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala: ## @@ -384,19 +384,25 @@ private[yarn] class YarnAllocator( this.nu

[PR] [SPARK-46114][PYTHON] Add PySparkIndexError for error framework [spark]

2023-11-26 Thread via GitHub
HyukjinKwon opened a new pull request, #44028: URL: https://github.com/apache/spark/pull/44028 ### What changes were proposed in this pull request? This PR proposes to add `PySparkIndexError` for error framework to replace `IndexError` in `pyspark.sql.*`. ### Why are the change

Re: [PR] [SPARK-46114][PYTHON] Add PySparkIndexError for error framework [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44028: URL: https://github.com/apache/spark/pull/44028#issuecomment-1827097231 cc @itholic and @zhengruifeng

Re: [PR] [SPARK-45888][SS] Apply error class framework to State (Metadata) Data Source [spark]

2023-11-26 Thread via GitHub
HeartSaVioR commented on PR #44025: URL: https://github.com/apache/spark/pull/44025#issuecomment-1827098660 cc. @MaxGekk Would you mind taking a look? Thanks in advance!

[PR] [MINOR][PYTHON] Remove _inferSchema in SQLContext [spark]

2023-11-26 Thread via GitHub
HyukjinKwon opened a new pull request, #44031: URL: https://github.com/apache/spark/pull/44031 ### What changes were proposed in this pull request? There are only two places that use `SQLContext._inferSchema` that can be safely converted to `SQLContext.sparkSession._inferSchema` instea

Re: [PR] [SPARK-45556][UI] Allow web page respond customized status code and message through WebApplicationException [spark]

2023-11-26 Thread via GitHub
kuwii commented on code in PR #43646: URL: https://github.com/apache/spark/pull/43646#discussion_r1404701377 ## core/src/test/scala/org/apache/spark/deploy/history/HistoryServerSuite.scala: ## @@ -93,16 +95,23 @@ abstract class HistoryServerSuite extends SparkFunSuite with Befo

Re: [PR] [SPARK-46111][DOCS][PYTHON] Add copyright to the PySpark official documentation. [spark]

2023-11-26 Thread via GitHub
srowen commented on code in PR #44026: URL: https://github.com/apache/spark/pull/44026#discussion_r1405611472 ## python/docs/source/conf.py: ## @@ -124,7 +125,8 @@ # General information about the project. project = 'PySpark' -copyright = '' +# We have our custom "spark_foote

Re: [PR] [SPARK-45833][SS][DOCS] Document the new introduction of state data source [spark]

2023-11-26 Thread via GitHub
HeartSaVioR commented on code in PR #43920: URL: https://github.com/apache/spark/pull/43920#discussion_r1405617170 ## docs/structured-streaming-state-data-source.md: ## @@ -0,0 +1,248 @@ +--- Review Comment: It may need broader change as there are links for images/js/css. I

Re: [PR] [SPARK-46111][DOCS][PYTHON] Add copyright to the PySpark official documentation. [spark]

2023-11-26 Thread via GitHub
itholic commented on code in PR #44026: URL: https://github.com/apache/spark/pull/44026#discussion_r1405621098 ## python/docs/source/conf.py: ## @@ -124,7 +125,8 @@ # General information about the project. project = 'PySpark' -copyright = '' +# We have our custom "spark_foot

[PR] [SPARK-46103][FOLLOWUP] Keep Sphinx version consistency in spark-rm [spark]

2023-11-26 Thread via GitHub
panbingkun opened a new pull request, #44032: URL: https://github.com/apache/spark/pull/44032 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-46103][FOLLOWUP] Keep Sphinx version consistency in spark-rm [spark]

2023-11-26 Thread via GitHub
panbingkun commented on PR #44032: URL: https://github.com/apache/spark/pull/44032#issuecomment-1827159347 cc @HyukjinKwon @itholic

[PR] [SPARK-46116][DOCS][PYTHON] Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage. [spark]

2023-11-26 Thread via GitHub
itholic opened a new pull request, #44033: URL: https://github.com/apache/spark/pull/44033 ### What changes were proposed in this pull request? This PR proposes to enhance the PySpark documentation by adding more items for a "Useful links" including "Q&A Support", "Dev Mailing List" an

Re: [PR] [SPARK-46103][FOLLOWUP] Keep Sphinx version consistency in spark-rm [spark]

2023-11-26 Thread via GitHub
itholic commented on PR #44032: URL: https://github.com/apache/spark/pull/44032#issuecomment-1827176844 Also cc @dongjoon-hyun

Re: [PR] [SPARK-46069][SQL] Support unwrap timestamp type to date type [spark]

2023-11-26 Thread via GitHub
wangyum commented on code in PR #43982: URL: https://github.com/apache/spark/pull/43982#discussion_r1405654405 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -329,6 +334,40 @@ object UnwrapCastInBinaryComparison e

Re: [PR] [SPARK-45888][SS] Apply error class framework to State (Metadata) Data Source [spark]

2023-11-26 Thread via GitHub
HeartSaVioR commented on code in PR #44025: URL: https://github.com/apache/spark/pull/44025#discussion_r1405654619 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -3030,6 +3030,60 @@ ], "sqlState" : "42713" }, + "STDS_COMMITTED_BATCH_UNAVAILABLE"

Re: [PR] [SPARK-46069][SQL] Support unwrap timestamp type to date type [spark]

2023-11-26 Thread via GitHub
wangyum commented on code in PR #43982: URL: https://github.com/apache/spark/pull/43982#discussion_r1405654843 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -329,6 +334,40 @@ object UnwrapCastInBinaryComparison e

Re: [PR] [SPARK-46103][FOLLOWUP] Keep Sphinx version consistency in spark-rm [spark]

2023-11-26 Thread via GitHub
itholic commented on code in PR #44032: URL: https://github.com/apache/spark/pull/44032#discussion_r1405654759 ## dev/create-release/spark-rm/Dockerfile: ## @@ -42,7 +42,7 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y" # We should use the latest Sphinx versio

Re: [PR] [SPARK-46103][FOLLOWUP] Keep Sphinx version consistency in spark-rm [spark]

2023-11-26 Thread via GitHub
itholic commented on code in PR #44032: URL: https://github.com/apache/spark/pull/44032#discussion_r1405662965 ## dev/create-release/spark-rm/Dockerfile: ## @@ -42,7 +42,7 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y" # We should use the latest Sphinx versio

Re: [PR] [SPARK-46103][FOLLOWUP] Keep Sphinx version consistency in spark-rm [spark]

2023-11-26 Thread via GitHub
panbingkun commented on code in PR #44032: URL: https://github.com/apache/spark/pull/44032#discussion_r1405664960 ## dev/create-release/spark-rm/Dockerfile: ## @@ -42,7 +42,7 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y" # We should use the latest Sphinx ver

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-11-26 Thread via GitHub
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1405543679 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +216,15 @@ class FrameLessOffsetWindowFunctionFrame( off

Re: [PR] [SPARK-46114][PYTHON] Add PySparkIndexError for error framework [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44028: URL: https://github.com/apache/spark/pull/44028#issuecomment-1827193151 Merged to master.

Re: [PR] [SPARK-46114][PYTHON] Add PySparkIndexError for error framework [spark]

2023-11-26 Thread via GitHub
HyukjinKwon closed pull request #44028: [SPARK-46114][PYTHON] Add PySparkIndexError for error framework URL: https://github.com/apache/spark/pull/44028

Re: [PR] [MINOR][PYTHON] Remove obsolete TODOs for ignores at SQLContext and HiveContext [spark]

2023-11-26 Thread via GitHub
HyukjinKwon commented on PR #44030: URL: https://github.com/apache/spark/pull/44030#issuecomment-1827193855 Merged to master.

Re: [PR] [SPARK-45760][SQL][FOLLOWUP] Inline With inside conditional branches [spark]

2023-11-26 Thread via GitHub
viirya commented on code in PR #43978: URL: https://github.com/apache/spark/pull/43978#discussion_r1405671315 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala: ## @@ -35,56 +35,82 @@ object RewriteWithExpression extends Rule[Logi

Re: [PR] [MINOR][PYTHON] Remove obsolete TODOs for ignores at SQLContext and HiveContext [spark]

2023-11-26 Thread via GitHub
HyukjinKwon closed pull request #44030: [MINOR][PYTHON] Remove obsolete TODOs for ignores at SQLContext and HiveContext URL: https://github.com/apache/spark/pull/44030
