[GitHub] [spark] Hisoka-X commented on a diff in pull request #42744: [SPARK-44990][SQL][FOLLOWUP] Add benchmark for write null value to csv

2023-08-30 Thread via GitHub
Hisoka-X commented on code in PR #42744: URL: https://github.com/apache/spark/pull/42744#discussion_r1311120674 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmark.scala: ## @@ -72,7 +72,6 @@ object CSVBenchmark extends SqlBasedBenchmark {

[GitHub] [spark] sadikovi commented on a diff in pull request #42667: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-08-30 Thread via GitHub
sadikovi commented on code in PR #42667: URL: https://github.com/apache/spark/pull/42667#discussion_r1311087439 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ## @@ -454,7 +454,20 @@ class JacksonParser(

[GitHub] [spark] zhengruifeng commented on pull request #42741: [SPARK-45024][PYTHON][CONNECT] Filter out some configurations in Session Creation

2023-08-30 Thread via GitHub
zhengruifeng commented on PR #42741: URL: https://github.com/apache/spark/pull/42741#issuecomment-1700354649 thanks, all tests passed, merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] zhengruifeng closed pull request #42741: [SPARK-45024][PYTHON][CONNECT] Filter out some configurations in Session Creation

2023-08-30 Thread via GitHub
zhengruifeng closed pull request #42741: [SPARK-45024][PYTHON][CONNECT] Filter out some configurations in Session Creation URL: https://github.com/apache/spark/pull/42741

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42744: [SPARK-44990][SQL][FOLLOWUP] Add benchmark for write null value to csv

2023-08-30 Thread via GitHub
HyukjinKwon commented on code in PR #42744: URL: https://github.com/apache/spark/pull/42744#discussion_r1311078655 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala: ## @@ -60,7 +60,7 @@ class UnivocityGenerator( options.locale,

[GitHub] [spark] Hisoka-X commented on pull request #42744: [SPARK-44990][SQL][FOLLOWUP] Add benchmark for write null value to csv

2023-08-30 Thread via GitHub
Hisoka-X commented on PR #42744: URL: https://github.com/apache/spark/pull/42744#issuecomment-1700351196 I also ran a comparison on my laptop. #42738 worked. ![image](https://github.com/apache/spark/assets/32387433/1bb42240-0c70-4b16-bf12-8f8e5c8162c2)

[GitHub] [spark] Hisoka-X commented on pull request #42744: [SPARK-44990][SQL][FOLLOWUP] Add benchmark for write null value to csv

2023-08-30 Thread via GitHub
Hisoka-X commented on PR #42744: URL: https://github.com/apache/spark/pull/42744#issuecomment-1700348468 cc @dongjoon-hyun @MaxGekk @cloud-fan

[GitHub] [spark] Hisoka-X opened a new pull request, #42744: [SPARK-44990][SQL][FOLLOWUP] Add benchmark for write null value to csv

2023-08-30 Thread via GitHub
Hisoka-X opened a new pull request, #42744: URL: https://github.com/apache/spark/pull/42744 ### What changes were proposed in this pull request? This is a follow-up PR to #42738. Add a benchmark for the case of writing null values to CSV. Also solved

[GitHub] [spark] zhengruifeng commented on pull request #42741: [SPARK-45024][PYTHON][CONNECT] Filter out some configurations in Session Creation

2023-08-30 Thread via GitHub
zhengruifeng commented on PR #42741: URL: https://github.com/apache/spark/pull/42741#issuecomment-1700343827 cc @HyukjinKwon

[GitHub] [spark] sadikovi commented on a diff in pull request #42667: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-08-30 Thread via GitHub
sadikovi commented on code in PR #42667: URL: https://github.com/apache/spark/pull/42667#discussion_r1311066933 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ## @@ -454,7 +454,20 @@ class JacksonParser(

[GitHub] [spark] zhengruifeng commented on pull request #42743: [SPARK-45018][PYTHON][CONNECT] Add CalendarIntervalType to Python Client

2023-08-30 Thread via GitHub
zhengruifeng commented on PR #42743: URL: https://github.com/apache/spark/pull/42743#issuecomment-1700323374 CI link: https://github.com/zhengruifeng/spark/actions/runs/6032590329

[GitHub] [spark] zhengruifeng commented on pull request #42741: [SPARK-45024][PYTHON][CONNECT] Filter out some configurations in Session Creation

2023-08-30 Thread via GitHub
zhengruifeng commented on PR #42741: URL: https://github.com/apache/spark/pull/42741#issuecomment-1700312685 CI link: https://github.com/zhengruifeng/spark/actions/runs/6032019013

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42731: [SPARK-45014][CONNECT] Clean up fileserver when cleaning up files, jars and archives in SparkContext

2023-08-30 Thread via GitHub
HyukjinKwon commented on code in PR #42731: URL: https://github.com/apache/spark/pull/42731#discussion_r1311046817 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -208,12 +209,38 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42743: [SPARK-45018][PYTHON][CONNECT] Add CalendarIntervalType to Python Client

2023-08-30 Thread via GitHub
zhengruifeng commented on code in PR #42743: URL: https://github.com/apache/spark/pull/42743#discussion_r1311046624 ## python/pyspark/sql/tests/connect/test_parity_types.py: ## @@ -86,7 +86,7 @@ def test_rdd_with_udt(self): def test_udt(self): super().test_udt()

[GitHub] [spark] zhengruifeng opened a new pull request, #42743: [SPARK-45018][PYTHON][CONNECT] Add CalendarIntervalType to Python Client

2023-08-30 Thread via GitHub
zhengruifeng opened a new pull request, #42743: URL: https://github.com/apache/spark/pull/42743 ### What changes were proposed in this pull request? Add CalendarIntervalType to Python Client ### Why are the changes needed? for feature parity ### Does this PR

[GitHub] [spark] cloud-fan commented on a diff in pull request #42667: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-08-30 Thread via GitHub
cloud-fan commented on code in PR #42667: URL: https://github.com/apache/spark/pull/42667#discussion_r1311038797 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ## @@ -454,7 +454,20 @@ class JacksonParser(

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42738: [SPARK-44990][SQL] Reduce the frequency of get `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv`

2023-08-30 Thread via GitHub
Hisoka-X commented on code in PR #42738: URL: https://github.com/apache/spark/pull/42738#discussion_r1311037409 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala: ## @@ -60,6 +60,8 @@ class UnivocityGenerator( options.locale,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42731: [SPARK-45014][CONNECT] Clean up fileserver when cleaning up files, jars and archives in SparkContext

2023-08-30 Thread via GitHub
HyukjinKwon commented on code in PR #42731: URL: https://github.com/apache/spark/pull/42731#discussion_r1311036686 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -208,12 +209,38 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #42738: [SPARK-44990][SQL] Reduce the frequency of get `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv`

2023-08-30 Thread via GitHub
cloud-fan commented on code in PR #42738: URL: https://github.com/apache/spark/pull/42738#discussion_r1311036309 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala: ## @@ -60,6 +60,8 @@ class UnivocityGenerator( options.locale,

[GitHub] [spark] cloud-fan closed pull request #42729: [SPARK-45012][SQL] CheckAnalysis should throw inlined plan in AnalysisException

2023-08-30 Thread via GitHub
cloud-fan closed pull request #42729: [SPARK-45012][SQL] CheckAnalysis should throw inlined plan in AnalysisException URL: https://github.com/apache/spark/pull/42729

[GitHub] [spark] cloud-fan commented on pull request #42729: [SPARK-45012][SQL] CheckAnalysis should throw inlined plan in AnalysisException

2023-08-30 Thread via GitHub
cloud-fan commented on PR #42729: URL: https://github.com/apache/spark/pull/42729#issuecomment-1700294888 thanks, merging to master!

[GitHub] [spark] anishshri-db commented on pull request #42742: [SPARK-45025] Allow block manager memory store iterator to handle thread interrupt and perform task completion gracefully

2023-08-30 Thread via GitHub
anishshri-db commented on PR #42742: URL: https://github.com/apache/spark/pull/42742#issuecomment-1700289385 cc - @JoshRosen , @HeartSaVioR - PTAL, thx !

[GitHub] [spark] anishshri-db opened a new pull request, #42742: [SPARK-45025] Allow block manager memory store iterator to handle thread interrupt and perform task completion gracefully

2023-08-30 Thread via GitHub
anishshri-db opened a new pull request, #42742: URL: https://github.com/apache/spark/pull/42742 ### What changes were proposed in this pull request? Allow block manager memory store iterator to handle thread interrupt and perform task completion gracefully ### Why are the

[GitHub] [spark] LuciferYang commented on pull request #42739: [SPARK-45021][BUILD] Remove `antlr4-maven-plugin` configuration from `sql/catalyst/pom.xml`

2023-08-30 Thread via GitHub
LuciferYang commented on PR #42739: URL: https://github.com/apache/spark/pull/42739#issuecomment-1700269471 Thanks @dongjoon-hyun

[GitHub] [spark] zhengruifeng opened a new pull request, #42741: [SPARK-45024][PYTHON][CONNECT] Filter out some configurations in Session Creation

2023-08-30 Thread via GitHub
zhengruifeng opened a new pull request, #42741: URL: https://github.com/apache/spark/pull/42741 ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/42694 filtered out static configurations in local mode. This filters out some configurations in

[GitHub] [spark] zhengruifeng commented on pull request #42735: [SPARK-45015][PYTHON][DOCS] Refine DocStrings of `try_{add, subtract, multiply, divide, avg, sum}`

2023-08-30 Thread via GitHub
zhengruifeng commented on PR #42735: URL: https://github.com/apache/spark/pull/42735#issuecomment-1700130425 thanks, merged to master

[GitHub] [spark] zhengruifeng closed pull request #42735: [SPARK-45015][PYTHON][DOCS] Refine DocStrings of `try_{add, subtract, multiply, divide, avg, sum}`

2023-08-30 Thread via GitHub
zhengruifeng closed pull request #42735: [SPARK-45015][PYTHON][DOCS] Refine DocStrings of `try_{add, subtract, multiply, divide, avg, sum}` URL: https://github.com/apache/spark/pull/42735

[GitHub] [spark] amaliujia commented on pull request #42729: [SPARK-45012][SQL] CheckAnalysis should throw inlined plan in AnalysisException

2023-08-30 Thread via GitHub
amaliujia commented on PR #42729: URL: https://github.com/apache/spark/pull/42729#issuecomment-1700128091 @dongjoon-hyun Tests fixed and `affected version` updated to `4.0.0`

[GitHub] [spark] panbingkun commented on pull request #42733: [SPARK-45019][BUILD] Make workflow scala213 on container & clean env

2023-08-30 Thread via GitHub
panbingkun commented on PR #42733: URL: https://github.com/apache/spark/pull/42733#issuecomment-1700125161 > Thank you for making a PR, @panbingkun . Does it mean that the developers and users need to clean `~/.m2` always? > > If this is a bug of #42673 , we had better skip SBT 1.9.4

[GitHub] [spark] HyukjinKwon closed pull request #42686: [SPARK-44971][PYTHON] StreamingQueryProgress event fromJson bug fix

2023-08-30 Thread via GitHub
HyukjinKwon closed pull request #42686: [SPARK-44971][PYTHON] StreamingQueryProgress event fromJson bug fix URL: https://github.com/apache/spark/pull/42686

[GitHub] [spark] WweiL commented on pull request #42686: [SPARK-44971][PYTHON] StreamingQueryProgress event fromJson bug fix

2023-08-30 Thread via GitHub
WweiL commented on PR #42686: URL: https://github.com/apache/spark/pull/42686#issuecomment-1700077614 ah nvm I set the merge target to 3.5, should be fine

[GitHub] [spark] HyukjinKwon commented on pull request #42686: [SPARK-44971][PYTHON] StreamingQueryProgress event fromJson bug fix

2023-08-30 Thread via GitHub
HyukjinKwon commented on PR #42686: URL: https://github.com/apache/spark/pull/42686#issuecomment-1700079247 Yup, merged to 3.5.

[GitHub] [spark] WweiL commented on pull request #42686: [SPARK-44971][PYTHON] StreamingQueryProgress event fromJson bug fix

2023-08-30 Thread via GitHub
WweiL commented on PR #42686: URL: https://github.com/apache/spark/pull/42686#issuecomment-1700076738 @HyukjinKwon Can we just merge this to 3.5

[GitHub] [spark] HyukjinKwon commented on pull request #42686: [SPARK-44971][PYTHON] StreamingQueryProgress event fromJson bug fix

2023-08-30 Thread via GitHub
HyukjinKwon commented on PR #42686: URL: https://github.com/apache/spark/pull/42686#issuecomment-1700074868 Merged to master and branch-3.5.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42736: [SPARK-45017][PYTHON] Add `CalendarIntervalType` to PySpark

2023-08-30 Thread via GitHub
zhengruifeng commented on code in PR #42736: URL: https://github.com/apache/spark/pull/42736#discussion_r1310971373 ## python/pyspark/sql/tests/connect/test_parity_types.py: ## @@ -86,6 +86,10 @@ def test_rdd_with_udt(self): def test_udt(self): super().test_udt()

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310971269 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -45,6 +45,9 @@ case class SessionHolder(userId: String,

[GitHub] [spark] zhengruifeng commented on pull request #42736: [SPARK-45017][PYTHON] Add `CalendarIntervalType` to PySpark

2023-08-30 Thread via GitHub
zhengruifeng commented on PR #42736: URL: https://github.com/apache/spark/pull/42736#issuecomment-1700071447 @dongjoon-hyun thank you for reviewing!

[GitHub] [spark] zhengruifeng commented on pull request #42734: [SPARK-45016][PYTHON][CONNECT] Add missing `try_remote_functions` annotations

2023-08-30 Thread via GitHub
zhengruifeng commented on PR #42734: URL: https://github.com/apache/spark/pull/42734#issuecomment-1700064769 thank you @dongjoon-hyun for review!

[GitHub] [spark] itholic commented on pull request #40436: [SPARK-42619][PS] Add `show_counts` parameter for DataFrame.info

2023-08-30 Thread via GitHub
itholic commented on PR #40436: URL: https://github.com/apache/spark/pull/40436#issuecomment-1700029755 LGTM. Seems like the CI failure is not related to this change. Could you retrigger the CI?

[GitHub] [spark] itholic commented on a diff in pull request #40436: [SPARK-42619][PS] Add `show_counts` parameter for DataFrame.info

2023-08-30 Thread via GitHub
itholic commented on code in PR #40436: URL: https://github.com/apache/spark/pull/40436#discussion_r1310962045 ## python/pyspark/pandas/indexes/base.py: ## @@ -289,7 +289,7 @@ def _summary(self, name: Optional[str] = None) -> str: if name is None: name =

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310956914 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -44,6 +44,40 @@ import org.apache.spark.sql.types._ class

[GitHub] [spark] Hisoka-X commented on pull request #42738: [SPARK-44990][SQL] Reduce the frequency of get `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv`

2023-08-30 Thread via GitHub
Hisoka-X commented on PR #42738: URL: https://github.com/apache/spark/pull/42738#issuecomment-161912 Thanks @dongjoon-hyun !

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310947001 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectGetErrorInfoHandler.scala: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42272: [SPARK-44508][PYTHON][DOCS] Add user guide for Python user-defined table functions

2023-08-30 Thread via GitHub
allisonwang-db commented on code in PR #42272: URL: https://github.com/apache/spark/pull/42272#discussion_r1310942775 ## examples/src/main/python/sql/udtf.py: ## @@ -0,0 +1,230 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42272: [SPARK-44508][PYTHON][DOCS] Add user guide for Python user-defined table functions

2023-08-30 Thread via GitHub
allisonwang-db commented on code in PR #42272: URL: https://github.com/apache/spark/pull/42272#discussion_r1310942547 ## python/docs/source/user_guide/sql/python_udtf.rst: ## @@ -0,0 +1,216 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310939456 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -64,6 +69,94 @@ private[client] object

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310938778 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -201,6 +201,20 @@ class SparkConnectService(debug:

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310937715 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -778,6 +778,62 @@ message ReleaseExecuteResponse { optional string operation_id = 2;

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42272: [SPARK-44508][PYTHON][DOCS] Add user guide for Python user-defined table functions

2023-08-30 Thread via GitHub
allisonwang-db commented on code in PR #42272: URL: https://github.com/apache/spark/pull/42272#discussion_r1310937419 ## python/docs/source/user_guide/sql/python_udtf.rst: ## @@ -0,0 +1,216 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310937318 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -778,6 +778,62 @@ message ReleaseExecuteResponse { optional string operation_id = 2;

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42272: [SPARK-44508][PYTHON][DOCS] Add user guide for Python user-defined table functions

2023-08-30 Thread via GitHub
allisonwang-db commented on code in PR #42272: URL: https://github.com/apache/spark/pull/42272#discussion_r1310936299 ## python/docs/source/user_guide/sql/python_udtf.rst: ## @@ -0,0 +1,216 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42272: [SPARK-44508][PYTHON][DOCS] Add user guide for Python user-defined table functions

2023-08-30 Thread via GitHub
allisonwang-db commented on code in PR #42272: URL: https://github.com/apache/spark/pull/42272#discussion_r1310936110 ## python/docs/source/user_guide/sql/python_udtf.rst: ## @@ -0,0 +1,216 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310930760 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -778,6 +778,62 @@ message ReleaseExecuteResponse { optional string operation_id = 2;

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310928148 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -778,6 +778,62 @@ message ReleaseExecuteResponse { optional string operation_id = 2;

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310921357 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/utils/ErrorUtils.scala: ## @@ -40,7 +42,15 @@ import

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310919492 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/ClientStreamingQuerySuite.scala: ## @@ -175,6 +175,38 @@ class ClientStreamingQuerySuite

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-30 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1310876196 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -64,6 +69,94 @@ private[client] object

[GitHub] [spark] sadikovi commented on a diff in pull request #42667: [SPARK-44940][SQL] Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-08-30 Thread via GitHub
sadikovi commented on code in PR #42667: URL: https://github.com/apache/spark/pull/42667#discussion_r1310855079 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ## @@ -454,7 +454,20 @@ class JacksonParser(

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42595: [SPARK-44901][SQL] Add API in Python UDTF 'analyze' method to return partitioning/ordering expressions

2023-08-30 Thread via GitHub
allisonwang-db commented on code in PR #42595: URL: https://github.com/apache/spark/pull/42595#discussion_r1310650141 ## sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala: ## @@ -441,6 +443,255 @@ object IntegratedUDFTestUtils extends SQLHelper {

[GitHub] [spark] itholic commented on a diff in pull request #40436: [SPARK-42619][PS] Add `show_counts` parameter for DataFrame.info

2023-08-30 Thread via GitHub
itholic commented on code in PR #40436: URL: https://github.com/apache/spark/pull/40436#discussion_r1310668018 ## python/pyspark/pandas/indexes/base.py: ## @@ -289,7 +289,7 @@ def _summary(self, name: Optional[str] = None) -> str: if name is None: name =

[GitHub] [spark] jchen5 commented on a diff in pull request #42725: [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition

2023-08-30 Thread via GitHub
jchen5 commented on code in PR #42725: URL: https://github.com/apache/spark/pull/42725#discussion_r1310660553 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3751,4 +3751,14 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] dongjoon-hyun closed pull request #42712: [SPARK-44997][DOCS] Align example order (Python -> Scala/Java -> R) in all Spark Doc Content

2023-08-30 Thread via GitHub
dongjoon-hyun closed pull request #42712: [SPARK-44997][DOCS] Align example order (Python -> Scala/Java -> R) in all Spark Doc Content URL: https://github.com/apache/spark/pull/42712

[GitHub] [spark] hvanhovell commented on a diff in pull request #42634: [SPARK-44910][SQL] Encoders.bean does not support superclasses with generic type arguments

2023-08-30 Thread via GitHub
hvanhovell commented on code in PR #42634: URL: https://github.com/apache/spark/pull/42634#discussion_r1310650718 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala: ## @@ -156,4 +158,17 @@ object JavaTypeInference { .filterNot(_.getName ==

[GitHub] [spark] dzhigimont commented on a diff in pull request #40436: [SPARK-42619][PS] Add `show_counts` parameter for DataFrame.info

2023-08-30 Thread via GitHub
dzhigimont commented on code in PR #40436: URL: https://github.com/apache/spark/pull/40436#discussion_r1310717375 ## python/pyspark/pandas/indexes/base.py: ## @@ -289,7 +289,7 @@ def _summary(self, name: Optional[str] = None) -> str: if name is None:

[GitHub] [spark] xuanyuanking commented on pull request #42730: [SPARK-44742][PYTHON][DOCS][FOLLOWUP] Upgrade `pydata_sphinx_theme` to 0.8.0 in `spark-rm` Dockerfile

2023-08-30 Thread via GitHub
xuanyuanking commented on PR #42730: URL: https://github.com/apache/spark/pull/42730#issuecomment-1699697157 Thanks for the quick fix! @panbingkun

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42595: [SPARK-44901][SQL] Add API in Python UDTF 'analyze' method to return partitioning/ordering expressions

2023-08-30 Thread via GitHub
allisonwang-db commented on code in PR #42595: URL: https://github.com/apache/spark/pull/42595#discussion_r1310644050 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala: ## @@ -207,6 +208,19 @@ case class UnresolvedPolymorphicPythonUDTF(

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42595: [SPARK-44901][SQL] Add API in Python UDTF 'analyze' method to return partitioning/ordering expressions

2023-08-30 Thread via GitHub
allisonwang-db commented on code in PR #42595: URL: https://github.com/apache/spark/pull/42595#discussion_r1310679860 ## python/pyspark/sql/udtf.py: ## @@ -61,6 +61,26 @@ class AnalyzeArgument: is_table: bool +@dataclass(frozen=True) +class PartitioningColumn: +"""

[GitHub] [spark] dongjoon-hyun commented on pull request #42730: [SPARK-44742][PYTHON][DOCS][FOLLOWUP] Upgrade `pydata_sphinx_theme` to 0.8.0 in `spark-rm` Dockerfile

2023-08-30 Thread via GitHub
dongjoon-hyun commented on PR #42730: URL: https://github.com/apache/spark/pull/42730#issuecomment-1699648201 Merged to master/3.5. Also, cc @xuanyuanking because he is the release manager of Apache Spark 3.5.0 who reports this issue. -- This is an automated message from the

[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-08-30 Thread via GitHub
itholic commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1310662293 ## python/pyspark/pandas/datetimes.py: ## @@ -116,26 +117,55 @@ def pandas_microsecond(s) -> ps.Series[np.int32]: # type: ignore[no-untyped-def def

[GitHub] [spark] dongjoon-hyun closed pull request #42736: [SPARK-45017][PYTHON] Add `CalendarIntervalType` to PySpark

2023-08-30 Thread via GitHub
dongjoon-hyun closed pull request #42736: [SPARK-45017][PYTHON] Add `CalendarIntervalType` to PySpark URL: https://github.com/apache/spark/pull/42736 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-08-30 Thread via GitHub
itholic commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1310660048 ## python/pyspark/pandas/datetimes.py: ## @@ -116,26 +117,55 @@ def pandas_microsecond(s) -> ps.Series[np.int32]: # type: ignore[no-untyped-def def

[GitHub] [spark] itholic commented on pull request #42706: [SPARK-42304][SQL] Rename `_LEGACY_ERROR_TEMP_2189` to `GET_TABLES_BY_TYPE_UNSUPPORTED_BY_HIVE_VERSION`

2023-08-30 Thread via GitHub
itholic commented on PR #42706: URL: https://github.com/apache/spark/pull/42706#issuecomment-1699634991 Welcome, @valentinp17 ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun commented on pull request #42712: [SPARK-44997][DOCS] Align example order (Python -> Scala/Java -> R) in all Spark Doc Content

2023-08-30 Thread via GitHub
dongjoon-hyun commented on PR #42712: URL: https://github.com/apache/spark/pull/42712#issuecomment-1699634715 Merged to master for Apache Spark 4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-08-30 Thread via GitHub
itholic commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1310665191 ## python/pyspark/pandas/tests/indexes/test_datetime.py: ## @@ -269,6 +256,10 @@ def test_map(self): mapper_pser = pd.Series([1, 2, 3], index=pidx)

[GitHub] [spark] dongjoon-hyun closed pull request #42730: [SPARK-44742][PYTHON][DOCS][FOLLOWUP] Upgrade `pydata_sphinx_theme` to 0.8.0 in `spark-rm` Dockerfile

2023-08-30 Thread via GitHub
dongjoon-hyun closed pull request #42730: [SPARK-44742][PYTHON][DOCS][FOLLOWUP] Upgrade `pydata_sphinx_theme` to 0.8.0 in `spark-rm` Dockerfile URL: https://github.com/apache/spark/pull/42730 -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] dongjoon-hyun commented on pull request #42734: [SPARK-45016][PYTHON][CONNECT] Add missing `try_remote_functions` annotations

2023-08-30 Thread via GitHub
dongjoon-hyun commented on PR #42734: URL: https://github.com/apache/spark/pull/42734#issuecomment-1699641192 Merged to master/3.5 because this is filed as a bug subtask of the umbrella JIRA, SPARK-43907. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] dongjoon-hyun closed pull request #42734: [SPARK-45016][PYTHON][CONNECT] Add missing `try_remote_functions` annotations

2023-08-30 Thread via GitHub
dongjoon-hyun closed pull request #42734: [SPARK-45016][PYTHON][CONNECT] Add missing `try_remote_functions` annotations URL: https://github.com/apache/spark/pull/42734 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42736: [SPARK-45017][PYTHON] Add `CalendarIntervalType` to PySpark

2023-08-30 Thread via GitHub
dongjoon-hyun commented on code in PR #42736: URL: https://github.com/apache/spark/pull/42736#discussion_r1310654984 ## python/pyspark/sql/tests/connect/test_parity_types.py: ## @@ -86,6 +86,10 @@ def test_rdd_with_udt(self): def test_udt(self): super().test_udt()

[GitHub] [spark] dongjoon-hyun commented on pull request #42738: [SPARK-44990][SQL] Reduce the frequency of get `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv`

2023-08-30 Thread via GitHub
dongjoon-hyun commented on PR #42738: URL: https://github.com/apache/spark/pull/42738#issuecomment-1699611271 Merged to master/3.5/3.4/3.3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dongjoon-hyun closed pull request #42719: [SPARK-45005][CONNECT][PS][TESTS] Reducing the CI time by splitting the slow tests

2023-08-30 Thread via GitHub
dongjoon-hyun closed pull request #42719: [SPARK-45005][CONNECT][PS][TESTS] Reducing the CI time by splitting the slow tests URL: https://github.com/apache/spark/pull/42719 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] dongjoon-hyun commented on pull request #42706: [SPARK-42304][SQL] Rename `_LEGACY_ERROR_TEMP_2189` to `GET_TABLES_BY_TYPE_UNSUPPORTED_BY_HIVE_VERSION`

2023-08-30 Thread via GitHub
dongjoon-hyun commented on PR #42706: URL: https://github.com/apache/spark/pull/42706#issuecomment-1699617223 I added you to the Apache Spark contributor group and assigned SPARK-42304 to you. Welcome to the Apache Spark community, @valentinp17 ! -- This is an automated message from

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42736: [SPARK-45017][PYTHON] Add `CalendarIntervalType` to PySpark

2023-08-30 Thread via GitHub
dongjoon-hyun commented on code in PR #42736: URL: https://github.com/apache/spark/pull/42736#discussion_r1310641064 ## python/pyspark/sql/tests/connect/test_parity_types.py: ## @@ -86,6 +86,10 @@ def test_rdd_with_udt(self): def test_udt(self): super().test_udt()

[GitHub] [spark] dongjoon-hyun closed pull request #42738: [SPARK-44990][SQL] Reduce the frequency of get `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv`

2023-08-30 Thread via GitHub
dongjoon-hyun closed pull request #42738: [SPARK-44990][SQL] Reduce the frequency of get `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv` URL: https://github.com/apache/spark/pull/42738 -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] itholic commented on pull request #42719: [SPARK-45005][CONNECT][PS][TESTS] Reducing the CI time by splitting the slow tests

2023-08-30 Thread via GitHub
itholic commented on PR #42719: URL: https://github.com/apache/spark/pull/42719#issuecomment-1699572389 Thanks everyone for the review! Seems like the testing time now has some balance, as below: **Before** |pyspark-pandas-connect|pyspark-pandas-slow-connect| |--|--| |3h

[GitHub] [spark] valentinp17 commented on pull request #42706: [SPARK-42304][SQL] Rename `_LEGACY_ERROR_TEMP_2189` to `GET_TABLES_BY_TYPE_UNSUPPORTED_BY_HIVE_VERSION`

2023-08-30 Thread via GitHub
valentinp17 commented on PR #42706: URL: https://github.com/apache/spark/pull/42706#issuecomment-1699566201 Thank you, @dongjoon-hyun. I created an ASF JIRA account. Jira username: valentinp17 -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42736: [SPARK-45017][PYTHON] Add `CalendarIntervalType` to PySpark

2023-08-30 Thread via GitHub
dongjoon-hyun commented on code in PR #42736: URL: https://github.com/apache/spark/pull/42736#discussion_r1310586287 ## python/pyspark/sql/tests/connect/test_parity_types.py: ## @@ -86,6 +86,10 @@ def test_rdd_with_udt(self): def test_udt(self): super().test_udt()

[GitHub] [spark] dtenedor commented on a diff in pull request #42595: [SPARK-44901][SQL] Add API in Python UDTF 'analyze' method to return partitioning/ordering expressions

2023-08-30 Thread via GitHub
dtenedor commented on code in PR #42595: URL: https://github.com/apache/spark/pull/42595#discussion_r1310587796 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2096,23 +2096,87 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] dtenedor commented on a diff in pull request #42663: [SPARK-44952][SQL][PYTHON] Support named arguments in aggregate Pandas UDFs

2023-08-30 Thread via GitHub
dtenedor commented on code in PR #42663: URL: https://github.com/apache/spark/pull/42663#discussion_r1310566256 ## python/pyspark/sql/tests/pandas/test_pandas_udf_grouped_agg.py: ## @@ -575,6 +575,128 @@ def mean(x): assert filtered.collect()[0]["mean"] == 42.0 +

[GitHub] [spark] dtenedor commented on a diff in pull request #42663: [SPARK-44952][SQL][PYTHON] Support named arguments in aggregate Pandas UDFs

2023-08-30 Thread via GitHub
dtenedor commented on code in PR #42663: URL: https://github.com/apache/spark/pull/42663#discussion_r1310564760 ## python/pyspark/sql/tests/pandas/test_pandas_udf_grouped_agg.py: ## @@ -575,6 +575,128 @@ def mean(x): assert filtered.collect()[0]["mean"] == 42.0 +

[GitHub] [spark] WweiL commented on pull request #42686: [SPARK-44971][PYTHON] StreamingQueryProgress event fromJson bug fix

2023-08-30 Thread via GitHub
WweiL commented on PR #42686: URL: https://github.com/apache/spark/pull/42686#issuecomment-1699523781 Hi @HyukjinKwon can we merge this : ) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dzhigimont commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-08-30 Thread via GitHub
dzhigimont commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1699498241 Updated the branch -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun closed pull request #42739: [SPARK-45021][BUILD] Remove `antlr4-maven-plugin` configuration from `sql/catalyst/pom.xml`

2023-08-30 Thread via GitHub
dongjoon-hyun closed pull request #42739: [SPARK-45021][BUILD] Remove `antlr4-maven-plugin` configuration from `sql/catalyst/pom.xml` URL: https://github.com/apache/spark/pull/42739 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] dongjoon-hyun commented on pull request #42673: [SPARK-44959][BUILD] Upgrade sbt to 1.9.4

2023-08-30 Thread via GitHub
dongjoon-hyun commented on PR #42673: URL: https://github.com/apache/spark/pull/42673#issuecomment-1699415059 For #42733 , I have a question. - https://github.com/apache/spark/pull/42733#pullrequestreview-1602993100 -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] hvanhovell closed pull request #42732: [SPARK-43923][CONNECT][FOLLOW-UP] Propagate extra tags to SparkListenerConnectOperationFinished

2023-08-30 Thread via GitHub
hvanhovell closed pull request #42732: [SPARK-43923][CONNECT][FOLLOW-UP] Propagate extra tags to SparkListenerConnectOperationFinished URL: https://github.com/apache/spark/pull/42732 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] peter-toth opened a new pull request, #42740: [SPARK-][SQL] Provide context for dataset API errors

2023-08-30 Thread via GitHub
peter-toth opened a new pull request, #42740: URL: https://github.com/apache/spark/pull/42740 ### What changes were proposed in this pull request? This PR captures the dataset APIs used by the user code and the call site in the user code and provides better error messages. E.g.

[GitHub] [spark] cloud-fan closed pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset

2023-08-30 Thread via GitHub
cloud-fan closed pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset URL: https://github.com/apache/spark/pull/41782 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the
