Re: [PR] [SPARK-47045][SQL] Replace `IllegalArgumentException` by `SparkIllegalArgumentException` in `sql/api` [spark]

2024-02-14 Thread via GitHub
MaxGekk closed pull request #45098: [SPARK-47045][SQL] Replace `IllegalArgumentException` by `SparkIllegalArgumentException` in `sql/api` URL: https://github.com/apache/spark/pull/45098 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-47045][SQL] Replace `IllegalArgumentException` by `SparkIllegalArgumentException` in `sql/api` [spark]

2024-02-14 Thread via GitHub
MaxGekk commented on PR #45098: URL: https://github.com/apache/spark/pull/45098#issuecomment-1945501972 Merging to master. Thank you, @srielau, for the review.

[PR] [SPARK-47054][PYTHON][TESTS] Remove pinned version of torch for Python 3.12 support [spark]

2024-02-14 Thread via GitHub
HyukjinKwon opened a new pull request, #45113: URL: https://github.com/apache/spark/pull/45113 ### What changes were proposed in this pull request? This PR unpins the version for torch in our CI. ### Why are the changes needed? Testing latest version. This also blocks

Re: [PR] [SPARK-47015][Collation] Disable partitioning on collated columns [spark]

2024-02-14 Thread via GitHub
cloud-fan commented on PR #45104: URL: https://github.com/apache/spark/pull/45104#issuecomment-1945425646 how about bucket columns? We generate the bucket id from the string value and assume all the semantically-same string values should generate the same bucket id, which isn't true for
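A minimal sketch of the bucketing concern raised above, not Spark's actual implementation: it assumes a bucket id is a hash of the string modulo the bucket count, uses `zlib.crc32` as a stand-in for Spark's hash function, and uses `str.casefold()` as a stand-in for a case-insensitive collation. `NUM_BUCKETS`, `collation_key`, `bucket_id_raw`, and `bucket_id_collated` are hypothetical names introduced only for illustration.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical bucket count, chosen only for the illustration


def collation_key(value: str) -> str:
    """Stand-in for a case-insensitive collation key (assumption, not Spark's API)."""
    return value.casefold()


def bucket_id_raw(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Bucket id computed from the raw string bytes, ignoring collation."""
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def bucket_id_collated(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Bucket id computed from the collation key, so collation-equal values agree."""
    return zlib.crc32(collation_key(value).encode("utf-8")) % num_buckets


if __name__ == "__main__":
    # "abc" and "ABC" compare equal under the stand-in collation, but their raw
    # hashes differ, so the raw bucket ids are not guaranteed to match.
    for v in ("abc", "ABC"):
        print(v, bucket_id_raw(v), bucket_id_collated(v))
```

Hashing the collation key rather than the raw value is one way to keep collation-equal strings in the same bucket; whether the PR takes that route or simply disallows bucketing on collated columns is not stated in the preview.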

Re: [PR] [SPARK-47009][SQL] Enable create table support for collation [spark]

2024-02-14 Thread via GitHub
cloud-fan commented on PR #45105: URL: https://github.com/apache/spark/pull/45105#issuecomment-1945424385 We should put more high-level information: what's the corresponding parquet type for string with collation? and how do we fix the parquet max/min column stats?

Re: [PR] [SPARK-47009][SQL] Enable create table support for collation [spark]

2024-02-14 Thread via GitHub
cloud-fan commented on code in PR #45105: URL: https://github.com/apache/spark/pull/45105#discussion_r1490442977 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -117,6 +117,7 @@ object DataType { private val FIXED_DECIMAL =

Re: [PR] [SPARK-47053][INFRA][3.5] Bump python libraries (pandas, pyarrow) in Docker image for release script [spark]

2024-02-14 Thread via GitHub
HeartSaVioR closed pull request #45111: [SPARK-47053][INFRA][3.5] Bump python libraries (pandas, pyarrow) in Docker image for release script URL: https://github.com/apache/spark/pull/45111

Re: [PR] [SPARK-47053][INFRA][3.5] Bump python libraries (pandas, pyarrow) in Docker image for release script [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on PR #45111: URL: https://github.com/apache/spark/pull/45111#issuecomment-1945420722 This was merged via [9b4778f](https://github.com/apache/spark/commit/9b4778fc1dc7047635c9ec19c187d4e75d182590)

Re: [PR] [SPARK-46687][TESTS][PYTHON][FOLLOW-UP] Skip MemoryProfilerParityTests when codecov enabled [spark]

2024-02-14 Thread via GitHub
HyukjinKwon commented on PR #45112: URL: https://github.com/apache/spark/pull/45112#issuecomment-1945413337 Thx! I manually ran linters, and other tests won't be verified here, therefore just merging. Merged to master.

Re: [PR] [SPARK-46687][TESTS][PYTHON][FOLLOW-UP] Skip MemoryProfilerParityTests when codecov enabled [spark]

2024-02-14 Thread via GitHub
HyukjinKwon closed pull request #45112: [SPARK-46687][TESTS][PYTHON][FOLLOW-UP] Skip MemoryProfilerParityTests when codecov enabled URL: https://github.com/apache/spark/pull/45112

Re: [PR] [SPARK-46906][INFRA][3.5] Bump python libraries (pandas, pyarrow) in Docker image for release script [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on PR #45111: URL: https://github.com/apache/spark/pull/45111#issuecomment-1945412391 Let me merge this to continue the release step. 3.5.1 RC will start with 3.5.1 RC2.

Re: [PR] [SPARK-46687][TESTS][PYTHON] Skip MemoryProfilerParityTests when codecov enabled [spark]

2024-02-14 Thread via GitHub
HyukjinKwon commented on PR #45112: URL: https://github.com/apache/spark/pull/45112#issuecomment-1945409614 cc @xinrong-meng @ueshin

[PR] [SPARK-46687][TESTS][PYTHON] Skip MemoryProfilerParityTests when codecov enabled [spark]

2024-02-14 Thread via GitHub
HyukjinKwon opened a new pull request, #45112: URL: https://github.com/apache/spark/pull/45112 ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/44775 that skips the tests with codecov on. It fails now

Re: [PR] [SPARK-47009][Collation] Enable create table support for collation [spark]

2024-02-14 Thread via GitHub
cloud-fan commented on code in PR #45105: URL: https://github.com/apache/spark/pull/45105#discussion_r1490415076 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: ## @@ -58,8 +59,8 @@ class DataTypeAstBuilder extends

Re: [PR] [SPARK-47009][Collation] Enable create table support for collation [spark]

2024-02-14 Thread via GitHub
cloud-fan commented on code in PR #45105: URL: https://github.com/apache/spark/pull/45105#discussion_r1490413649 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1095,6 +1095,10 @@ colPosition : position=FIRST | position=AFTER

Re: [PR] [SPARK-47044] Add executed query for JDBC external datasources to explain output [spark]

2024-02-14 Thread via GitHub
dtenedor commented on code in PR #45102: URL: https://github.com/apache/spark/pull/45102#discussion_r1490403653 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcSQLQueryBuilder.scala: ## @@ -81,17 +81,20 @@ class JdbcSQLQueryBuilder(dialect: JdbcDialect, options:

Re: [PR] [SPARK-46820][PYTHON] Fix error message regression by restoring `new_msg` [spark]

2024-02-14 Thread via GitHub
HyukjinKwon commented on code in PR #44859: URL: https://github.com/apache/spark/pull/44859#discussion_r1490402927 ## python/pyspark/sql/types.py: ## @@ -2197,8 +2197,10 @@ def verify_nullability(obj: Any) -> bool: return True else:

Re: [PR] [SPARK-47051][INFRA] Create a new test pipeline for `yarn` and `connect` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45107: URL: https://github.com/apache/spark/pull/45107#issuecomment-1945383372 Merged to master for now.

Re: [PR] [SPARK-47051][INFRA] Create a new test pipeline for `yarn` and `connect` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun closed pull request #45107: [SPARK-47051][INFRA] Create a new test pipeline for `yarn` and `connect` URL: https://github.com/apache/spark/pull/45107

Re: [PR] [SPARK-47051][INFRA] Create a new test pipeline for `yarn` and `connect` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on code in PR #45107: URL: https://github.com/apache/spark/pull/45107#discussion_r1490390398 ## .github/workflows/build_and_test.yml: ## @@ -147,8 +147,9 @@ jobs: mllib-local, mllib, graphx - >- streaming,

Re: [PR] [SPARK-47051][INFRA] Create a new test pipeline for `yarn` and `connect` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on code in PR #45107: URL: https://github.com/apache/spark/pull/45107#discussion_r1490389222 ## .github/workflows/build_and_test.yml: ## @@ -147,8 +147,9 @@ jobs: mllib-local, mllib, graphx - >- streaming,

Re: [PR] [SPARK-47051][INFRA] Create a new test pipeline for `yarn` and `connect` [spark]

2024-02-14 Thread via GitHub
HyukjinKwon commented on code in PR #45107: URL: https://github.com/apache/spark/pull/45107#discussion_r1490374178 ## .github/workflows/build_and_test.yml: ## @@ -147,8 +147,9 @@ jobs: mllib-local, mllib, graphx - >- streaming,

[PR] [SPARK-46906][INFRA][3.5] Bump python libraries (pandas, pyarrow) in Docker image for release script [spark]

2024-02-14 Thread via GitHub
HeartSaVioR opened a new pull request, #45111: URL: https://github.com/apache/spark/pull/45111 ### What changes were proposed in this pull request? This PR proposes to bump python libraries (pandas to 2.0.3, pyarrow to 4.0.0) in Docker image for release script. ### Why are the

Re: [PR] [SPARK-46906][INFRA] Bump python libraries (pandas, pyarrow) in Docker image for release script [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on PR #45110: URL: https://github.com/apache/spark/pull/45110#issuecomment-1945364343 docs phase against master branch + this PR failed "before" python docs build: ``` Your bundle is locked to sass-embedded (1.69.7) from rubygems repository

Re: [PR] [SS][SPARK-46928] Add support for ListState in Arbitrary State API v2. [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on code in PR #44961: URL: https://github.com/apache/spark/pull/44961#discussion_r1490354597 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala: ## @@ -65,6 +65,43 @@ private[sql] class

Re: [PR] [SPARK-46906][SS] Add a check for stateful operator change for streaming [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on PR #44927: URL: https://github.com/apache/spark/pull/44927#issuecomment-1945339250 @jingz-db Mind retriggering GA? You can either manually do this in your fork or simply push an empty commit to do this automatically. Thanks!

Re: [PR] [SPARK-46906][INFRA] Bump python libraries (pandas, pyarrow) in Docker image for release script [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on PR #45110: URL: https://github.com/apache/spark/pull/45110#issuecomment-1945338259 I'm now running release script with dry-run against master branch. Will update PR description once it works for master branch as well. Otherwise I'll rebase this PR against

[PR] [SPARK-46906][INFRA] Bump python libraries (pandas, pyarrow) in Docker image for release script [spark]

2024-02-14 Thread via GitHub
HeartSaVioR opened a new pull request, #45110: URL: https://github.com/apache/spark/pull/45110 ### What changes were proposed in this pull request? This PR proposes to bump python libraries (pandas to 2.0.3, pyarrow to 4.0.0) in Docker image for release script. ### Why are the

Re: [PR] [SPARK-47040][CONNECT] Allow Spark Connect Server Script to wait [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45090: URL: https://github.com/apache/spark/pull/45090#issuecomment-1945326854 If so, we need to revert this. Could you confirm that the above works for you, @grundprinzip ?

Re: [PR] [SPARK-47040][CONNECT] Allow Spark Connect Server Script to wait [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45090: URL: https://github.com/apache/spark/pull/45090#issuecomment-1945326307 Oh, right. Does it work the same way with this?

Re: [PR] [SPARK-47040][CONNECT] Allow Spark Connect Server Script to wait [spark]

2024-02-14 Thread via GitHub
pan3793 commented on PR #45090: URL: https://github.com/apache/spark/pull/45090#issuecomment-1945309543 > ... that leaves it running in the foreground ... could it be achieved by the following command? ``` SPARK_NO_DAEMONIZE=1 ./sbin/start-connect-server.sh ```

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490256089 ## python/pyspark/sql/streaming/python_streaming_source_runner.py: ## @@ -0,0 +1,178 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] [SPARK-47051][INFRA] Create a new test pipeline for `yarn` and `connect` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on code in PR #45107: URL: https://github.com/apache/spark/pull/45107#discussion_r1490236219 ## .github/workflows/build_and_test.yml: ## @@ -147,8 +147,9 @@ jobs: mllib-local, mllib, graphx - >- streaming,

Re: [PR] [SPARK-47051][INFRA] Create a new test pipeline for `yarn` and `connect` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on code in PR #45107: URL: https://github.com/apache/spark/pull/45107#discussion_r1490233363 ## .github/workflows/build_and_test.yml: ## @@ -147,8 +147,9 @@ jobs: mllib-local, mllib, graphx - >- streaming,

Re: [PR] [SPARK-45396][PYTHON] Add doc entry for `pyspark.ml.connect` module, and adds `Evaluator` to `__all__` at `ml.connect` [spark]

2024-02-14 Thread via GitHub
HyukjinKwon commented on PR #43210: URL: https://github.com/apache/spark/pull/43210#issuecomment-1945142540 Reverted at https://github.com/apache/spark/commit/ea6b25767fb86732c108c759fd5393caee22f129 in branch-3.5

Re: [PR] [Don't merge & review] verify sbt on master [spark]

2024-02-14 Thread via GitHub
github-actions[bot] closed pull request #43079: [Don't merge & review] verify sbt on master URL: https://github.com/apache/spark/pull/43079

Re: [PR] [SPARK-45669][CORE] Ensure the continuity of rolling log index [spark]

2024-02-14 Thread via GitHub
github-actions[bot] commented on PR #43534: URL: https://github.com/apache/spark/pull/43534#issuecomment-1945132826 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-45396][PYTHON] Add doc entry for `pyspark.ml.connect` module, and adds `Evaluator` to `__all__` at `ml.connect` [spark]

2024-02-14 Thread via GitHub
HyukjinKwon commented on PR #43210: URL: https://github.com/apache/spark/pull/43210#issuecomment-1945132337 Let's revert it in branch-3.5, and fix it again. It's not a critical bug.

Re: [PR] [SPARK-47051][INFRA] Create a new test pipeline for `yarn` and `connect` [spark]

2024-02-14 Thread via GitHub
HyukjinKwon commented on code in PR #45107: URL: https://github.com/apache/spark/pull/45107#discussion_r1490216663 ## .github/workflows/build_and_test.yml: ## @@ -147,8 +147,9 @@ jobs: mllib-local, mllib, graphx - >- streaming,

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490215001 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonStreamingDataSourceSuite.scala: ## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490214832 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonStreamingDataSourceSuite.scala: ## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490214280 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490213270 ## python/pyspark/sql/streaming/python_streaming_source_runner.py: ## @@ -0,0 +1,178 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490211705 ## python/pyspark/sql/streaming/python_streaming_source_runner.py: ## @@ -0,0 +1,178 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[PR] [SPARK-47052][WIP] Separate state tracking variables from MicroBatchExecution/StreamExecution [spark]

2024-02-14 Thread via GitHub
jerrypeng opened a new pull request, #45109: URL: https://github.com/apache/spark/pull/45109 ### What changes were proposed in this pull request? To improve code clarity and maintainability, I propose that we move all the variables that track mutable state and metrics

Re: [PR] [SPARK-47051][INFRA] Create a new test pipeline for `yarn` and `connect` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45107: URL: https://github.com/apache/spark/pull/45107#issuecomment-1945067506 Could you review this test pipeline adjustment PR, @HyukjinKwon ?

[PR] [wip] value state ttl poc [spark]

2024-02-14 Thread via GitHub
ericm-db opened a new pull request, #45108: URL: https://github.com/apache/spark/pull/45108 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [SPARK-47051][INFRA] Create a new test pipeline for `yarn` and `connect` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun opened a new pull request, #45107: URL: https://github.com/apache/spark/pull/45107 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47049][BUILD] Ban non-shaded Hadoop dependencies [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun closed pull request #45106: [SPARK-47049][BUILD] Ban non-shaded Hadoop dependencies URL: https://github.com/apache/spark/pull/45106

Re: [PR] [SPARK-47049][BUILD] Ban non-shaded Hadoop dependencies [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45106: URL: https://github.com/apache/spark/pull/45106#issuecomment-1944840049 Thank you, @sunchao . Yes, it's irrelevant to this. Merged to master.

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490134034 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,104 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-46906][SS] Add a check for stateful operator change for streaming [spark]

2024-02-14 Thread via GitHub
jingz-db commented on code in PR #44927: URL: https://github.com/apache/spark/pull/44927#discussion_r1490116430 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadataSuite.scala: ## @@ -215,4 +216,117 @@ class OperatorStateMetadataSuite

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490111896 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,104 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490109817 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,104 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490107780 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,104 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490106637 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,104 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1490104863 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,104 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-46906][SS] Add a check for stateful operator change for streaming [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on code in PR #44927: URL: https://github.com/apache/spark/pull/44927#discussion_r1490093460 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadataSuite.scala: ## @@ -215,4 +215,79 @@ class

Re: [PR] [SPARK-47049][BUILD] Ban non-shaded Hadoop dependencies [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45106: URL: https://github.com/apache/spark/pull/45106#issuecomment-1944653099 Could you review this, please, @sunchao ?

Re: [PR] [SPARK-46906][SS] Add a check for stateful operator change for streaming [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on code in PR #44927: URL: https://github.com/apache/spark/pull/44927#discussion_r1490087695 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadataSuite.scala: ## @@ -215,4 +216,117 @@ class

Re: [PR] [SS][SPARK-46928] Add support for ListState in Arbitrary State API v2. [spark]

2024-02-14 Thread via GitHub
HeartSaVioR commented on code in PR #44961: URL: https://github.com/apache/spark/pull/44961#discussion_r1490075730 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala: ## @@ -75,6 +76,61 @@ private[sql] class

Re: [PR] [SPARK-46906][SS] Add a check for stateful operator change for streaming [spark]

2024-02-14 Thread via GitHub
jingz-db commented on PR #44927: URL: https://github.com/apache/spark/pull/44927#issuecomment-1944494272 Thanks Jungtaek for your thorough code review!

Re: [PR] [SS][SPARK-46928] Add support for ListState in Arbitrary State API v2. [spark]

2024-02-14 Thread via GitHub
sahnib commented on code in PR #44961: URL: https://github.com/apache/spark/pull/44961#discussion_r1489986029 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ListStateImpl.scala: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-46906][SS] Add a check for stateful operator change for streaming [spark]

2024-02-14 Thread via GitHub
jingz-db commented on code in PR #44927: URL: https://github.com/apache/spark/pull/44927#discussion_r1489982761 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadataSuite.scala: ## @@ -215,4 +216,117 @@ class OperatorStateMetadataSuite

Re: [PR] [SPARK-46906][SS] Add a check for stateful operator change for streaming [spark]

2024-02-14 Thread via GitHub
jingz-db commented on code in PR #44927: URL: https://github.com/apache/spark/pull/44927#discussion_r1489977792 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/OperatorStateMetadataSuite.scala: ## @@ -215,4 +216,117 @@ class OperatorStateMetadataSuite

Re: [PR] [SPARK-47045][SQL] Replace `IllegalArgumentException` by `SparkIllegalArgumentException` in `sql/api` [spark]

2024-02-14 Thread via GitHub
MaxGekk commented on code in PR #45098: URL: https://github.com/apache/spark/pull/45098#discussion_r1489927146 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -7767,6 +7767,76 @@ "Single backslash is prohibited. It has special meaning as beginning of

Re: [PR] [SPARK-47045][SQL] Replace `IllegalArgumentException` by `SparkIllegalArgumentException` in `sql/api` [spark]

2024-02-14 Thread via GitHub
xinrong-meng commented on code in PR #45098: URL: https://github.com/apache/spark/pull/45098#discussion_r1489921713 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -7767,6 +7767,76 @@ "Single backslash is prohibited. It has special meaning as beginning

[PR] [SPARK-47049][BUILD] Ban non-shaded Hadoop dependencies [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun opened a new pull request, #45106: URL: https://github.com/apache/spark/pull/45106 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47014][PYTHON][CONNECT] Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession [spark]

2024-02-14 Thread via GitHub
xinrong-meng commented on PR #45073: URL: https://github.com/apache/spark/pull/45073#issuecomment-1944388920 Merged to master, thank you!

Re: [PR] [SPARK-47014][PYTHON][CONNECT] Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession [spark]

2024-02-14 Thread via GitHub
xinrong-meng closed pull request #45073: [SPARK-47014][PYTHON][CONNECT] Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession URL: https://github.com/apache/spark/pull/45073

Re: [PR] [SS][SPARK-46928] Add support for ListState in Arbitrary State API v2. [spark]

2024-02-14 Thread via GitHub
anishshri-db commented on code in PR #44961: URL: https://github.com/apache/spark/pull/44961#discussion_r1489907739 ## sql/api/src/main/scala/org/apache/spark/sql/streaming/ValueState.scala: ## @@ -46,5 +46,5 @@ private[sql] trait ValueState[S] extends Serializable { def

Re: [PR] [SS][SPARK-46928] Add support for ListState in Arbitrary State API v2. [spark]

2024-02-14 Thread via GitHub
anishshri-db commented on code in PR #44961: URL: https://github.com/apache/spark/pull/44961#discussion_r1489903465 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ListStateImpl.scala: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-02-14 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1489903039 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,104 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-47043][BUILD] add `jackson-core` and `jackson-annotations` dependencies to module `spark-common-utils` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45103: URL: https://github.com/apache/spark/pull/45103#issuecomment-1944373538 No problem. I just wanted to inform you. I fixed the field by removing my name. :)

Re: [PR] [SPARK-47043][BUILD] add `jackson-core` and `jackson-annotations` dependencies to module `spark-common-utils` [spark]

2024-02-14 Thread via GitHub
William1104 commented on PR #45103: URL: https://github.com/apache/spark/pull/45103#issuecomment-1944365291 Hi @dongjoon-hyun I am sorry about that. I created the JIRA via cloning. I don't know how to update the assignee back to myself. Would you mind changing it back to me, or

Re: [PR] [SPARK-43259][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_2024 [spark]

2024-02-14 Thread via GitHub
MaxGekk commented on code in PR #45095: URL: https://github.com/apache/spark/pull/45095#discussion_r1489876902 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -1151,6 +1153,21 @@ class QueryExecutionErrorsSuite ) ) }

Re: [PR] [SPARK-47015][Collation] Disable partitioning on collated columns [spark]

2024-02-14 Thread via GitHub
MaxGekk commented on code in PR #45104: URL: https://github.com/apache/spark/pull/45104#discussion_r1489873175 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -174,4 +174,36 @@ class CollationSuite extends QueryTest with SharedSparkSession {

Re: [PR] [SPARK-47038][BUILD] Remove shaded `protobuf-java` 2.6.1 dependency from `kinesis-asl-assembly` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun closed pull request #45096: [SPARK-47038][BUILD] Remove shaded `protobuf-java` 2.6.1 dependency from `kinesis-asl-assembly` URL: https://github.com/apache/spark/pull/45096

Re: [PR] [SPARK-47038][BUILD] Remove shaded `protobuf-java` 2.6.1 dependency from `kinesis-asl-assembly` [spark]

2024-02-14 Thread via GitHub
viirya commented on code in PR #45096: URL: https://github.com/apache/spark/pull/45096#discussion_r1489871042 ## connector/kinesis-asl-assembly/pom.xml: ## @@ -59,16 +59,6 @@ commons-lang provided - - com.google.protobuf - protobuf-java -

Re: [PR] [SPARK-47038][BUILD] Remove shaded `protobuf-java` 2.6.1 dependency from `kinesis-asl-assembly` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45096: URL: https://github.com/apache/spark/pull/45096#issuecomment-1944333969 Thank you so much, @viirya !

Re: [PR] [SPARK-47043][BUILD] add `jackson-core` and `jackson-annotations` dependencies to module `spark-common-utils` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45103: URL: https://github.com/apache/spark/pull/45103#issuecomment-1944331715 BTW, @William1104 , please don't borrow someone else's name like this. ![Screenshot 2024-02-14 at 09 59

Re: [PR] [SPARK-47038][BUILD] Remove shaded `protobuf-java` 2.6.1 dependency from `kinesis-asl-assembly` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on code in PR #45096: URL: https://github.com/apache/spark/pull/45096#discussion_r1489867281 ## connector/kinesis-asl-assembly/pom.xml: ## @@ -59,16 +59,6 @@ commons-lang provided - - com.google.protobuf - protobuf-java

Re: [PR] [SS][SPARK-46928] Add support for ListState in Arbitrary State API v2. [spark]

2024-02-14 Thread via GitHub
anishshri-db commented on code in PR #44961: URL: https://github.com/apache/spark/pull/44961#discussion_r1489865483 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/StatePartitionReader.scala: ## @@ -78,7 +78,7 @@ class StatePartitionReader(

Re: [PR] [SPARK-47038][BUILD] Remove shaded `protobuf-java` 2.6.1 dependency from `kinesis-asl-assembly` [spark]

2024-02-14 Thread via GitHub
viirya commented on code in PR #45096: URL: https://github.com/apache/spark/pull/45096#discussion_r1489864752 ## connector/kinesis-asl-assembly/pom.xml: ## @@ -59,16 +59,6 @@ commons-lang provided - - com.google.protobuf - protobuf-java -

Re: [PR] [SS][SPARK-46928] Add support for ListState in Arbitrary State API v2. [spark]

2024-02-14 Thread via GitHub
anishshri-db commented on code in PR #44961: URL: https://github.com/apache/spark/pull/44961#discussion_r1489863110 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala: ## @@ -65,6 +65,43 @@ private[sql] class

Re: [PR] [SPARK-47038][BUILD] Remove shaded `protobuf-java` 2.6.1 dependency from `kinesis-asl-assembly` [spark]

2024-02-14 Thread via GitHub
viirya commented on PR #45096: URL: https://github.com/apache/spark/pull/45096#issuecomment-1944317618 Looking into this.

Re: [PR] [SPARK-47038][BUILD] Remove shaded `protobuf-java` 2.6.1 dependency from `kinesis-asl-assembly` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45096: URL: https://github.com/apache/spark/pull/45096#issuecomment-1944313907 Could you review this, please, @viirya ?

Re: [PR] [SPARK-47038][BUILD] Remove shaded `protobuf-java` 2.6.1 dependency from `kinesis-asl-assembly` [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45096: URL: https://github.com/apache/spark/pull/45096#issuecomment-1944313704 All tests passed.

Re: [PR] [SPARK-47036][SS] Cleanup RocksDB file tracking for previously uploaded files if files were deleted from local directory [spark]

2024-02-14 Thread via GitHub
sahnib commented on code in PR #45092: URL: https://github.com/apache/spark/pull/45092#discussion_r1489848282 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala: ## @@ -1863,6 +1864,91 @@ class RocksDBSuite extends

Re: [PR] [SPARK-47039][TESTS] Add a checkstyle rule to ban `commons-lang` in Java code [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun closed pull request #45097: [SPARK-47039][TESTS] Add a checkstyle rule to ban `commons-lang` in Java code URL: https://github.com/apache/spark/pull/45097

Re: [PR] [SPARK-47039][TESTS] Add a checkstyle rule to ban `commons-lang` in Java code [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45097: URL: https://github.com/apache/spark/pull/45097#issuecomment-1944303256 Thank you so much, @huaxingao ! Merged to master.

Re: [PR] [SPARK-47042][BUILD] add missing explicit dependency 'commons-lang3' to the module 'spark-common-utils' [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45101: URL: https://github.com/apache/spark/pull/45101#issuecomment-1944302594 I agree with you at this point. > By explicitly declaring the dependency, we can avoid any unexpected missing dependencies that might occur when upgrading 'commons-text'.

Re: [PR] [SPARK-47039][TESTS] Add a checkstyle rule to ban `commons-lang` in Java code [spark]

2024-02-14 Thread via GitHub
huaxingao commented on PR #45097: URL: https://github.com/apache/spark/pull/45097#issuecomment-1944296112 LGTM. Thanks for the PR!

[PR] [SPARK-47009][Collation] Enable create table support for collation [spark]

2024-02-14 Thread via GitHub
stefankandic opened a new pull request, #45105: URL: https://github.com/apache/spark/pull/45105 ### What changes were proposed in this pull request? Adding support for create table with collated columns using parquet ### Why are the changes needed? In order to

Re: [PR] [SPARK-47042][BUILD] add missing explicit dependency 'commons-lang3' to the module 'spark-common-utils' [spark]

2024-02-14 Thread via GitHub
William1104 commented on PR #45101: URL: https://github.com/apache/spark/pull/45101#issuecomment-1944292205 Hi @dongjoon-hyun Thank you for taking the time to review my pull request. I appreciate your feedback and would like to address the points you raised. Regarding the

Re: [PR] [SPARK-47039][TESTS] Add a checkstyle rule to ban `commons-lang` in Java code [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45097: URL: https://github.com/apache/spark/pull/45097#issuecomment-1944262290 Could you review this PR, please, @huaxingao ?

Re: [PR] [SPARK-47039][TESTS] Add a checkstyle rule to ban `commons-lang` in Java code [spark]

2024-02-14 Thread via GitHub
dongjoon-hyun commented on PR #45097: URL: https://github.com/apache/spark/pull/45097#issuecomment-1944261211 All tests passed. ![Screenshot 2024-02-14 at 09 12 56](https://github.com/apache/spark/assets/9700541/23f0f2ea-aeb1-4c34-907a-3415fdc002d3)
