[GitHub] [spark] yaooqinn opened a new pull request, #41935: [WIP] Inject parser and active session to Dataset APIs

2023-07-11 Thread via GitHub
yaooqinn opened a new pull request, #41935: URL: https://github.com/apache/spark/pull/41935 This PR tries to apply one of the following two options. - Inject active session into Dataset APIs that use an existing parser to parse string parameters, such as filter - Inject a parse

[GitHub] [spark] LuciferYang commented on pull request #41934: [SPARK-43974][CONNECT][BUILD][FOLLOWUP] Upgrade buf to v1.23.1

2023-07-11 Thread via GitHub
LuciferYang commented on PR #41934: URL: https://github.com/apache/spark/pull/41934#issuecomment-1630267481 Can you add some descriptions in the `Why are the changes needed?`? It is difficult to search in the commit log with screenshots. -- This is an automated message from the A

[GitHub] [spark] LuciferYang commented on pull request #41469: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1

2023-07-11 Thread via GitHub
LuciferYang commented on PR #41469: URL: https://github.com/apache/spark/pull/41469#issuecomment-1630275704 It's strange that the failure started after upgrading to sbt 1.9.2, which doesn't seem to be related ...

[GitHub] [spark] HyukjinKwon commented on pull request #41934: [SPARK-43974][CONNECT][BUILD][FOLLOWUP] Upgrade buf to v1.23.1

2023-07-11 Thread via GitHub
HyukjinKwon commented on PR #41934: URL: https://github.com/apache/spark/pull/41934#issuecomment-1630279519 How is this related to https://github.com/apache/spark/pull/41933?

[GitHub] [spark] panbingkun commented on pull request #41469: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1

2023-07-11 Thread via GitHub
panbingkun commented on PR #41469: URL: https://github.com/apache/spark/pull/41469#issuecomment-1630280284 Yes, when I executed `sh dev/connect-gen-protos.sh` locally, the automatically generated file in Python changed. My local buf version is: https://github.com/apache/spark/assets/1
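
The situation described above (regenerating the protobuf bindings locally and finding that the generated Python files changed) can be checked mechanically. The following is a minimal sketch, not Spark's actual check script: `changed_files` and `snapshot` are hypothetical helpers, and the two directory arguments are assumed to hold the committed output and a fresh run of the generator respectively.

```python
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to its raw bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file()
    }


def changed_files(committed_dir, regenerated_dir):
    """Return sorted relative paths that differ or exist on only one side.

    Hypothetical helper: both directories are assumptions of this sketch,
    e.g. the checked-in generated code and the output of a fresh generator run.
    """
    before = snapshot(committed_dir)
    after = snapshot(regenerated_dir)
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

An empty result would mean the local toolchain reproduces the committed files byte for byte; any listed path points at a file whose generated content depends on the tool or plugin version in use.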

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

2023-07-11 Thread via GitHub
HyukjinKwon commented on code in PR #41931: URL: https://github.com/apache/spark/pull/41931#discussion_r1259296353 ## python/pyspark/pandas/sql_formatter.py: ## @@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]: val._to_spark().cre

[GitHub] [spark] LuciferYang commented on pull request #41934: [SPARK-43974][CONNECT][BUILD][FOLLOWUP] Upgrade buf to v1.23.1

2023-07-11 Thread via GitHub
LuciferYang commented on PR #41934: URL: https://github.com/apache/spark/pull/41934#issuecomment-1630286152 > How is this related to #41933? I think they fixed the same issue

[GitHub] [spark] zhengruifeng commented on pull request #41469: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1

2023-07-11 Thread via GitHub
zhengruifeng commented on PR #41469: URL: https://github.com/apache/spark/pull/41469#issuecomment-1630287197 @panbingkun it is weird since IIRC the CI in this PR passed. But if this upgrade causes such large changes in the generated code, let's revert it for now to avoid big changes befo

[GitHub] [spark] LuciferYang commented on a diff in pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
LuciferYang commented on code in PR #41933: URL: https://github.com/apache/spark/pull/41933#discussion_r1259300182 ## connector/connect/common/src/main/buf.gen.yaml: ## @@ -16,18 +16,18 @@ # version: v1 plugins: - - remote: buf.build/protocolbuffers/plugins/cpp:v3.20.0-1 +

[GitHub] [spark] zhengruifeng commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
zhengruifeng commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630290786 @LuciferYang @panbingkun @Hisoka-X @HyukjinKwon do you know what causes these large changes in the generated code? The upgrade to v1.23.1, or this migration?

[GitHub] [spark] EnricoMi commented on pull request #39952: [SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages for mapInPandas for schema mismatch

2023-07-11 Thread via GitHub
EnricoMi commented on PR #39952: URL: https://github.com/apache/spark/pull/39952#issuecomment-1630296118 Not sure how to fix the `Python code generation check`: https://github.com/G-Research/spark/actions/runs/5516480294/jobs/10057925480#step:18:101

[GitHub] [spark] panbingkun commented on a diff in pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
panbingkun commented on code in PR #41933: URL: https://github.com/apache/spark/pull/41933#discussion_r1259307304 ## connector/connect/common/src/main/buf.gen.yaml: ## @@ -16,18 +16,18 @@ # version: v1 plugins: - - remote: buf.build/protocolbuffers/plugins/cpp:v3.20.0-1 + -

[GitHub] [spark] LuciferYang commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
LuciferYang commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630296883 Can we try reverting buf 1.23.1 and only updating the `remote-plugins`? If that doesn't require changing so much code, I think we can revert the buf version before the code freeze

[GitHub] [spark] zhengruifeng commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
zhengruifeng commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630299746 > Can we try reverting buf 1.23.1 and only updating the `remote-plugins`? If that doesn't require changing so much code, I think we can revert the buf version before the code freeze

[GitHub] [spark] Hisoka-X commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
Hisoka-X commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630301287 > Can we try reverting buf 1.23.1 and only updating the `remote-plugins`? If that doesn't require changing so much code, I think we can revert the buf version before the code freeze +1

[GitHub] [spark] panbingkun commented on pull request #41934: [SPARK-43974][CONNECT][BUILD][FOLLOWUP] Upgrade buf to v1.23.1

2023-07-11 Thread via GitHub
panbingkun commented on PR #41934: URL: https://github.com/apache/spark/pull/41934#issuecomment-1630302473 > How is this related to #41933? Yes, that's right.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

2023-07-11 Thread via GitHub
zhengruifeng commented on code in PR #41931: URL: https://github.com/apache/spark/pull/41931#discussion_r1259314468 ## python/pyspark/pandas/sql_formatter.py: ## @@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]: val._to_spark().cr

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

2023-07-11 Thread via GitHub
zhengruifeng commented on code in PR #41931: URL: https://github.com/apache/spark/pull/41931#discussion_r1259317780 ## python/pyspark/pandas/sql_formatter.py: ## @@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]: val._to_spark().cr

[GitHub] [spark] panbingkun commented on a diff in pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
panbingkun commented on code in PR #41933: URL: https://github.com/apache/spark/pull/41933#discussion_r1259319890 ## connector/connect/common/src/main/buf.gen.yaml: ## @@ -16,18 +16,18 @@ # version: v1 plugins: - - remote: buf.build/protocolbuffers/plugins/cpp:v3.20.0-1 + -

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
Hisoka-X commented on code in PR #41933: URL: https://github.com/apache/spark/pull/41933#discussion_r1259320039 ## connector/connect/common/src/main/buf.gen.yaml: ## @@ -16,18 +16,18 @@ # version: v1 plugins: - - remote: buf.build/protocolbuffers/plugins/cpp:v3.20.0-1 + - p

[GitHub] [spark] panbingkun commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
panbingkun commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630313680 Let's try reverting buf.

[GitHub] [spark] beliefer commented on pull request #41884: [SPARK-44325][SQL] Use PartitionEvaluator API in SortMergeJoinExec

2023-07-11 Thread via GitHub
beliefer commented on PR #41884: URL: https://github.com/apache/spark/pull/41884#issuecomment-1630315499 LGTM+1

[GitHub] [spark] panbingkun opened a new pull request, #41936: Revert "[SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1"

2023-07-11 Thread via GitHub
panbingkun opened a new pull request, #41936: URL: https://github.com/apache/spark/pull/41936 This reverts commit b0b12cf3028331c12097f48cc857f4aad0e00d35. ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does th

[GitHub] [spark] zhengruifeng closed pull request #41936: Revert "[SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1"

2023-07-11 Thread via GitHub
zhengruifeng closed pull request #41936: Revert "[SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1" URL: https://github.com/apache/spark/pull/41936

[GitHub] [spark] Hisoka-X commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
Hisoka-X commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630318506 > Let's try reverting buf. I'm not sure whether it will work, because my local buf version is 1.17.0 and it also returns an error.

[GitHub] [spark] zhengruifeng commented on pull request #41936: Revert "[SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1"

2023-07-11 Thread via GitHub
zhengruifeng commented on PR #41936: URL: https://github.com/apache/spark/pull/41936#issuecomment-1630319757 merged to master

[GitHub] [spark] zhengruifeng commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
zhengruifeng commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630326822 @Hisoka-X now the buf version is `v1.20.0`, would you mind helping check whether this migration still causes codegen changes?

[GitHub] [spark] Hisoka-X commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
Hisoka-X commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630328500 > @LuciferYang @panbingkun @Hisoka-X @HyukjinKwon > > do you know what causes these large changes in the generated code? The upgrade to v1.23.1, or this migration? Seem like come

[GitHub] [spark] Hisoka-X commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
Hisoka-X commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630331397 > @Hisoka-X now the buf version is `v1.20.0`, would you mind helping check whether this migration still causes codegen changes? I think the main reason is the updated protobuf version,

[GitHub] [spark] gdhuper commented on a diff in pull request #41904: [SPARK-43389][SQL] Added a null check for lineSep option

2023-07-11 Thread via GitHub
gdhuper commented on code in PR #41904: URL: https://github.com/apache/spark/pull/41904#discussion_r1259340808 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -253,16 +253,18 @@ class CSVOptions( /** * A string between two consecut

[GitHub] [spark] zhengruifeng commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
zhengruifeng commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630344412 but i think we have pinned the version of protobuf https://github.com/apache/spark/blob/d7bc6f5c7efd9dcb1e460657447d1a9ea8f03f62/.github/workflows/build_and_test.yml#L637

[GitHub] [spark] panbingkun commented on pull request #41934: [SPARK-43974][CONNECT][BUILD][FOLLOWUP] Upgrade buf to v1.23.1

2023-07-11 Thread via GitHub
panbingkun commented on PR #41934: URL: https://github.com/apache/spark/pull/41934#issuecomment-1630350587 https://github.com/apache/spark/assets/15246973/cbb55024-8cb1-4865-9009-c40846163167 From the test results, it is not a problem with the Buf version.

[GitHub] [spark] panbingkun commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
panbingkun commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630352345 From the test results, it is not a problem with the Buf version. https://github.com/apache/spark/assets/15246973/cbb55024-8cb1-4865-9009-c40846163167

[GitHub] [spark] Hisoka-X commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
Hisoka-X commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630359064 > but i think we have pinned the version of protobuf > > https://github.com/apache/spark/blob/d7bc6f5c7efd9dcb1e460657447d1a9ea8f03f62/.github/workflows/build_and_test.yml#L637

[GitHub] [spark] itholic commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

2023-07-11 Thread via GitHub
itholic commented on code in PR #41931: URL: https://github.com/apache/spark/pull/41931#discussion_r1259367995 ## python/pyspark/pandas/sql_formatter.py: ## @@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]: val._to_spark().createO

[GitHub] [spark] MaxGekk commented on pull request #41909: [SPARK-44320][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]

2023-07-11 Thread via GitHub
MaxGekk commented on PR #41909: URL: https://github.com/apache/spark/pull/41909#issuecomment-1630363079 I believe the failure is not related to PR's changes: ```python RUN: /__w/spark/spark/dev/connect-gen-protos.sh /tmp/tmpo7t_pdiv ... subprocess.CalledProcessError: Command '['/

[GitHub] [spark] MaxGekk closed pull request #41909: [SPARK-44320][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]

2023-07-11 Thread via GitHub
MaxGekk closed pull request #41909: [SPARK-44320][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277] URL: https://github.com/apache/spark/pull/41909

[GitHub] [spark] zhengruifeng commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
zhengruifeng commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630365221 > yeah, this failure actually started before we upgraded buf

[GitHub] [spark] HyukjinKwon commented on pull request #41880: [SPARK-44263][CONNECT] Custom Interceptors Support

2023-07-11 Thread via GitHub
HyukjinKwon commented on PR #41880: URL: https://github.com/apache/spark/pull/41880#issuecomment-1630390623 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #41880: [SPARK-44263][CONNECT] Custom Interceptors Support

2023-07-11 Thread via GitHub
HyukjinKwon closed pull request #41880: [SPARK-44263][CONNECT] Custom Interceptors Support URL: https://github.com/apache/spark/pull/41880

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

2023-07-11 Thread via GitHub
HyukjinKwon commented on code in PR #41931: URL: https://github.com/apache/spark/pull/41931#discussion_r1259401203 ## python/pyspark/pandas/sql_formatter.py: ## @@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]: val._to_spark().cre

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

2023-07-11 Thread via GitHub
zhengruifeng commented on code in PR #41931: URL: https://github.com/apache/spark/pull/41931#discussion_r1259403299 ## python/pyspark/pandas/sql_formatter.py: ## @@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]: val._to_spark().cr

[GitHub] [spark] LuciferYang commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
LuciferYang commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630397278 > From the test results, it is not a problem with the Buf version. https://user-images.githubusercontent.com/15246973/252596321-cbb55024-8cb1-4865-9009-c40846163167.png S

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

2023-07-11 Thread via GitHub
zhengruifeng commented on code in PR #41931: URL: https://github.com/apache/spark/pull/41931#discussion_r1259403841 ## python/pyspark/pandas/sql_formatter.py: ## @@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]: val._to_spark().cr

[GitHub] [spark] maheshk114 commented on pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-07-11 Thread via GitHub
maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1630402790 > This is so strange. > > ``` > select * > from test_bloom.small_table a > left outer join test_bloom.big_table b > on a.number = b.pk; > ``` > > The SQL sho

[GitHub] [spark] zhengruifeng commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
zhengruifeng commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630402998 If we cannot avoid the codegen change, then I am fine with upgrading buf to the latest version

[GitHub] [spark] panbingkun commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
panbingkun commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630410471 > > From the test results, it is not a problem with the Buf version. https://user-images.githubusercontent.com/15246973/252596321-cbb55024-8cb1-4865-9009-c40846163167.png > > S

[GitHub] [spark] itholic commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

2023-07-11 Thread via GitHub
itholic commented on code in PR #41931: URL: https://github.com/apache/spark/pull/41931#discussion_r1259424856 ## python/pyspark/pandas/sql_formatter.py: ## @@ -265,7 +266,14 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]: val._to_spark().createO

[GitHub] [spark] panbingkun commented on pull request #41936: Revert "[SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1"

2023-07-11 Thread via GitHub
panbingkun commented on PR #41936: URL: https://github.com/apache/spark/pull/41936#issuecomment-1630480739 > merged to master @zhengruifeng Will this PR be submitted later?

[GitHub] [spark] Hisoka-X commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
Hisoka-X commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630505708 Hi @zhengruifeng @HyukjinKwon @LuciferYang, the CI passed. I think if we don't have another way to fix CI, maybe we can merge this PR. By the way, the change will only affect 3.5.0. If d

[GitHub] [spark] MaxGekk commented on pull request #41923: [SPARK-38476][CORE] Use error class in org.apache.spark.storage

2023-07-11 Thread via GitHub
MaxGekk commented on PR #41923: URL: https://github.com/apache/spark/pull/41923#issuecomment-1630541285 The failure is not related to PR's changes, I believe. ```python RUN: /__w/spark/spark/dev/connect-gen-protos.sh /tmp/tmpsdyg7qiw ... subprocess.CalledProcessError: Command '['

[GitHub] [spark] MaxGekk closed pull request #41923: [SPARK-38476][CORE] Use error class in org.apache.spark.storage

2023-07-11 Thread via GitHub
MaxGekk closed pull request #41923: [SPARK-38476][CORE] Use error class in org.apache.spark.storage URL: https://github.com/apache/spark/pull/41923

[GitHub] [spark] yaooqinn closed pull request #41935: [WIP] Inject parser and active session to Dataset APIs

2023-07-11 Thread via GitHub
yaooqinn closed pull request #41935: [WIP] Inject parser and active session to Dataset APIs URL: https://github.com/apache/spark/pull/41935

[GitHub] [spark] zhengruifeng commented on pull request #41936: Revert "[SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1"

2023-07-11 Thread via GitHub
zhengruifeng commented on PR #41936: URL: https://github.com/apache/spark/pull/41936#issuecomment-1630569839 @panbingkun it was already merged. feel free to open another PR to upgrade it later

[GitHub] [spark] panbingkun commented on pull request #41936: Revert "[SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1"

2023-07-11 Thread via GitHub
panbingkun commented on PR #41936: URL: https://github.com/apache/spark/pull/41936#issuecomment-1630577207 > @panbingkun it was already merged. > > feel free to open another PR to upgrade it later Okay, let me resubmit a new pr about `upgrade the buf version` again.

[GitHub] [spark] zhengruifeng commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
zhengruifeng commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630579815 cc @grundprinzip would you mind also taking a look? It seems that we need this PR to enable the python codegen

[GitHub] [spark] panbingkun opened a new pull request, #41937: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1

2023-07-11 Thread via GitHub
panbingkun opened a new pull request, #41937: URL: https://github.com/apache/spark/pull/41937 ### What changes were proposed in this pull request? The pr aims to upgrade buf from 1.20.0 to 1.23.1 ### Why are the changes needed? 1.Release Notes: - https://github.com/bufbuild/b

[GitHub] [spark] yaooqinn opened a new pull request, #41938: [SPARK-44373][SQL] Wrap withActive for Dataset API w/ parse logic to make parser related configuration work

2023-07-11 Thread via GitHub
yaooqinn opened a new pull request, #41938: URL: https://github.com/apache/spark/pull/41938 ### What changes were proposed in this pull request? This PR wraps `withActive` for - filter - where - createTempView - createOrReplaceTempView - createGlobalT

[GitHub] [spark] panbingkun commented on pull request #41937: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1

2023-07-11 Thread via GitHub
panbingkun commented on PR #41937: URL: https://github.com/apache/spark/pull/41937#issuecomment-1630601560 In order to address the https://github.com/apache/spark/pull/41933 issue, the above PR has been reverted and is now resubmitted. @zhengruifeng

[GitHub] [spark] panbingkun commented on pull request #41469: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.23.1

2023-07-11 Thread via GitHub
panbingkun commented on PR #41469: URL: https://github.com/apache/spark/pull/41469#issuecomment-1630609358 In order to address the https://github.com/apache/spark/pull/41933 issue, the above PR has been reverted and now resubmitted. https://github.com/apache/spark/pull/41937

[GitHub] [spark] yaooqinn commented on pull request #41938: [SPARK-44373][SQL] Wrap withActive for Dataset API w/ parse logic to make parser related configuration work

2023-07-11 Thread via GitHub
yaooqinn commented on PR #41938: URL: https://github.com/apache/spark/pull/41938#issuecomment-1630611229 cc @cloud-fan

[GitHub] [spark] panbingkun commented on pull request #41909: [SPARK-44320][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1067,1150,1220,1265,1277]

2023-07-11 Thread via GitHub
panbingkun commented on PR #41909: URL: https://github.com/apache/spark/pull/41909#issuecomment-1630616044 > I believe the failure is not related to PR's changes: > > ```python > RUN: /__w/spark/spark/dev/connect-gen-protos.sh /tmp/tmpo7t_pdiv > ... > subprocess.CalledProcessE

[GitHub] [spark] grundprinzip commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
grundprinzip commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630637888 We have to be careful that the remotely generated protobuf code matches our protobuf version locally because otherwise we will run into compatibility issues. Protobuf had some change

[GitHub] [spark] beliefer opened a new pull request, #41939: [SPARK-44341][SQL][PYTHON] Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasExec

2023-07-11 Thread via GitHub
beliefer opened a new pull request, #41939: URL: https://github.com/apache/spark/pull/41939 ### What changes were proposed in this pull request? `WindowExec` and `WindowInPandasExec` are updated to use the PartitionEvaluator API to do execution. ### Why are the changes needed?

[GitHub] [spark] grundprinzip commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
grundprinzip commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630651148 Do we know what the changes in the generated code are?

[GitHub] [spark] grundprinzip commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
grundprinzip commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630653264 I think using the 21.7 version might not be very risky, but we should make sure that we stay consistent in our protobuf usage and we need to check if this should apply as well to upg

[GitHub] [spark] WeichenXu123 opened a new pull request, #41940: [SPARK-44374][PYTHON][ML] Add example code for distributed ML for spark connect

2023-07-11 Thread via GitHub
WeichenXu123 opened a new pull request, #41940: URL: https://github.com/apache/spark/pull/41940 ### What changes were proposed in this pull request? Add example code for distributed ML for spark connect ### Why are the changes needed? Example code for new APIs.

[GitHub] [spark] Hisoka-X commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
Hisoka-X commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630707847 > What is the diff if we use the 3.14 version of protobuf? It will bring more conflicts. Also, this is a downgrade, so it will bring more compatibility issues than v21.7 ![image](http

[GitHub] [spark] LuciferYang commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
LuciferYang commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630718040 Yeah, Java uses `sbt-protoc` or `protobuf-maven-plugin`, not `buf`

[GitHub] [spark] srowen commented on a diff in pull request #41904: [SPARK-43389][SQL] Added a null check for lineSep option

2023-07-11 Thread via GitHub
srowen commented on code in PR #41904: URL: https://github.com/apache/spark/pull/41904#discussion_r1259650448 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -253,16 +253,18 @@ class CSVOptions( /** * A string between two consecuti

[GitHub] [spark] LuciferYang opened a new pull request, #41941: Test pb 3.23.4

2023-07-11 Thread via GitHub
LuciferYang opened a new pull request, #41941: URL: https://github.com/apache/spark/pull/41941 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] HyukjinKwon opened a new pull request, #41942: [WIP][SPARK-44348][CORE][CONNECT][TESTS] Reenable test_artifact

2023-07-11 Thread via GitHub
HyukjinKwon opened a new pull request, #41942: URL: https://github.com/apache/spark/pull/41942 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch test

[GitHub] [spark] MaxGekk commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object

2023-07-11 Thread via GitHub
MaxGekk commented on PR #40506: URL: https://github.com/apache/spark/pull/40506#issuecomment-1630778507 @panbingkun Please, resolve conflicts.

[GitHub] [spark] HyukjinKwon commented on pull request #41942: [SPARK-44348][CORE][CONNECT][TESTS] Reenable test_artifact with relevant changes

2023-07-11 Thread via GitHub
HyukjinKwon commented on PR #41942: URL: https://github.com/apache/spark/pull/41942#issuecomment-1630796787 Apologies that this PR happened to touch a lot of the codebase. I would appreciate it if you guys could find some time to take a look, cc @hvanhovell @ueshin @vicennial @zhengruifeng

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

2023-07-11 Thread via GitHub
HyukjinKwon commented on code in PR #41931: URL: https://github.com/apache/spark/pull/41931#discussion_r1259721098 ## python/pyspark/pandas/sql_formatter.py: ## @@ -265,7 +266,13 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]: val._to_spark().cre

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41931: [SPARK-43665][CONNECT][PS] Enable PandasSQLStringFormatter.vformat to work with Spark Connect

2023-07-11 Thread via GitHub
HyukjinKwon commented on code in PR #41931: URL: https://github.com/apache/spark/pull/41931#discussion_r1259720787 ## python/pyspark/pandas/sql_formatter.py: ## @@ -265,7 +266,13 @@ def _convert_value(self, val: Any, name: str) -> Optional[str]: val._to_spark().cre

[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object

2023-07-11 Thread via GitHub
panbingkun commented on PR #40506: URL: https://github.com/apache/spark/pull/40506#issuecomment-1630922534 > @panbingkun Please, resolve conflicts. Done.

[GitHub] [spark] panbingkun commented on pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object

2023-07-11 Thread via GitHub
panbingkun commented on PR #40506: URL: https://github.com/apache/spark/pull/40506#issuecomment-1630931255 Its principle is similar to the following diagram (although the diagram says Hive UDF Codegen): https://github.com/apache/spark/assets/15246973/b748afb4-28a5-471c-a89b-a9b8dc597378

[GitHub] [spark] panbingkun commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
panbingkun commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1630951628 I guess it might be related to this: https://buf.build/blog/breaking-change-governance/ https://github.com/apache/spark/assets/15246973/37a04230-22a8-4c33-a577-d0c1ca7395e5

[GitHub] [spark] pm-nuance commented on pull request #37417: [SPARK-33782][K8S][CORE]Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode

2023-07-11 Thread via GitHub
pm-nuance commented on PR #37417: URL: https://github.com/apache/spark/pull/37417#issuecomment-1630954916 @pralabhkumar @HyukjinKwon @holdenk Facing an issue with the new Spark version, which uses Files.Copy in SparkSubmit.scala. The latest change in SparkSubmit.scala is causing t

[GitHub] [spark] asl3 commented on a diff in pull request #41927: [SPARK-44216] [PYTHON] Make assertSchemaEqual API with ignore_nullable optional flag

2023-07-11 Thread via GitHub
asl3 commented on code in PR #41927: URL: https://github.com/apache/spark/pull/41927#discussion_r1259848389 ## python/pyspark/testing/utils.py: ## @@ -221,7 +221,130 @@ def check_error( ) -def assertDataFrameEqual(df: DataFrame, expected: DataFrame, check_row_order:

[GitHub] [spark] MaxGekk commented on a diff in pull request #39937: [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes.

2023-07-11 Thread via GitHub
MaxGekk commented on code in PR #39937: URL: https://github.com/apache/spark/pull/39937#discussion_r1259851206 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -3970,10 +4023,18 @@ "Cannot resolve column name \"\" among ()." ] }, - "_LEGACY_ERRO

[GitHub] [spark] grundprinzip commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
grundprinzip commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1631004094 TBH I don't know how exactly to fix this problem. The buf remote plugins don't offer the protobuf version we need, so maybe the fix is to switch from remote to local plugins.

[GitHub] [spark] eejbyfeldt opened a new pull request, #41943: [SPARK-44376][BUILD] Fix maven build using scala 2.13 and Java 11 or later

2023-07-11 Thread via GitHub
eejbyfeldt opened a new pull request, #41943: URL: https://github.com/apache/spark/pull/41943 ### What changes were proposed in this pull request? Drop hardcoded `--target:jvm-1.8` value from scalac argument in pom.xml. ### Why are the changes needed? Build using mave

[GitHub] [spark] Hisoka-X commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
Hisoka-X commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1631016514 > TBH I don't know how exactly to fix this problem. The buf remote plugins don't offer the protobuf version we need, so maybe the fix is to switch from remote to local plugins. I'

[GitHub] [spark] mathewjacob1002 commented on pull request #41770: [SPARK-44264][ML][PYTHON] Write a Deepspeed Distributed Learning Class DeepspeedTorchDistributor

2023-07-11 Thread via GitHub
mathewjacob1002 commented on PR #41770: URL: https://github.com/apache/spark/pull/41770#issuecomment-1631052798 This is a rebase on top of the Spark master because for some reason the linter was failing when trying to check the generated files.

[GitHub] [spark] jdesjean commented on a diff in pull request #41748: [SPARK-44145][SQL] Callback when ready for execution

2023-07-11 Thread via GitHub
jdesjean commented on code in PR #41748: URL: https://github.com/apache/spark/pull/41748#discussion_r1259927245 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -622,21 +622,43 @@ class SparkSession private( * For example, 1, "Steven", Lo

[GitHub] [spark] LuciferYang opened a new pull request, #41944: [SPARK-44377][BUILD] Exclude Junit5 dependencies from `jersey-test-framework-provider-simple`

2023-07-11 Thread via GitHub
LuciferYang opened a new pull request, #41944: URL: https://github.com/apache/spark/pull/41944 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] LuciferYang commented on pull request #41944: [SPARK-44377][BUILD] Exclude Junit5 dependencies from `jersey-test-framework-provider-simple`

2023-07-11 Thread via GitHub
LuciferYang commented on PR #41944: URL: https://github.com/apache/spark/pull/41944#issuecomment-1631091109 cc @HyukjinKwon @zhengruifeng @panbingkun

[GitHub] [spark] dongjoon-hyun closed pull request #41930: [SPARK-44360][SQL] Support schema pruning in delta-based MERGE operations

2023-07-11 Thread via GitHub
dongjoon-hyun closed pull request #41930: [SPARK-44360][SQL] Support schema pruning in delta-based MERGE operations URL: https://github.com/apache/spark/pull/41930

[GitHub] [spark] dongjoon-hyun commented on pull request #41930: [SPARK-44360][SQL] Support schema pruning in delta-based MERGE operations

2023-07-11 Thread via GitHub
dongjoon-hyun commented on PR #41930: URL: https://github.com/apache/spark/pull/41930#issuecomment-1631103519 Merged to master for Apache Spark 3.5.0.

[GitHub] [spark] grundprinzip commented on pull request #41933: [SPARK-44370][CONNECT] Migrate Buf remote generation alpha to remote plugins

2023-07-11 Thread via GitHub
grundprinzip commented on PR #41933: URL: https://github.com/apache/spark/pull/41933#issuecomment-1631139062 My suggestion would be to merge the proposed version with v21.7; assuming that the integration tests pass, it should still be compatible. This should unblock any new proto chang

[GitHub] [spark] anchovYu commented on a diff in pull request #41864: [SPARK-44059] Add analyzer support of named arguments for built-in functions

2023-07-11 Thread via GitHub
anchovYu commented on code in PR #41864: URL: https://github.com/apache/spark/pull/41864#discussion_r1259998570 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountMinSketchAgg.scala: ## @@ -208,3 +209,20 @@ case class CountMinSketchAgg(

[GitHub] [spark] WweiL commented on a diff in pull request #41791: [SPARK-44285] MSK IAM Support

2023-07-11 Thread via GitHub
WweiL commented on code in PR #41791: URL: https://github.com/apache/spark/pull/41791#discussion_r1260010746 ## connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceProviderSuite.scala: ## @@ -66,77 +66,78 @@ class KafkaSourceProviderSuite extends Spa

[GitHub] [spark] WweiL commented on a diff in pull request #41791: [SPARK-44285] MSK IAM Support

2023-07-11 Thread via GitHub
WweiL commented on code in PR #41791: URL: https://github.com/apache/spark/pull/41791#discussion_r1260012004 ## connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceProviderSuite.scala: ## @@ -18,30 +18,127 @@ package org.apache.spark.sql.kafka010

[GitHub] [spark] pan3793 commented on pull request #41943: [SPARK-44376][BUILD] Fix maven build using scala 2.13 and Java 11 or later

2023-07-11 Thread via GitHub
pan3793 commented on PR #41943: URL: https://github.com/apache/spark/pull/41943#issuecomment-1631184123 A quick question: previously, the output artifacts were runnable on JDK 8 regardless of the building JDK version. Is that still true after this change? cc @LuciferYang

[GitHub] [spark] ramon-garcia commented on a diff in pull request #41717: Withdrawn

2023-07-11 Thread via GitHub
ramon-garcia commented on code in PR #41717: URL: https://github.com/apache/spark/pull/41717#discussion_r1245781331 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala: ## @@ -391,6 +390,22 @@ private[parquet] class ParquetRowC
