[GitHub] [spark] MaxGekk commented on pull request #42951: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work

2023-09-17 Thread via GitHub
MaxGekk commented on PR #42951: URL: https://github.com/apache/spark/pull/42951#issuecomment-1722421266 > `collectionOperations.scala` has a lot of `Seq.empty`. If we need to remove them all, I can create a PR for it. Let's leave them as is for now. -- This is an automated message

[GitHub] [spark] MaxGekk commented on pull request #42951: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work

2023-09-17 Thread via GitHub
MaxGekk commented on PR #42951: URL: https://github.com/apache/spark/pull/42951#issuecomment-1722421890 +1, LGTM. Merging to master/3.5/3.4. Thank you, @Hisoka-X. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] MaxGekk closed pull request #42951: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work

2023-09-17 Thread via GitHub
MaxGekk closed pull request #42951: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work URL: https://github.com/apache/spark/pull/42951

[GitHub] [spark] MaxGekk commented on pull request #42951: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work

2023-09-17 Thread via GitHub
MaxGekk commented on PR #42951: URL: https://github.com/apache/spark/pull/42951#issuecomment-1722422309 @Hisoka-X The changes cause some conflicts in 3.4. Could you please open a backport PR against `branch-3.4`?

[GitHub] [spark] dongjoon-hyun closed pull request #42945: [SPARK-45180][PS] Remove boolean inputs for `inclusive` parameter from `Series.between`

2023-09-17 Thread via GitHub
dongjoon-hyun closed pull request #42945: [SPARK-45180][PS] Remove boolean inputs for `inclusive` parameter from `Series.between` URL: https://github.com/apache/spark/pull/42945

[GitHub] [spark] dongjoon-hyun commented on pull request #42945: [SPARK-45180][PS] Remove boolean inputs for `inclusive` parameter from `Series.between`

2023-09-17 Thread via GitHub
dongjoon-hyun commented on PR #42945: URL: https://github.com/apache/spark/pull/42945#issuecomment-1722424315 Merged to master. Thank you, @itholic .

[GitHub] [spark] dongjoon-hyun opened a new pull request, #42959: [SPARK-45187][CORE] Fix `WorkerPage` to use the same pattern for `logPage` query parameters

2023-09-17 Thread via GitHub
dongjoon-hyun opened a new pull request, #42959: URL: https://github.com/apache/spark/pull/42959 ### What changes were proposed in this pull request? This PR aims to use the same pattern for `logPage` query parameters of `WorkerPage`. ### Why are the changes needed? Sinc

[GitHub] [spark] dongjoon-hyun commented on pull request #42959: [SPARK-45187][CORE] Fix `WorkerPage` to use the same pattern for `logPage` query parameters

2023-09-17 Thread via GitHub
dongjoon-hyun commented on PR #42959: URL: https://github.com/apache/spark/pull/42959#issuecomment-1722429321 cc @gengliangwang and @viirya

[GitHub] [spark] dongjoon-hyun closed pull request #42929: [SPARK-45167][CONNECT][PYTHON] Python client must call `release_all`

2023-09-17 Thread via GitHub
dongjoon-hyun closed pull request #42929: [SPARK-45167][CONNECT][PYTHON] Python client must call `release_all` URL: https://github.com/apache/spark/pull/42929

[GitHub] [spark] dongjoon-hyun commented on pull request #42929: [SPARK-45167][CONNECT][PYTHON] Python client must call `release_all`

2023-09-17 Thread via GitHub
dongjoon-hyun commented on PR #42929: URL: https://github.com/apache/spark/pull/42929#issuecomment-1722429849 Merged to master. If needed, please make a backporting PR to `branch-3.5`.

[GitHub] [spark] dongjoon-hyun commented on pull request #42955: [SPARK-43628][SPARK-43629][CONNECT][PS][TESTS] Clear message for JVM dependent tests.

2023-09-17 Thread via GitHub
dongjoon-hyun commented on PR #42955: URL: https://github.com/apache/spark/pull/42955#issuecomment-1722434698 Could you re-trigger the failed pipelines?

[GitHub] [spark] Hisoka-X opened a new pull request, #42960: [SPARK-45078][SQL][3.4] Fix `array_insert` ImplicitCastInputTypes not work

2023-09-17 Thread via GitHub
Hisoka-X opened a new pull request, #42960: URL: https://github.com/apache/spark/pull/42960 ### What changes were proposed in this pull request? This is a backport PR for https://github.com/apache/spark/pull/42951, to fix `array_insert` ImplicitCastInputTypes not work. ### Why are

[GitHub] [spark] Hisoka-X commented on pull request #42960: [SPARK-45078][SQL][3.4] Fix `array_insert` ImplicitCastInputTypes not work

2023-09-17 Thread via GitHub
Hisoka-X commented on PR #42960: URL: https://github.com/apache/spark/pull/42960#issuecomment-1722462121 cc @MaxGekk

[GitHub] [spark] panbingkun opened a new pull request, #42961: [Don't merge and review] investigate root cause sbt

2023-09-17 Thread via GitHub
panbingkun opened a new pull request, #42961: URL: https://github.com/apache/spark/pull/42961 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] srowen commented on pull request #42507: [SPARK-44823][PYTHON] Update black to 23.7.0 and fix erroneous check

2023-09-17 Thread via GitHub
srowen commented on PR #42507: URL: https://github.com/apache/spark/pull/42507#issuecomment-1722496507 @panbingkun I think you can rebase now if you want to proceed

[GitHub] [spark] dongjoon-hyun commented on pull request #42959: [SPARK-45187][CORE] Fix `WorkerPage` to use the same pattern for `logPage` urls

2023-09-17 Thread via GitHub
dongjoon-hyun commented on PR #42959: URL: https://github.com/apache/spark/pull/42959#issuecomment-1722526407 Thank you so much, @viirya !

[GitHub] [spark] dongjoon-hyun closed pull request #42959: [SPARK-45187][CORE] Fix `WorkerPage` to use the same pattern for `logPage` urls

2023-09-17 Thread via GitHub
dongjoon-hyun closed pull request #42959: [SPARK-45187][CORE] Fix `WorkerPage` to use the same pattern for `logPage` urls URL: https://github.com/apache/spark/pull/42959

[GitHub] [spark] dongjoon-hyun commented on pull request #42959: [SPARK-45187][CORE] Fix `WorkerPage` to use the same pattern for `logPage` urls

2023-09-17 Thread via GitHub
dongjoon-hyun commented on PR #42959: URL: https://github.com/apache/spark/pull/42959#issuecomment-1722526811 Merged to master/3.5/3.4/3.3.

[GitHub] [spark] MaxGekk commented on a diff in pull request #42939: SPARK-43254: Assign a name to the error _LEGACY_ERROR_TEMP_2018

2023-09-17 Thread via GitHub
MaxGekk commented on code in PR #42939: URL: https://github.com/apache/spark/pull/42939#discussion_r1328128269 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala: ## @@ -170,7 +170,7 @@ object ExpressionEncoder { * Function that des

[GitHub] [spark] MaxGekk commented on pull request #42939: SPARK-43254: Assign a name to the error _LEGACY_ERROR_TEMP_2018

2023-09-17 Thread via GitHub
MaxGekk commented on PR #42939: URL: https://github.com/apache/spark/pull/42939#issuecomment-1722537839 @dengziming Please format the PR's title like the others, using tags: `[SPARK-X][SQL] Assign ...`

[GitHub] [spark] MaxGekk commented on pull request #42755: [SPARK-45034][SQL] Support deterministic mode function

2023-09-17 Thread via GitHub
MaxGekk commented on PR #42755: URL: https://github.com/apache/spark/pull/42755#issuecomment-1722538502 +1, LGTM. Merging to master. Thank you, @peter-toth and @cloud-fan @srielau for review.

[GitHub] [spark] MaxGekk closed pull request #42755: [SPARK-45034][SQL] Support deterministic mode function

2023-09-17 Thread via GitHub
MaxGekk closed pull request #42755: [SPARK-45034][SQL] Support deterministic mode function URL: https://github.com/apache/spark/pull/42755

[GitHub] [spark] MaxGekk commented on a diff in pull request #42524: [SPARK-44837][SQL] Improve ALTER TABLE ALTER PARTITION column error message

2023-09-17 Thread via GitHub
MaxGekk commented on code in PR #42524: URL: https://github.com/apache/spark/pull/42524#discussion_r1328129486 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -92,6 +92,13 @@ "The method can not be called on streaming Dataset/DataFrame." ] },

[GitHub] [spark] gdhuper opened a new pull request, #42962: [SPARK-44033][PYTHON] Added support for binary ops for list like objects

2023-09-17 Thread via GitHub
gdhuper opened a new pull request, #42962: URL: https://github.com/apache/spark/pull/42962 ### What changes were proposed in this pull request? ### Why are the changes needed? Fix for [Spark-44033](https://issues.apache.org/jira/browse/SPARK-44033) ###

[GitHub] [spark] shuwang21 commented on a diff in pull request #42357: [SPARK-44306][YARN] Group FileStatus with few RPC calls within Yarn Client

2023-09-17 Thread via GitHub
shuwang21 commented on code in PR #42357: URL: https://github.com/apache/spark/pull/42357#discussion_r1328148388 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala: ## @@ -462,6 +462,30 @@ package object config extends Logging { .stringConf

[GitHub] [spark] agubichev opened a new pull request, #42963: WIP: refactor Window operator

2023-09-17 Thread via GitHub
agubichev opened a new pull request, #42963: URL: https://github.com/apache/spark/pull/42963 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No ### How

[GitHub] [spark] gengliangwang opened a new pull request, #42964: [SPARK-45189][SQL] Creating UnresolvedRelation from TableIdentifier should include the catalog field

2023-09-17 Thread via GitHub
gengliangwang opened a new pull request, #42964: URL: https://github.com/apache/spark/pull/42964 ### What changes were proposed in this pull request? Creating UnresolvedRelation from TableIdentifier should include the catalog field ### Why are the changes needed?

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328167074 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd

[GitHub] [spark] HyukjinKwon opened a new pull request, #42965: [SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter threading Rlock, and use the existing eventually util function

2023-09-17 Thread via GitHub
HyukjinKwon opened a new pull request, #42965: URL: https://github.com/apache/spark/pull/42965 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/42929 that: - Use lighter threading `Rlock` instead of multithreading `Rl
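
The locking choice mentioned in this follow-up can be illustrated with a minimal sketch. It is not the PySpark client code: `ReleaseTracker` and its methods are hypothetical names invented for illustration, and the alternative lock being replaced is cut off in this digest, so it is not shown. The sketch only demonstrates what makes `threading.RLock` suitable here: the same thread can re-acquire it without deadlocking.

```python
import threading


class ReleaseTracker:
    """Hypothetical helper (not part of PySpark) that guards shared state
    with a re-entrant lock from the standard `threading` module."""

    def __init__(self) -> None:
        # threading.RLock may be acquired repeatedly by the thread that holds it.
        self._lock = threading.RLock()
        self._released: list[str] = []

    def release(self, op_id: str) -> None:
        with self._lock:
            self._released.append(op_id)

    def release_all(self, op_ids: list[str]) -> list[str]:
        with self._lock:  # outer acquisition
            for op_id in op_ids:
                # Inner acquisition by the same thread; a plain Lock would deadlock here.
                self.release(op_id)
            return list(self._released)
```

Calling `ReleaseTracker().release_all(["a", "b"])` records both identifiers while holding the lock twice on the same thread.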

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42929: [SPARK-45167][CONNECT][PYTHON] Python client must call `release_all`

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42929: URL: https://github.com/apache/spark/pull/42929#discussion_r1328171606 ## python/pyspark/sql/tests/connect/client/test_client.py: ## @@ -147,15 +150,33 @@ def _stub_with(self, execute=None, attach=None): attach_ops=Respons

[GitHub] [spark] zhengruifeng commented on pull request #42958: [SPARK-45168][PYTHON][FOLLOWUP] Add migration guide for Pandas minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng commented on PR #42958: URL: https://github.com/apache/spark/pull/42958#issuecomment-1722609739 late LGTM, thanks

[GitHub] [spark] HyukjinKwon commented on pull request #42916: [MiNOR][DOCS] Fix a typo in HashAggregateExec.scala

2023-09-17 Thread via GitHub
HyukjinKwon commented on PR #42916: URL: https://github.com/apache/spark/pull/42916#issuecomment-1722611273 @neshkeev once you set up GitHub Actions in your fork, please rebase this so the test is retriggered.

[GitHub] [spark] github-actions[bot] commented on pull request #41498: [SPARK-44001][Protobuf] spark protobuf: handle well known wrapper types

2023-09-17 Thread via GitHub
github-actions[bot] commented on PR #41498: URL: https://github.com/apache/spark/pull/41498#issuecomment-1722611922 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #41417: [SPARK-43908][SQL] Choose the bigger rowCount to initialize BloomFilterAggregate in InjectRuntimeFilter

2023-09-17 Thread via GitHub
github-actions[bot] closed pull request #41417: [SPARK-43908][SQL] Choose the bigger rowCount to initialize BloomFilterAggregate in InjectRuntimeFilter URL: https://github.com/apache/spark/pull/41417

[GitHub] [spark] github-actions[bot] closed pull request #40990: [SPARK-43317][SQL] Support combine adjacent aggregation

2023-09-17 Thread via GitHub
github-actions[bot] closed pull request #40990: [SPARK-43317][SQL] Support combine adjacent aggregation URL: https://github.com/apache/spark/pull/40990

[GitHub] [spark] github-actions[bot] closed pull request #41108: [SPARK-43427][Protobuf] spark protobuf: modify serde behavior of unsigned integer types

2023-09-17 Thread via GitHub
github-actions[bot] closed pull request #41108: [SPARK-43427][Protobuf] spark protobuf: modify serde behavior of unsigned integer types URL: https://github.com/apache/spark/pull/41108

[GitHub] [spark] github-actions[bot] closed pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-09-17 Thread via GitHub
github-actions[bot] closed pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause URL: https://github.com/apache/spark/pull/39691

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42939: SPARK-43254: Assign a name to the error _LEGACY_ERROR_TEMP_2018

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42939: URL: https://github.com/apache/spark/pull/42939#discussion_r1328174627 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala: ## @@ -170,7 +170,7 @@ object ExpressionEncoder { * Function that

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1328174821 ## python/pyspark/sql/connect/client/logging.py: ## @@ -0,0 +1,43 @@ +import logging +import os +from typing import Optional + +__all__ = [ +"logger", Review C

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1328174896 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -271,6 +276,7 @@ def add_artifacts(self, *path: str, pyfile: bool, archive: bool, file: bool) -> r

[GitHub] [spark] HyukjinKwon commented on pull request #42962: [SPARK-44033][PYTHON] Added support for binary ops for list like objects

2023-09-17 Thread via GitHub
HyukjinKwon commented on PR #42962: URL: https://github.com/apache/spark/pull/42962#issuecomment-1722616184 cc @itholic

[GitHub] [spark] itholic commented on pull request #42962: [SPARK-44033][PYTHON] Added support for binary ops for list like objects

2023-09-17 Thread via GitHub
itholic commented on PR #42962: URL: https://github.com/apache/spark/pull/42962#issuecomment-1722632383 Let's fill in the PR description and add unit tests first to verify that the function is working as expected. We can start by adding a very basic test into `python/pyspark/panda

[GitHub] [spark] itholic commented on pull request #42962: [SPARK-44033][PYTHON] Added support for binary ops for list like objects

2023-09-17 Thread via GitHub
itholic commented on PR #42962: URL: https://github.com/apache/spark/pull/42962#issuecomment-1722632813 Also, let's reformat the Python code. We can easily reformat it by running `./dev/reformat-python` from the project root path.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42955: [SPARK-43628][SPARK-43629][CONNECT][PS][TESTS] Clear message for JVM dependent tests.

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42955: URL: https://github.com/apache/spark/pull/42955#discussion_r1328181845 ## python/pyspark/pandas/tests/computation/test_compute.py: ## @@ -101,16 +101,10 @@ def test_mode(self): with self.assertRaises(ValueError):

[GitHub] [spark] itholic commented on pull request #42962: [SPARK-44033][PYTHON] Added support for binary ops for list like objects

2023-09-17 Thread via GitHub
itholic commented on PR #42962: URL: https://github.com/apache/spark/pull/42962#issuecomment-1722634015 Basically, could you do the following: - [ ] Complete the PR description - [ ] Add basic unit tests - [ ] Reformat the code by running `./dev/reformat-python`

[GitHub] [spark] itholic commented on a diff in pull request #42955: [SPARK-43628][SPARK-43629][CONNECT][PS][TESTS] Clear message for JVM dependent tests.

2023-09-17 Thread via GitHub
itholic commented on code in PR #42955: URL: https://github.com/apache/spark/pull/42955#discussion_r1328183416 ## python/pyspark/pandas/tests/computation/test_compute.py: ## @@ -101,16 +101,10 @@ def test_mode(self): with self.assertRaises(ValueError): psdf

[GitHub] [spark] zhengruifeng opened a new pull request, #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng opened a new pull request, #42966: URL: https://github.com/apache/spark/pull/42966 ### What changes were proposed in this pull request? Add migration guide for Numpy minimum version upgrade ### Why are the changes needed? to inform users about this important change

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42966: URL: https://github.com/apache/spark/pull/42966#discussion_r1328185625 ## python/docs/source/migration_guide/pyspark_upgrade.rst: ## @@ -22,6 +22,8 @@ Upgrading PySpark Upgrading from PySpark 3.5 to 4.0 -

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42966: URL: https://github.com/apache/spark/pull/42966#discussion_r1328185671 ## python/docs/source/migration_guide/pyspark_upgrade.rst: ## @@ -22,6 +22,8 @@ Upgrading PySpark Upgrading from PySpark 3.5 to 4.0 -

[GitHub] [spark] zhengruifeng commented on pull request #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng commented on PR #42966: URL: https://github.com/apache/spark/pull/42966#issuecomment-1722644456 CI link: https://github.com/zhengruifeng/spark/actions/runs/6216985551

[GitHub] [spark] chenyu-opensource commented on pull request #42919: [SPARK-45160][DOCS]Update the default value of 'spark.executor.logs.rolling.strategy'

2023-09-17 Thread via GitHub
chenyu-opensource commented on PR #42919: URL: https://github.com/apache/spark/pull/42919#issuecomment-1722644892 @srowen I have used a new issue: https://issues.apache.org/jira/browse/SPARK-45160

[GitHub] [spark] panbingkun commented on pull request #41824: [SPARK-43570][SPARK-43571][PYTHON][TESTS] Enable DateOpsTests.[test_rsub|test_sub] for pandas 2.0.0.

2023-09-17 Thread via GitHub
panbingkun commented on PR #41824: URL: https://github.com/apache/spark/pull/41824#issuecomment-1722652907 I'm good with fixing it in the current way, let me close it now. > Oh, I just realized that this is already fixed from #42533. > > But seems like the approach is a bit diff

[GitHub] [spark] panbingkun closed pull request #41824: [SPARK-43570][SPARK-43571][PYTHON][TESTS] Enable DateOpsTests.[test_rsub|test_sub] for pandas 2.0.0.

2023-09-17 Thread via GitHub
panbingkun closed pull request #41824: [SPARK-43570][SPARK-43571][PYTHON][TESTS] Enable DateOpsTests.[test_rsub|test_sub] for pandas 2.0.0. URL: https://github.com/apache/spark/pull/41824

[GitHub] [spark] dcoliversun commented on pull request #42943: [SPARK-45175][K8S] download krb5.conf from remote storage in spark-submit on k8s

2023-09-17 Thread via GitHub
dcoliversun commented on PR #42943: URL: https://github.com/apache/spark/pull/42943#issuecomment-1722653759 @dongjoon-hyun @yaooqinn Thanks for your review. And this is a good question. The specific scenario of this PR is to support users to use krb5.conf on cloud storage, in which authenti

[GitHub] [spark] itholic commented on pull request #42793: [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

2023-09-17 Thread via GitHub
itholic commented on PR #42793: URL: https://github.com/apache/spark/pull/42793#issuecomment-1722660266 CI link: https://github.com/itholic/spark/actions/runs/6216894150

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1328194879 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/FunctionTestSuite.scala: ## @@ -229,6 +229,18 @@ class FunctionTestSuite extends ConnectFunSuite

[GitHub] [spark] sandip-db commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-17 Thread via GitHub
sandip-db commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1328196005 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/FunctionTestSuite.scala: ## @@ -229,6 +229,18 @@ class FunctionTestSuite extends ConnectFunSuite {

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1328198653 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -1821,6 +1821,111 @@ def test_json_functions(self): sdf.select(SF.to_json(SF.struc

[GitHub] [spark] cloud-fan commented on pull request #42952: [SPARK-45184][SQL][DOCS][TESTS] Remove orphaned error class documents

2023-09-17 Thread via GitHub
cloud-fan commented on PR #42952: URL: https://github.com/apache/spark/pull/42952#issuecomment-1722674085 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #42952: [SPARK-45184][SQL][DOCS][TESTS] Remove orphaned error class documents

2023-09-17 Thread via GitHub
cloud-fan closed pull request #42952: [SPARK-45184][SQL][DOCS][TESTS] Remove orphaned error class documents URL: https://github.com/apache/spark/pull/42952

[GitHub] [spark] cloud-fan commented on a diff in pull request #42951: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42951: URL: https://github.com/apache/spark/pull/42951#discussion_r1328202310 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4749,7 +4749,6 @@ case class ArrayInsert( }

[GitHub] [spark] ulysses-you opened a new pull request, #42967: [SPARK-45191][SQL] InMemoryTableScanExec simpleStringWithNodeId adds columnar info

2023-09-17 Thread via GitHub
ulysses-you opened a new pull request, #42967: URL: https://github.com/apache/spark/pull/42967 ### What changes were proposed in this pull request? InMemoryTableScanExec supports both row-based and columnar input and output, depending on the cache serializer. It would

[GitHub] [spark] LuciferYang opened a new pull request, #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
LuciferYang opened a new pull request, #42968: URL: https://github.com/apache/spark/pull/42968 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] LuciferYang commented on a diff in pull request #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
LuciferYang commented on code in PR #42968: URL: https://github.com/apache/spark/pull/42968#discussion_r1328205481 ## .github/workflows/build_and_test.yml: ## @@ -383,6 +383,7 @@ jobs: SKIP_PACKAGING: true METASPACE_SIZE: 1g BRANCH: ${{ inputs.branch }} +

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328209431 ## python/pyspark/sql/column.py: ## @@ -712,11 +712,11 @@ def __getitem__(self, k: Any) -> "Column": >>> df = spark.createDataFrame([('abce

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328209728 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -442,6 +442,10 @@ case class InSubquery(values: Seq[Expression], quer

[GitHub] [spark] itholic commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
itholic commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328210599 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd fro

[GitHub] [spark] yaooqinn opened a new pull request, #42969: [SPARK-45192][UI] Fix overdue lineInterpolate parameter for graphviz edge

2023-09-17 Thread via GitHub
yaooqinn opened a new pull request, #42969: URL: https://github.com/apache/spark/pull/42969 ### What changes were proposed in this pull request? The `edge.lineInterpolate` no longer takes effect for drawing edges. It shall be replaced by d3.curve ### Why ar

[GitHub] [spark] LuciferYang commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-17 Thread via GitHub
LuciferYang commented on PR #42908: URL: https://github.com/apache/spark/pull/42908#issuecomment-1722694131 > @LuciferYang I tried looking at [#42560 (comment)](https://github.com/apache/spark/pull/42560#issuecomment-1718968002) but did not reproduce it yet. If you have more instances of CI

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328212481 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328212842 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328213080 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd

[GitHub] [spark] LuciferYang commented on a diff in pull request #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
LuciferYang commented on code in PR #42968: URL: https://github.com/apache/spark/pull/42968#discussion_r1328213649 ## python/pyspark/sql/functions.py: ## @@ -3765,12 +3765,12 @@ def collect_set(col: "ColumnOrName") -> Column: Example 1: Collect values from a single column D

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1328213652 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -1821,6 +1821,111 @@ def test_json_functions(self): sdf.select(SF.to_json(SF.struct

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328213801 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -234,7 +260,7 @@ class Column(val expr: Expression) extends Logging { * @group expr_ops *

[GitHub] [spark] LuciferYang commented on pull request #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
LuciferYang commented on PR #42968: URL: https://github.com/apache/spark/pull/42968#issuecomment-1722697313 will update the PR description after testing Scala 2.13 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42968: URL: https://github.com/apache/spark/pull/42968#discussion_r1328217212 ## python/pyspark/sql/functions.py: ## @@ -3765,12 +3765,12 @@ def collect_set(col: "ColumnOrName") -> Column: Example 1: Collect values from a single column

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328217592 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -234,7 +260,7 @@ class Column(val expr: Expression) extends Logging { * @group expr_ops *

[GitHub] [spark] itholic commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
itholic commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328217955 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd fro

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328218485 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -414,12 +407,13 @@ object functions { * @group agg_funcs * @since 1.3.0 */ - def c

[GitHub] [spark] LuciferYang commented on a diff in pull request #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
LuciferYang commented on code in PR #42968: URL: https://github.com/apache/spark/pull/42968#discussion_r1328218686 ## python/pyspark/sql/functions.py: ## @@ -3765,12 +3765,12 @@ def collect_set(col: "ColumnOrName") -> Column: Example 1: Collect values from a single column D

[GitHub] [spark] itholic commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
itholic commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328218762 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd fro

[GitHub] [spark] LuciferYang commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-17 Thread via GitHub
LuciferYang commented on PR #42908: URL: https://github.com/apache/spark/pull/42908#issuecomment-1722702977
```
dev/change-scala-version.sh 2.13
build/sbt "connect/test" -Pscala-2.13
```
@juliuszsompolski When I run the above commands in a local test, it is easier to reproduce `

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328219551 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328220027 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -6843,9 +6562,8 @@ object functions { * @since 3.0.0 */ // scalastyle:on line.size.l

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328221713 ## sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -708,7 +708,7 @@ private[sql] object RelationalGroupedDataset { case expr:

[GitHub] [spark] itholic commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
itholic commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328221746 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd fro

[GitHub] [spark] cloud-fan commented on a diff in pull request #42931: [SPARK-45137][CONNECT] Support map/array parameters in parameterized `sql()`

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42931: URL: https://github.com/apache/spark/pull/42931#discussion_r1328222151 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -65,6 +65,12 @@ message SqlCommand { // (Optional) A sequence of literal expres

[GitHub] [spark] cloud-fan commented on a diff in pull request #42931: [SPARK-45137][CONNECT] Support map/array parameters in parameterized `sql()`

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42931: URL: https://github.com/apache/spark/pull/42931#discussion_r132837 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -65,6 +65,12 @@ message SqlCommand { // (Optional) A sequence of literal expres

[GitHub] [spark] cloud-fan commented on a diff in pull request #42931: [SPARK-45137][CONNECT] Support map/array parameters in parameterized `sql()`

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42931: URL: https://github.com/apache/spark/pull/42931#discussion_r1328222454 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -65,6 +65,12 @@ message SqlCommand { // (Optional) A sequence of literal expres

[GitHub] [spark] cloud-fan commented on a diff in pull request #42957: [SPARK-45188][SQL][DOCS] Update error messages related to parameterized `sql()`

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42957: URL: https://github.com/apache/spark/pull/42957#discussion_r1328222789 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -1892,7 +1892,7 @@ }, "INVALID_SQL_ARG" : { "message" : [ - "The argument of `sql(

[GitHub] [spark] zhengruifeng commented on pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
zhengruifeng commented on PR #42864: URL: https://github.com/apache/spark/pull/42864#issuecomment-1722709137 also cc @beliefer @panbingkun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [spark] grundprinzip commented on a diff in pull request #42965: [SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter threading Rlock, and use the existing eventually util function

2023-09-17 Thread via GitHub
grundprinzip commented on code in PR #42965: URL: https://github.com/apache/spark/pull/42965#discussion_r1328226669 ## python/pyspark/sql/connect/client/reattach.py: ## @@ -18,12 +18,11 @@ check_dependencies(__name__) +from threading import RLock import warnings import uu

[GitHub] [spark] sandip-db commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-17 Thread via GitHub
sandip-db commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1328227219 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -1821,6 +1821,111 @@ def test_json_functions(self): sdf.select(SF.to_json(SF.struct(S

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42965: [SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter threading Rlock, and use the existing eventually util function

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42965: URL: https://github.com/apache/spark/pull/42965#discussion_r1328227343 ## python/pyspark/sql/connect/client/reattach.py: ## @@ -18,12 +18,11 @@ check_dependencies(__name__) +from threading import RLock import warnings import uui

[GitHub] [spark] gengliangwang commented on pull request #42964: [SPARK-45189][SQL] Creating UnresolvedRelation from TableIdentifier should include the catalog field

2023-09-17 Thread via GitHub
gengliangwang commented on PR #42964: URL: https://github.com/apache/spark/pull/42964#issuecomment-1722719604 Thanks, merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [spark] gengliangwang closed pull request #42964: [SPARK-45189][SQL] Creating UnresolvedRelation from TableIdentifier should include the catalog field

2023-09-17 Thread via GitHub
gengliangwang closed pull request #42964: [SPARK-45189][SQL] Creating UnresolvedRelation from TableIdentifier should include the catalog field URL: https://github.com/apache/spark/pull/42964 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] zhengruifeng closed pull request #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng closed pull request #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade URL: https://github.com/apache/spark/pull/42966 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use
