[GitHub] [spark] beliefer commented on pull request #41932: [SPARK-44131][SQL][FOLLOWUP] Support qualified function name for call_function

2023-07-13 Thread via GitHub
beliefer commented on PR #41932: URL: https://github.com/apache/spark/pull/41932#issuecomment-1635368066 > @beliefer branch cut is soon, shall we also support it in Spark Connect? Otherwise, the behaviors will be different It's better to support too. -- This is an automated message

[GitHub] [spark] LuciferYang commented on pull request #41941: [SPARK-44382][BUILD] Upgrade protobuf-java to 3.23.4

2023-07-13 Thread via GitHub
LuciferYang commented on PR #41941: URL: https://github.com/apache/spark/pull/41941#issuecomment-1635354711 friendly ping @HyukjinKwon @dongjoon-hyun @zhengruifeng

[GitHub] [spark] yaooqinn commented on pull request #41992: [SPARK-44409][SQL] Handle char/varchar in Dataset.to to keep consistent with others

2023-07-13 Thread via GitHub
yaooqinn commented on PR #41992: URL: https://github.com/apache/spark/pull/41992#issuecomment-1635354088 cc @cloud-fan who added this new API in 3.4

[GitHub] [spark] caican00 commented on pull request #42000: [SPARK-44419][SQL] Support to extract partial filters of datasource v2 table and push them down

2023-07-13 Thread via GitHub
caican00 commented on PR #42000: URL: https://github.com/apache/spark/pull/42000#issuecomment-1635349047 Could you help me to review this PR? gently ping @huaxingao

[GitHub] [spark] maheshk114 commented on pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

2023-07-13 Thread via GitHub
maheshk114 commented on PR #41860: URL: https://github.com/apache/spark/pull/41860#issuecomment-1635347890 @beliefer Doing some experiments to check the impact of size of tables on the performance number. As far as bloom is concern, the worst case seems to be the case when left side (bl

[GitHub] [spark] caican00 opened a new pull request, #42000: [SPARK-44419][SQL] Support to extract partial filters of datasource v2 table and push them down

2023-07-13 Thread via GitHub
caican00 opened a new pull request, #42000: URL: https://github.com/apache/spark/pull/42000 ### What changes were proposed in this pull request? For such queries, ``` where (date = 20221110 and udfStrLen(data) = 8) or (date = 2022 and udfStrLen(data) = 8) ```, we
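The idea in the PR title (pushing down only the part of a filter that the source can evaluate) can be sketched in Python. This is a minimal sketch assuming a toy predicate tree: the `Leaf`/`And`/`Or` classes, the `pushable` flag, and `extract_pushable` are hypothetical names, not Spark's DataSource V2 API, and NOT is deliberately left out. Under AND, any pushable subset of conjuncts is implied by the whole; under OR, every branch must contribute a pushable part, otherwise nothing can be pushed. The extracted filter is weaker than the original, so the full predicate would still have to be applied after the scan.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    """A single comparison, e.g. `date = 20221110` or `udfStrLen(data) = 8`."""
    text: str
    pushable: bool  # hypothetical flag: can the data source evaluate this leaf?


@dataclass(frozen=True)
class And:
    children: Tuple["Pred", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Pred", ...]


Pred = Union[Leaf, And, Or]


def extract_pushable(p: Pred) -> Optional[Pred]:
    """Return a predicate implied by `p` that uses only pushable leaves,
    or None when no such (non-trivial) predicate can be extracted."""
    if isinstance(p, Leaf):
        return p if p.pushable else None
    if isinstance(p, And):
        # Dropping non-pushable conjuncts only weakens the filter, which is safe.
        parts = [q for q in (extract_pushable(c) for c in p.children) if q is not None]
        if not parts:
            return None
        return parts[0] if len(parts) == 1 else And(tuple(parts))
    # OR: if any branch has nothing pushable, that branch could match any row,
    # so the whole disjunction cannot be narrowed.
    parts = []
    for c in p.children:
        q = extract_pushable(c)
        if q is None:
            return None
        parts.append(q)
    return parts[0] if len(parts) == 1 else Or(tuple(parts))


def render(p: Pred) -> str:
    """Pretty-print a predicate tree in SQL-like syntax."""
    if isinstance(p, Leaf):
        return p.text
    sep = " and " if isinstance(p, And) else " or "
    return "(" + sep.join(render(c) for c in p.children) + ")"


# The query from the PR description, with the UDF marked as not pushable.
udf = Leaf("udfStrLen(data) = 8", pushable=False)
query = Or((
    And((Leaf("date = 20221110", pushable=True), udf)),
    And((Leaf("date = 2022", pushable=True), udf)),
))
print(render(extract_pushable(query)))  # (date = 20221110 or date = 2022)
```

Returning `None` for an OR branch with no pushable part is the conservative choice: pushing only the other branches would drop rows the original query keeps.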

[GitHub] [spark] HyukjinKwon closed pull request #41947: [SPARK-44217][PYTHON] Allow custom precision for fp approx equality

2023-07-13 Thread via GitHub
HyukjinKwon closed pull request #41947: [SPARK-44217][PYTHON] Allow custom precision for fp approx equality URL: https://github.com/apache/spark/pull/41947

[GitHub] [spark] HyukjinKwon commented on pull request #41947: [SPARK-44217][PYTHON] Allow custom precision for fp approx equality

2023-07-13 Thread via GitHub
HyukjinKwon commented on PR #41947: URL: https://github.com/apache/spark/pull/41947#issuecomment-1635329507 Merged to master.

[GitHub] [spark] yaooqinn commented on a diff in pull request #41993: [SPARK-44414][SQL] Fixed matching check for CharType/VarcharType

2023-07-13 Thread via GitHub
yaooqinn commented on code in PR #41993: URL: https://github.com/apache/spark/pull/41993#discussion_r1263323716 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/V2WriteAnalysisSuite.scala: ## @@ -744,6 +744,46 @@ abstract class V2WriteAnalysisSuiteBase exten

[GitHub] [spark] MaxGekk closed pull request #41985: [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`

2023-07-13 Thread via GitHub
MaxGekk closed pull request #41985: [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike` URL: https://github.com/apache/spark/pull/41985

[GitHub] [spark] MaxGekk commented on pull request #41985: [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`

2023-07-13 Thread via GitHub
MaxGekk commented on PR #41985: URL: https://github.com/apache/spark/pull/41985#issuecomment-1635301576 Merging to 3.4. Thank you, @cloud-fan for review.

[GitHub] [spark] HyukjinKwon commented on pull request #41997: [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound

2023-07-13 Thread via GitHub
HyukjinKwon commented on PR #41997: URL: https://github.com/apache/spark/pull/41997#issuecomment-1635300230 sorry for a bit of rushing. All is back 👍

[GitHub] [spark] zhengruifeng commented on pull request #41932: [SPARK-44131][SQL][FOLLOWUP] Support qualified function name for call_function

2023-07-13 Thread via GitHub
zhengruifeng commented on PR #41932: URL: https://github.com/apache/spark/pull/41932#issuecomment-1635298609 @beliefer branch cut is soon, shall we also support it in Spark Connect? Otherwise, the behaviors will be different

[GitHub] [spark] dongjoon-hyun commented on pull request #41997: [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound

2023-07-13 Thread via GitHub
dongjoon-hyun commented on PR #41997: URL: https://github.com/apache/spark/pull/41997#issuecomment-1635298123 It looks good to me. Thank you, @HyukjinKwon. https://github.com/apache/spark/assets/9700541/7c4c84ea-15c9-4ad8-ab85-0f80a444fd44

[GitHub] [spark] HyukjinKwon opened a new pull request, #41999: [SPARK-44418][PYTHON][CONNECT] Upgrade protobuf from 3.19.5 to 3.20.3

2023-07-13 Thread via GitHub
HyukjinKwon opened a new pull request, #41999: URL: https://github.com/apache/spark/pull/41999 ### What changes were proposed in this pull request? This PR proposes to upgrade protobuf from 3.19.5 to 3.20.3. ### Why are the changes needed? To use the latest version in pro

[GitHub] [spark] viirya commented on pull request #41939: [SPARK-44341][SQL][PYTHON] Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasExec

2023-07-13 Thread via GitHub
viirya commented on PR #41939: URL: https://github.com/apache/spark/pull/41939#issuecomment-1635282816 > > As tests are skipped now, maybe you can enable it to test change in CI, and revert back current config before merging. > > In fact, CI passed in previous commit. Oh okay.

[GitHub] [spark] LuciferYang commented on a diff in pull request #41856: [SPARK-44301][SQL] Add Benchmark Suite for TPCH

2023-07-13 Thread via GitHub
LuciferYang commented on code in PR #41856: URL: https://github.com/apache/spark/pull/41856#discussion_r1263294643 ## .github/workflows/benchmark.yml: ## @@ -186,7 +251,7 @@ jobs: # To keep the directory structure and file permissions, tar them # See also http

[GitHub] [spark] dependabot[bot] commented on pull request #41996: Bump grpcio from 1.48.1 to 1.53.0 in /dev

2023-07-13 Thread via GitHub
dependabot[bot] commented on PR #41996: URL: https://github.com/apache/spark/pull/41996#issuecomment-1635279447 Looks like grpcio is up-to-date now, so this is no longer needed.

[GitHub] [spark] dependabot[bot] closed pull request #41996: Bump grpcio from 1.48.1 to 1.53.0 in /dev

2023-07-13 Thread via GitHub
dependabot[bot] closed pull request #41996: Bump grpcio from 1.48.1 to 1.53.0 in /dev URL: https://github.com/apache/spark/pull/41996

[GitHub] [spark] HyukjinKwon closed pull request #41997: [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound

2023-07-13 Thread via GitHub
HyukjinKwon closed pull request #41997: [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound URL: https://github.com/apache/spark/pull/41997

[GitHub] [spark] HyukjinKwon commented on pull request #41997: [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound

2023-07-13 Thread via GitHub
HyukjinKwon commented on PR #41997: URL: https://github.com/apache/spark/pull/41997#issuecomment-1635278343 Seems like this actually affects our build :-). Merged to master.

[GitHub] [spark] beliefer commented on pull request #41939: [SPARK-44341][SQL][PYTHON] Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasExec

2023-07-13 Thread via GitHub
beliefer commented on PR #41939: URL: https://github.com/apache/spark/pull/41939#issuecomment-1635277965 > As tests are skipped now, maybe you can enable it to test change in CI, and revert back current config before merging. In fact, CI passed in previous commit.

[GitHub] [spark] viirya commented on pull request #41939: [SPARK-44341][SQL][PYTHON] Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasExec

2023-07-13 Thread via GitHub
viirya commented on PR #41939: URL: https://github.com/apache/spark/pull/41939#issuecomment-1635277333 As tests are skipped now, maybe you can enable it to test change in CI, and revert back current config before merging.

[GitHub] [spark] beliefer commented on a diff in pull request #41939: [SPARK-44341][SQL][PYTHON] Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasEx

2023-07-13 Thread via GitHub
beliefer commented on code in PR #41939: URL: https://github.com/apache/spark/pull/41939#discussion_r1263293187 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowEvaluatorFactory.scala: ## @@ -0,0 +1,418 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] vinodkc opened a new pull request, #41998: [SPARK-44411][SQL] Use PartitionEvaluator API in ArrowEvalPythonExec and BatchEvalPythonExec

2023-07-13 Thread via GitHub
vinodkc opened a new pull request, #41998: URL: https://github.com/apache/spark/pull/41998 ### What changes were proposed in this pull request? SQL operators `ArrowEvalPythonExec` & `BatchEvalPythonExec` are updated to use the `PartitionEvaluator` API to do execution. #

[GitHub] [spark] viirya commented on a diff in pull request #41939: [SPARK-44341][SQL][PYTHON] Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasExec

2023-07-13 Thread via GitHub
viirya commented on code in PR #41939: URL: https://github.com/apache/spark/pull/41939#discussion_r1263289294 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowEvaluatorFactory.scala: ## @@ -0,0 +1,418 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41991: [SPARK-44413][PYTHON] Clarify error for unsupported arg data type in assertDataFrameEqual

2023-07-13 Thread via GitHub
HyukjinKwon commented on code in PR #41991: URL: https://github.com/apache/spark/pull/41991#discussion_r1263287529 ## python/pyspark/sql/tests/test_utils.py: ## @@ -25,6 +25,7 @@ ) from pyspark.testing.utils import assertDataFrameEqual from pyspark.testing.sqlutils import Reu

[GitHub] [spark] HyukjinKwon commented on pull request #41767: [SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0

2023-07-13 Thread via GitHub
HyukjinKwon commented on PR #41767: URL: https://github.com/apache/spark/pull/41767#issuecomment-1635267805 Actually we can just set the range .. it seems ... I opened a PR to bring this back.

[GitHub] [spark] HyukjinKwon opened a new pull request, #41997: [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound

2023-07-13 Thread via GitHub
HyukjinKwon opened a new pull request, #41997: URL: https://github.com/apache/spark/pull/41997 ### What changes were proposed in this pull request? This PR reverts the revert of https://github.com/apache/spark/pull/41767, setting grpc lower bounds. ### Why are the changes nee

[GitHub] [spark] caican00 commented on a diff in pull request #41993: [SPARK-44414][SQL] Fixed matching check for CharType/VarcharType

2023-07-13 Thread via GitHub
caican00 commented on code in PR #41993: URL: https://github.com/apache/spark/pull/41993#discussion_r1263279866 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/V2WriteAnalysisSuite.scala: ## @@ -744,6 +744,46 @@ abstract class V2WriteAnalysisSuiteBase exten

[GitHub] [spark] dependabot[bot] opened a new pull request, #41996: Bump grpcio from 1.48.1 to 1.53.0 in /dev

2023-07-13 Thread via GitHub
dependabot[bot] opened a new pull request, #41996: URL: https://github.com/apache/spark/pull/41996 Bumps [grpcio](https://github.com/grpc/grpc) from 1.48.1 to 1.53.0. Release notes Sourced from [grpcio's releases](https://github.com/grpc/grpc/releases). Release v1.53.0 Thi

[GitHub] [spark] wangyum commented on pull request #41630: [SPARK-44080][SQL] Update Spark SQL config default value for thriftserver

2023-07-13 Thread via GitHub
wangyum commented on PR #41630: URL: https://github.com/apache/spark/pull/41630#issuecomment-1635253045 @dongjoon-hyun Yes. It's admin user level features.

[GitHub] [spark] HyukjinKwon commented on pull request #41767: [SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0

2023-07-13 Thread via GitHub
HyukjinKwon commented on PR #41767: URL: https://github.com/apache/spark/pull/41767#issuecomment-1635251662 Let me revert this out for now (since the branchcut is very soon) and take another look for the upgrade together if you don't mind. Sorry for a bit rushing on this.

[GitHub] [spark] HyukjinKwon commented on pull request #41767: [SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0

2023-07-13 Thread via GitHub
HyukjinKwon commented on PR #41767: URL: https://github.com/apache/spark/pull/41767#issuecomment-1635249615 Seems like: ``` python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.19.5' ``` this combination only w

[GitHub] [spark] HyukjinKwon commented on pull request #41767: [SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0

2023-07-13 Thread via GitHub
HyukjinKwon commented on PR #41767: URL: https://github.com/apache/spark/pull/41767#issuecomment-1635248255 Hmmm .. seems like this actually brings some problem to Spark Connect dev. e.g.,: ``` The conflict is caused by: The user requested protobuf==3.19.5 grpcio-stat

[GitHub] [spark] yaooqinn commented on pull request #41993: [SPARK-44414][SQL] Fixed matching check for CharType/VarcharType

2023-07-13 Thread via GitHub
yaooqinn commented on PR #41993: URL: https://github.com/apache/spark/pull/41993#issuecomment-1635241942 cc @cloud-fan

[GitHub] [spark] yaooqinn commented on a diff in pull request #41993: [SPARK-44414][SQL] Fixed matching check for CharType/VarcharType

2023-07-13 Thread via GitHub
yaooqinn commented on code in PR #41993: URL: https://github.com/apache/spark/pull/41993#discussion_r1263271281 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/V2WriteAnalysisSuite.scala: ## @@ -744,6 +744,46 @@ abstract class V2WriteAnalysisSuiteBase exten

[GitHub] [spark] zhengruifeng commented on pull request #41995: [SPARK-44416][CONNECT][BUILD] Upgrade buf to v1.24.0

2023-07-13 Thread via GitHub
zhengruifeng commented on PR #41995: URL: https://github.com/apache/spark/pull/41995#issuecomment-1635227386 cc @panbingkun @LuciferYang @HyukjinKwon

[GitHub] [spark] zhengruifeng opened a new pull request, #41995: [SPARK-44416][CONNECT][BUILD] Upgrade buf to v1.24.0

2023-07-13 Thread via GitHub
zhengruifeng opened a new pull request, #41995: URL: https://github.com/apache/spark/pull/41995 ### What changes were proposed in this pull request? Upgrade buf to v1.24.0 ### Why are the changes needed? we just upgraded it to `1.23.1` two days ago, but since the branch cut is com

[GitHub] [spark] wangyum opened a new pull request, #41994: [SPARK-44415][BUILD] Upgrade snappy-java to 1.1.10.2

2023-07-13 Thread via GitHub
wangyum opened a new pull request, #41994: URL: https://github.com/apache/spark/pull/41994 ### What changes were proposed in this pull request? This PR upgrades snappy-java to 1.1.10.2. snappy-java 1.1.10.2 includes the following changes: https://github.com/xerial/snappy-java/r

[GitHub] [spark] xinrong-meng commented on pull request #39952: [SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages for mapInPandas for schema mismatch

2023-07-13 Thread via GitHub
xinrong-meng commented on PR #39952: URL: https://github.com/apache/spark/pull/39952#issuecomment-1635206417 The last commit seems to fail the tests. Would you fix it?

[GitHub] [spark] HyukjinKwon commented on pull request #41987: [SPARK-44410][PYTHON][Connect] Set active session in create, not just getOrCreate

2023-07-13 Thread via GitHub
HyukjinKwon commented on PR #41987: URL: https://github.com/apache/spark/pull/41987#issuecomment-1635200118 cc @grundprinzip and @hvanhovell

[GitHub] [spark] zhengruifeng commented on pull request #41987: [SPARK-44410][PYTHON][Connect] Set active session in create, not just getOrCreate

2023-07-13 Thread via GitHub
zhengruifeng commented on PR #41987: URL: https://github.com/apache/spark/pull/41987#issuecomment-1635197616 cc @HyukjinKwon @allisonport-db

[GitHub] [spark] caican00 opened a new pull request, #41993: [SPARK-44414][SQL] Fixed matching check for CharType/VarcharType

2023-07-13 Thread via GitHub
caican00 opened a new pull request, #41993: URL: https://github.com/apache/spark/pull/41993 ### What changes were proposed in this pull request? The input/output Attribute is preferred for validation using the `__CHAR_VARCHAR_TYPE_STRING` type specified in its Metadata. If not specifie

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #41946: [SPARK-44264][PYTHON][ML] FunctionPickler Class

2023-07-13 Thread via GitHub
WeichenXu123 commented on code in PR #41946: URL: https://github.com/apache/spark/pull/41946#discussion_r1263233132 ## python/pyspark/ml/dl_util.py: ## @@ -0,0 +1,142 @@ +import os +import tempfile +import textwrap +from typing import Any, Callable + +from pyspark import cloudpi

[GitHub] [spark] yaooqinn opened a new pull request, #41992: [SPARK-44409][SQL] Handle char/varchar in Dataset.to to keep consistent with others

2023-07-13 Thread via GitHub
yaooqinn opened a new pull request, #41992: URL: https://github.com/apache/spark/pull/41992 ### What changes were proposed in this pull request? This PR replaces user-specified char/varchar in dataset.to API to make it consistent with other dataset/dataframe APIs ##

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized

2023-07-13 Thread via GitHub
Hisoka-X commented on code in PR #41347: URL: https://github.com/apache/spark/pull/41347#discussion_r1263227170 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala: ## @@ -117,37 +311,49 @@ object DeduplicateRelations extends Rule[Log

[GitHub] [spark] panbingkun commented on pull request #41984: [MINOR] Removing redundant parentheses from SQL function docs

2023-07-13 Thread via GitHub
panbingkun commented on PR #41984: URL: https://github.com/apache/spark/pull/41984#issuecomment-1635177811 @HyukjinKwon Are the above two questions worth doing? To summarize: 1. Make [FunctionRegistry](https://github.com/apache/spark/blob/8bb07388ea664303d0d22b03cca11a46498b772d/sql/c

[GitHub] [spark] panbingkun commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object

2023-07-13 Thread via GitHub
panbingkun commented on code in PR #40506: URL: https://github.com/apache/spark/pull/40506#discussion_r1263225024 ## sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala: ## @@ -1392,4 +1393,25 @@ class JsonFunctionsSuite extends QueryTest with SharedSparkSessi

[GitHub] [spark] cloud-fan commented on pull request #41939: [SPARK-44341][SQL][PYTHON] Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasExec

2023-07-13 Thread via GitHub
cloud-fan commented on PR #41939: URL: https://github.com/apache/spark/pull/41939#issuecomment-1635170579 We can probably skip testing it. Overall it's just a refactor and it's probably too much to run many tests twice. We can enable it by default later so that all tests cover this code pat

[GitHub] [spark] cloud-fan commented on a diff in pull request #41939: [SPARK-44341][SQL][PYTHON] Define the computing logic through PartitionEvaluator API and use it in WindowExec and WindowInPandasE

2023-07-13 Thread via GitHub
cloud-fan commented on code in PR #41939: URL: https://github.com/apache/spark/pull/41939#discussion_r1263219232 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowEvaluatorFactory.scala: ## @@ -0,0 +1,417 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #41932: [SPARK-44131][SQL][FOLLOWUP] Support qualified function name for call_function

2023-07-13 Thread via GitHub
cloud-fan commented on code in PR #41932: URL: https://github.com/apache/spark/pull/41932#discussion_r1263218083 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala: ## @@ -552,6 +552,14 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleto

[GitHub] [spark] cloud-fan commented on a diff in pull request #41932: [SPARK-44131][SQL][FOLLOWUP] Support qualified function name for call_function

2023-07-13 Thread via GitHub
cloud-fan commented on code in PR #41932: URL: https://github.com/apache/spark/pull/41932#discussion_r1263217835 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -8367,8 +8367,17 @@ object functions { * @since 3.5.0 */ @scala.annotation.varargs -

[GitHub] [spark] yaooqinn commented on a diff in pull request #37011: [SPARK-39625][SPARK-38904][SQL] Add Dataset.as(StructType)

2023-07-13 Thread via GitHub
yaooqinn commented on code in PR #37011: URL: https://github.com/apache/spark/pull/37011#discussion_r1263216931 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -464,6 +464,29 @@ class Dataset[T] private[sql]( */ def as[U : Encoder]: Dataset[U] = Datas

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41949: [SPARK-44375][SQL] Use PartitionEvaluator API in DebugExec

2023-07-13 Thread via GitHub
Hisoka-X commented on code in PR #41949: URL: https://github.com/apache/spark/pull/41949#discussion_r1263213638 ## sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala: ## @@ -27,19 +27,28 @@ import org.apache.spark.sql.catalyst.types.DataTypeUtils.

[GitHub] [spark] asl3 opened a new pull request, #41991: [SPARK-44413] Clarify error for unsupported arg data type in assertDataFrameEqual

2023-07-13 Thread via GitHub
asl3 opened a new pull request, #41991: URL: https://github.com/apache/spark/pull/41991 ### What changes were proposed in this pull request? This PR adds an error class, `INVALID_TYPE_DF_EQUALITY_ARG`, to clarify the error message for unsupported argument data types when calling `assertD

[GitHub] [spark] viirya commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object

2023-07-13 Thread via GitHub
viirya commented on code in PR #40506: URL: https://github.com/apache/spark/pull/40506#discussion_r1263209639 ## sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala: ## @@ -1392,4 +1393,25 @@ class JsonFunctionsSuite extends QueryTest with SharedSparkSession {

[GitHub] [spark] HyukjinKwon closed pull request #41953: [SPARK-43995][SPARK-43996][CONNECT] Add support for UDFRegistration to the Connect Scala Client

2023-07-13 Thread via GitHub
HyukjinKwon closed pull request #41953: [SPARK-43995][SPARK-43996][CONNECT] Add support for UDFRegistration to the Connect Scala Client URL: https://github.com/apache/spark/pull/41953

[GitHub] [spark] HyukjinKwon commented on pull request #41953: [SPARK-43995][SPARK-43996][CONNECT] Add support for UDFRegistration to the Connect Scala Client

2023-07-13 Thread via GitHub
HyukjinKwon commented on PR #41953: URL: https://github.com/apache/spark/pull/41953#issuecomment-1635146300 Merged to master.

[GitHub] [spark] viirya commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object

2023-07-13 Thread via GitHub
viirya commented on code in PR #40506: URL: https://github.com/apache/spark/pull/40506#discussion_r1263204348 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala: ## @@ -140,18 +135,114 @@ case class GetJsonObject(json: Expression, path

[GitHub] [spark] viirya commented on a diff in pull request #40506: [SPARK-42881][SQL] Codegen Support for get_json_object

2023-07-13 Thread via GitHub
viirya commented on code in PR #40506: URL: https://github.com/apache/spark/pull/40506#discussion_r1263203372 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala: ## @@ -140,18 +135,114 @@ case class GetJsonObject(json: Expression, path

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41947: [SPARK-44217][PYTHON] Allow custom precision for fp approx equality

2023-07-13 Thread via GitHub
zhengruifeng commented on code in PR #41947: URL: https://github.com/apache/spark/pull/41947#discussion_r1263202873 ## python/pyspark/testing/utils.py: ## @@ -336,7 +353,7 @@ def compare_vals(val1, val2): and all(compare_vals(val1[k], val2[k]) for k in val1

[GitHub] [spark] cloud-fan commented on a diff in pull request #41949: [SPARK-44375][SQL] Use PartitionEvaluator API in DebugExec

2023-07-13 Thread via GitHub
cloud-fan commented on code in PR #41949: URL: https://github.com/apache/spark/pull/41949#discussion_r1263191119 ## sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala: ## @@ -27,19 +27,28 @@ import org.apache.spark.sql.catalyst.types.DataTypeUtils

[GitHub] [spark] mathewjacob1002 commented on a diff in pull request #41946: [SPARK-44264][PYTHON][ML] FunctionPickler Class

2023-07-13 Thread via GitHub
mathewjacob1002 commented on code in PR #41946: URL: https://github.com/apache/spark/pull/41946#discussion_r1263188232 ## python/pyspark/ml/dl_util.py: ## @@ -0,0 +1,142 @@ +import os +import tempfile +import textwrap +from typing import Any, Callable + +from pyspark import clou

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41989: [SPARK-43965][PYTHON][CONNECT] Support Python UDTF in Spark Connect

2023-07-13 Thread via GitHub
zhengruifeng commented on code in PR #41989: URL: https://github.com/apache/spark/pull/41989#discussion_r1263186508 ## python/pyspark/sql/connect/udtf.py: ## @@ -0,0 +1,206 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreeme

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41947: [SPARK-44217][PYTHON] Allow custom precision for fp approx equality

2023-07-13 Thread via GitHub
zhengruifeng commented on code in PR #41947: URL: https://github.com/apache/spark/pull/41947#discussion_r1263172651 ## python/pyspark/testing/utils.py: ## @@ -336,7 +353,7 @@ def compare_vals(val1, val2): and all(compare_vals(val1[k], val2[k]) for k in val1

[GitHub] [spark] mathewjacob1002 commented on a diff in pull request #41946: [SPARK-44264][PYTHON][ML] FunctionPickler Class

2023-07-13 Thread via GitHub
mathewjacob1002 commented on code in PR #41946: URL: https://github.com/apache/spark/pull/41946#discussion_r1263184268 ## python/pyspark/ml/dl_util.py: ## @@ -0,0 +1,142 @@ +import os +import tempfile +import textwrap +from typing import Any, Callable + +from pyspark import clou

[GitHub] [spark] mathewjacob1002 commented on a diff in pull request #41946: [SPARK-44264][PYTHON][ML] FunctionPickler Class

2023-07-13 Thread via GitHub
mathewjacob1002 commented on code in PR #41946: URL: https://github.com/apache/spark/pull/41946#discussion_r1263184328 ## python/pyspark/ml/dl_util.py: ## @@ -0,0 +1,142 @@ +import os +import tempfile +import textwrap +from typing import Any, Callable + +from pyspark import clou

[GitHub] [spark] mathewjacob1002 commented on a diff in pull request #41946: [SPARK-44264][PYTHON][ML] FunctionPickler Class

2023-07-13 Thread via GitHub
mathewjacob1002 commented on code in PR #41946: URL: https://github.com/apache/spark/pull/41946#discussion_r1263183770 ## python/pyspark/ml/dl_util.py: ## @@ -0,0 +1,142 @@ +import os +import tempfile +import textwrap +from typing import Any, Callable + +from pyspark import clou

[GitHub] [spark] cloud-fan closed pull request #41725: [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone

2023-07-13 Thread via GitHub
cloud-fan closed pull request #41725: [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone URL: https://github.com/apache/spark/pull/41725 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[GitHub] [spark] cloud-fan commented on pull request #41725: [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone

2023-07-13 Thread via GitHub
cloud-fan commented on PR #41725: URL: https://github.com/apache/spark/pull/41725#issuecomment-1635123540 thanks, merging to master/3.4!

[GitHub] [spark] jiaoqingbo commented on pull request #41983: [SPARK-44203][SQL][HIVE] Return nextRenewalDate instead of None for obtainDelegationTokens method

2023-07-13 Thread via GitHub
jiaoqingbo commented on PR #41983: URL: https://github.com/apache/spark/pull/41983#issuecomment-1635122270 cc @yaooqinn @dongjoon-hyun

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41989: [SPARK-43965][PYTHON][CONNECT] Support Python UDTF in Spark Connect

2023-07-13 Thread via GitHub
zhengruifeng commented on code in PR #41989: URL: https://github.com/apache/spark/pull/41989#discussion_r1263180196 ## python/pyspark/sql/connect/udtf.py: ## @@ -0,0 +1,206 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreeme

[GitHub] [spark] mathewjacob1002 commented on a diff in pull request #41946: [SPARK-44264][PYTHON][ML] FunctionPickler Class

2023-07-13 Thread via GitHub
mathewjacob1002 commented on code in PR #41946: URL: https://github.com/apache/spark/pull/41946#discussion_r1263180140 ## python/pyspark/ml/dl_util.py: ## @@ -0,0 +1,142 @@ +import os +import tempfile +import textwrap +from typing import Any, Callable + +from pyspark import clou

[GitHub] [spark] cloud-fan commented on a diff in pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized

2023-07-13 Thread via GitHub
cloud-fan commented on code in PR #41347: URL: https://github.com/apache/spark/pull/41347#discussion_r1263172474 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala: ## @@ -117,37 +311,49 @@ object DeduplicateRelations extends Rule[Lo

[GitHub] [spark] allisonwang-db commented on pull request #41989: [SPARK-43965][PYTHON][CONNECT] Support Python UDTF in Spark Connect

2023-07-13 Thread via GitHub
allisonwang-db commented on PR #41989: URL: https://github.com/apache/spark/pull/41989#issuecomment-1635104349 cc @HyukjinKwon @zhengruifeng @xinrong-meng

[GitHub] [spark] allisonwang-db commented on a diff in pull request #41989: [SPARK-43965][PYTHON][CONNECT] Support Python UDTF in Spark Connect

2023-07-13 Thread via GitHub
allisonwang-db commented on code in PR #41989: URL: https://github.com/apache/spark/pull/41989#discussion_r1263171467 ## python/pyspark/sql/connect/udtf.py: ## @@ -0,0 +1,206 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agree

[GitHub] [spark] ueshin commented on a diff in pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python

2023-07-13 Thread via GitHub
ueshin commented on code in PR #41948: URL: https://github.com/apache/spark/pull/41948#discussion_r1263165177 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala: ## @@ -91,3 +122,104 @@ case class UserDefinedPythonTableFunction(

[GitHub] [spark] ueshin commented on a diff in pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python

2023-07-13 Thread via GitHub
ueshin commented on code in PR #41948: URL: https://github.com/apache/spark/pull/41948#discussion_r1263166608 ## python/pyspark/sql/tests/test_udtf.py: ## @@ -719,6 +726,153 @@ def terminate(self): self.assertIn("Evaluate the input row", cls.eval.__doc__) self.

[GitHub] [spark] ueshin commented on a diff in pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python

2023-07-13 Thread via GitHub
ueshin commented on code in PR #41948: URL: https://github.com/apache/spark/pull/41948#discussion_r1263165949 ## python/pyspark/sql/tests/test_udtf.py: ## @@ -719,6 +726,153 @@ def terminate(self): self.assertIn("Evaluate the input row", cls.eval.__doc__) self.

[GitHub] [spark] ueshin commented on a diff in pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python

2023-07-13 Thread via GitHub
ueshin commented on code in PR #41948: URL: https://github.com/apache/spark/pull/41948#discussion_r1263165512 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala: ## @@ -91,3 +122,104 @@ case class UserDefinedPythonTableFunction(

[GitHub] [spark] HyukjinKwon closed pull request #41752: [SPARK-44201][CONNECT][SS]Add support for Streaming Listener in Scala for Spark Connect

2023-07-13 Thread via GitHub
HyukjinKwon closed pull request #41752: [SPARK-44201][CONNECT][SS]Add support for Streaming Listener in Scala for Spark Connect URL: https://github.com/apache/spark/pull/41752

[GitHub] [spark] HyukjinKwon commented on pull request #41752: [SPARK-44201][CONNECT][SS]Add support for Streaming Listener in Scala for Spark Connect

2023-07-13 Thread via GitHub
HyukjinKwon commented on PR #41752: URL: https://github.com/apache/spark/pull/41752#issuecomment-1635091793 Merged to master.

[GitHub] [spark] zhengruifeng commented on pull request #41986: [SPARK-44406][CONNECT] Make `SparkSession.sql` work properly with dropped temp view

2023-07-13 Thread via GitHub
zhengruifeng commented on PR #41986: URL: https://github.com/apache/spark/pull/41986#issuecomment-1635089887 > Branchcut is soon so let's revert #41980 first. ok, let's revert that one, since it creates temp view if `kwargs` contains DF variables. I can not find an alternative solutio

[GitHub] [spark] HyukjinKwon commented on pull request #41986: [SPARK-44406][CONNECT] Make `SparkSession.sql` work properly with dropped temp view

2023-07-13 Thread via GitHub
HyukjinKwon commented on PR #41986: URL: https://github.com/apache/spark/pull/41986#issuecomment-1635087027 Branchcut is soon so let's revert https://github.com/apache/spark/pull/41980 first.

[GitHub] [spark] HyukjinKwon closed pull request #41986: [SPARK-44406][CONNECT] Make `SparkSession.sql` work properly with dropped temp view

2023-07-13 Thread via GitHub
HyukjinKwon closed pull request #41986: [SPARK-44406][CONNECT] Make `SparkSession.sql` work properly with dropped temp view URL: https://github.com/apache/spark/pull/41986

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #41946: [SPARK-44264][PYTHON][ML] FunctionPickler Class

2023-07-13 Thread via GitHub
WeichenXu123 commented on code in PR #41946: URL: https://github.com/apache/spark/pull/41946#discussion_r1263157508 ## python/pyspark/ml/dl_util.py: ## @@ -0,0 +1,142 @@ +import os +import tempfile +import textwrap +from typing import Any, Callable + +from pyspark import cloudpi

[GitHub] [spark] github-actions[bot] closed pull request #40567: [SPARK-42935] [SQL] Add union required distribution push down

2023-07-13 Thread via GitHub
github-actions[bot] closed pull request #40567: [SPARK-42935] [SQL] Add union required distribution push down URL: https://github.com/apache/spark/pull/40567

[GitHub] [spark] github-actions[bot] closed pull request #40477: [SPARK-42805]`DeduplicateRelations` rule show process `LOGICAL_RDD`

2023-07-13 Thread via GitHub
github-actions[bot] closed pull request #40477: [SPARK-42805]`DeduplicateRelations` rule show process `LOGICAL_RDD` URL: https://github.com/apache/spark/pull/40477

[GitHub] [spark] github-actions[bot] commented on pull request #40667: Improve IDE build experience against jdk11

2023-07-13 Thread via GitHub
github-actions[bot] commented on PR #40667: URL: https://github.com/apache/spark/pull/40667#issuecomment-1635085283 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #40635: [SPARK-42860][SQL] Add analysed logical mode in org.apache.spark.sql.execution.ExplainMode

2023-07-13 Thread via GitHub
github-actions[bot] closed pull request #40635: [SPARK-42860][SQL] Add analysed logical mode in org.apache.spark.sql.execution.ExplainMode URL: https://github.com/apache/spark/pull/40635
