Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-09 Thread via GitHub
dtenedor commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350762771 ## python/pyspark/sql/tests/test_udtf.py: ## @@ -2309,6 +2309,55 @@ def terminate(self): + [Row(partition_col=42, count=3, total=3, last=None)],

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-09 Thread via GitHub
dtenedor commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1348084412 ## python/pyspark/sql/udtf.py: ## @@ -107,12 +107,20 @@ class AnalyzeResult: If non-empty, this is a sequence of columns that the UDTF is specifying for Cata

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-09 Thread via GitHub
dtenedor commented on PR #43204: URL: https://github.com/apache/spark/pull/43204#issuecomment-1753741579 Hi @allisonwang-db @ueshin, thanks for your reviews; these were good comments. Please look again! I think the new API is better now. -- This is an automated message from the Apache Git

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-09 Thread via GitHub
dtenedor commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350763173 ## python/pyspark/worker.py: ## @@ -786,6 +787,24 @@ def _remove_partition_by_exprs(self, arg: Any) -> Any: else: return arg +# Wra

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-09 Thread via GitHub
dtenedor commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350763041 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala: ## @@ -167,22 +169,26 @@ abstract class UnevaluableGenerator extends Generato

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-05 Thread via GitHub
allisonwang-db commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1347981218 ## python/pyspark/sql/tests/test_udtf.py: ## @@ -2309,6 +2309,55 @@ def terminate(self): + [Row(partition_col=42, count=3, total=3, last=None)],

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-04 Thread via GitHub
ueshin commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1346427159 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala: ## @@ -290,6 +295,20 @@ object UserDefinedPythonTableFunction {

Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-03 Thread via GitHub
HyukjinKwon commented on PR #43204: URL: https://github.com/apache/spark/pull/43204#issuecomment-1746074413 Implementation seems fine from a cursory look, but let me defer to @allisonwang-db and @ueshin for the design.

Re: [PR] [SPARK-45402][SQL][Python] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-03 Thread via GitHub
dtenedor commented on PR #43204: URL: https://github.com/apache/spark/pull/43204#issuecomment-1745840699 cc @ueshin @allisonwang-db @HyukjinKwon

[PR] [SPARK-45402][SQL][Python] Add UDTF API for 'analyze' to return a buffer to consume on each class creation [spark]

2023-10-03 Thread via GitHub
dtenedor opened a new pull request, #43204: URL: https://github.com/apache/spark/pull/43204 ### What changes were proposed in this pull request? This PR adds a Python UDTF API for 'analyze' to return a buffer to consume on each class creation. * The `AnalyzeResult` class now co