[PR] [SPARK-46253][PYTHON] Plan Python data source read using MapInArrow [spark]

2023-12-04 Thread via GitHub
allisonwang-db opened a new pull request, #44170: URL: https://github.com/apache/spark/pull/44170 ### What changes were proposed in this pull request? This PR changes how we plan Python data source reads. Instead of using a regular Python UDTF, we can use an Arrow UDF and plan

Re: [PR] [SPARK-46253][PYTHON] Plan Python data source read using MapInArrow [spark]

2023-12-11 Thread via GitHub
allisonwang-db commented on PR #44170: URL: https://github.com/apache/spark/pull/44170#issuecomment-1850006004 cc @ueshin -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] [SPARK-46253][PYTHON] Plan Python data source read using MapInArrow [spark]

2023-12-11 Thread via GitHub
allisonwang-db commented on PR #44170: URL: https://github.com/apache/spark/pull/44170#issuecomment-1851231052 cc @cloud-fan @HyukjinKwon

Re: [PR] [SPARK-46253][PYTHON] Plan Python data source read using MapInArrow [spark]

2023-12-11 Thread via GitHub
HyukjinKwon commented on code in PR #44170: URL: https://github.com/apache/spark/pull/44170#discussion_r1423409469 ## python/pyspark/sql/worker/plan_data_source_read.py: ## @@ -146,16 +175,102 @@ def main(infile: IO, outfile: IO) -> None: message_parameters={"ty

Re: [PR] [SPARK-46253][PYTHON] Plan Python data source read using MapInArrow [spark]

2023-12-11 Thread via GitHub
HyukjinKwon commented on code in PR #44170: URL: https://github.com/apache/spark/pull/44170#discussion_r1423409926 ## python/pyspark/sql/worker/plan_data_source_read.py: ## @@ -146,16 +175,102 @@ def main(infile: IO, outfile: IO) -> None: message_parameters={"ty

Re: [PR] [SPARK-46253][PYTHON] Plan Python data source read using MapInArrow [spark]

2023-12-12 Thread via GitHub
HyukjinKwon closed pull request #44170: [SPARK-46253][PYTHON] Plan Python data source read using MapInArrow URL: https://github.com/apache/spark/pull/44170

Re: [PR] [SPARK-46253][PYTHON] Plan Python data source read using MapInArrow [spark]

2023-12-12 Thread via GitHub
HyukjinKwon commented on PR #44170: URL: https://github.com/apache/spark/pull/44170#issuecomment-1852887774 Merged to master.

Re: [PR] [SPARK-46253][PYTHON] Plan Python data source read using MapInArrow [spark]

2023-12-14 Thread via GitHub
zhengruifeng commented on code in PR #44170: URL: https://github.com/apache/spark/pull/44170#discussion_r1427498004 ## python/pyspark/sql/worker/plan_data_source_read.py: ## @@ -146,16 +176,94 @@ def main(infile: IO, outfile: IO) -> None: message_parameters={"ty