[GitHub] [spark] gengliangwang opened a new pull request, #40804: [SPARK-43151][DOC] Update the prerequisites for generating Python API docs

2023-04-15 Thread via GitHub
gengliangwang opened a new pull request, #40804: URL: https://github.com/apache/spark/pull/40804 ### What changes were proposed in this pull request? Update the prerequisites for generating Python API docs: * The command should be run under the docs directory so that the inp

[GitHub] [spark] mridulm commented on a diff in pull request #40730: [SPARK-43086][CORE] Support bin pack task scheduling on executors

2023-04-15 Thread via GitHub
mridulm commented on code in PR #40730: URL: https://github.com/apache/spark/pull/40730#discussion_r1166174167 ## core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: ## @@ -401,17 +403,24 @@ private[spark] class TaskSchedulerImpl( val host = shuffledOf

[GitHub] [spark] ivoson commented on pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-04-15 Thread via GitHub
ivoson commented on PR #40610: URL: https://github.com/apache/spark/pull/40610#issuecomment-1509728135 Latest commits addressed the comments above. cc @hvanhovell @LuciferYang please take a look when you have time. Thanks. -- This is an automated message from the Apache Git Service. To re

[GitHub] [spark] ivoson commented on a diff in pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-04-15 Thread via GitHub
ivoson commented on code in PR #40610: URL: https://github.com/apache/spark/pull/40610#discussion_r1167475549 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -134,24 +134,41 @@ private[sql] class SparkResult[T]( /**

[GitHub] [spark] ivoson commented on a diff in pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-04-15 Thread via GitHub
ivoson commented on code in PR #40610: URL: https://github.com/apache/spark/pull/40610#discussion_r1167475609 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -45,7 +45,7 @@ private[sql] class SparkResult[T]( private[

[GitHub] [spark] wangyum opened a new pull request, #40805: [SPARK-40609][SQL] Unwrap cast in the join condition to unlock bucketed read

2023-04-15 Thread via GitHub
wangyum opened a new pull request, #40805: URL: https://github.com/apache/spark/pull/40805 ### What changes were proposed in this pull request? Adding a cast on bucket keys invalidates the bucketed read: ```sql set spark.sql.autoBroadcastJoinThreshold=-1; CREATE TABLE t

[GitHub] [spark] wangyum commented on a diff in pull request #38047: [SPARK-40609][SQL] Casts types according to bucket info for Equality expressions

2023-04-15 Thread via GitHub
wangyum commented on code in PR #38047: URL: https://github.com/apache/spark/pull/38047#discussion_r1167555329 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -751,6 +753,49 @@ abstract class TypeCoercionBase { } } + /**

[GitHub] [spark] sunchao commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader

2023-04-15 Thread via GitHub
sunchao commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1167616335 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetFooterReader.java: ## @@ -17,23 +17,57 @@ package org.apache.spark.sql.execution.

[GitHub] [spark] WweiL commented on a diff in pull request #40797: [SPARK-43042] [SS] [Connect] Add table() API support for DataStreamReader

2023-04-15 Thread via GitHub
WweiL commented on code in PR #40797: URL: https://github.com/apache/spark/pull/40797#discussion_r1167626517 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -874,6 +874,14 @@ class SparkConnectPlanner(val session:

[GitHub] [spark] amaliujia commented on pull request #40797: [SPARK-43042] [SS] [Connect] Add table() API support for DataStreamReader

2023-04-15 Thread via GitHub
amaliujia commented on PR #40797: URL: https://github.com/apache/spark/pull/40797#issuecomment-1509937727 Why do you need the change in `dev/tox.ini`?

[GitHub] [spark] ueshin opened a new pull request, #40806: [SPARK-43153][CONNECT] Skip Spark execution when the dataframe is local

2023-04-15 Thread via GitHub
ueshin opened a new pull request, #40806: URL: https://github.com/apache/spark/pull/40806 ### What changes were proposed in this pull request? Skips Spark execution when the dataframe is local. ### Why are the changes needed? When the built DataFrame in Spark Connect is l

[GitHub] [spark] github-actions[bot] commented on pull request #39187: [SPARK-41670] WIP builtin schema

2023-04-15 Thread via GitHub
github-actions[bot] commented on PR #39187: URL: https://github.com/apache/spark/pull/39187#issuecomment-1510007101 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #38660: [SPARK-40199][SQL][WIP] Provide useful error when encountering null values in non-null fields

2023-04-15 Thread via GitHub
github-actions[bot] commented on PR #38660: URL: https://github.com/apache/spark/pull/38660#issuecomment-1510007113 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] wangyum opened a new pull request, #40807: [SPARK-43139][SQL][DOCS] Fix incorrect column names in sql-ref-syntax-dml-insert-table.md

2023-04-15 Thread via GitHub
wangyum opened a new pull request, #40807: URL: https://github.com/apache/spark/pull/40807 ### What changes were proposed in this pull request? This PR fixes incorrect column names in [sql-ref-syntax-dml-insert-table.md](https://spark.apache.org/docs/3.4.0/sql-ref-syntax-dml-insert-ta

[GitHub] [spark] wangyum closed pull request #40803: [MINOR][CONNECT][PYTHON] Typo fixes

2023-04-15 Thread via GitHub
wangyum closed pull request #40803: [MINOR][CONNECT][PYTHON] Typo fixes URL: https://github.com/apache/spark/pull/40803

[GitHub] [spark] wangyum commented on pull request #40803: [MINOR][CONNECT][PYTHON] Typo fixes

2023-04-15 Thread via GitHub
wangyum commented on PR #40803: URL: https://github.com/apache/spark/pull/40803#issuecomment-1510009044 Merged to master.

[GitHub] [spark] wangyum commented on a diff in pull request #40790: [SPARK-43116][SQL] Fix Cast.forceNullable

2023-04-15 Thread via GitHub
wangyum commented on code in PR #40790: URL: https://github.com/apache/spark/pull/40790#discussion_r1167672641 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -396,6 +396,22 @@ object Cast extends QueryErrorsBase { case (_, to: D

[GitHub] [spark] yabola commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader

2023-04-15 Thread via GitHub
yabola commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1167676059 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetFooterReader.java: ## @@ -17,23 +17,57 @@ package org.apache.spark.sql.execution.d

[GitHub] [spark] amaliujia commented on a diff in pull request #40804: [SPARK-43151][DOC] Update the prerequisites for generating Python API docs

2023-04-15 Thread via GitHub
amaliujia commented on code in PR #40804: URL: https://github.com/apache/spark/pull/40804#discussion_r1167680125 ## docs/README.md: ## @@ -61,7 +61,7 @@ See also https://issues.apache.org/jira/browse/SPARK-35375. --> Run the following command from $SPARK_HOME: ```sh -$ sudo p

[GitHub] [spark] gengliangwang commented on a diff in pull request #40804: [SPARK-43151][DOC] Update the prerequisites for generating Python API docs

2023-04-15 Thread via GitHub
gengliangwang commented on code in PR #40804: URL: https://github.com/apache/spark/pull/40804#discussion_r1167688843 ## docs/README.md: ## @@ -61,7 +61,7 @@ See also https://issues.apache.org/jira/browse/SPARK-35375. --> Run the following command from $SPARK_HOME: ```sh -$ su

[GitHub] [spark] sunchao closed pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader

2023-04-15 Thread via GitHub
sunchao closed pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader URL: https://github.com/apache/spark/pull/39950

[GitHub] [spark] sunchao commented on pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader

2023-04-15 Thread via GitHub
sunchao commented on PR #39950: URL: https://github.com/apache/spark/pull/39950#issuecomment-1510042955 Merged to master, thanks!

[GitHub] [spark] amaliujia commented on pull request #40804: [SPARK-43151][DOC] Update the prerequisites for generating Python API docs

2023-04-15 Thread via GitHub
amaliujia commented on PR #40804: URL: https://github.com/apache/spark/pull/40804#issuecomment-1510097573 LGTM

[GitHub] [spark] yabola commented on pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader

2023-04-15 Thread via GitHub
yabola commented on PR #39950: URL: https://github.com/apache/spark/pull/39950#issuecomment-1510120727 @sunchao Thank you for your detailed review!

[GitHub] [spark] wangyum closed pull request #40807: [SPARK-43139][SQL][DOCS] Fix incorrect column names in sql-ref-syntax-dml-insert-table.md

2023-04-15 Thread via GitHub
wangyum closed pull request #40807: [SPARK-43139][SQL][DOCS] Fix incorrect column names in sql-ref-syntax-dml-insert-table.md URL: https://github.com/apache/spark/pull/40807