date:20230415

[GitHub] [spark] gengliangwang opened a new pull request, #40804: [SPARK-43151][DOC] Update the prerequisites for generating Python API docs

2023-04-15 Thread via GitHub

gengliangwang opened a new pull request, #40804: URL: https://github.com/apache/spark/pull/40804 ### What changes were proposed in this pull request? Update the prerequisites for generating Python API docs: * The command should be run under the docs directory so that the inp

[GitHub] [spark] mridulm commented on a diff in pull request #40730: [SPARK-43086][CORE] Support bin pack task scheduling on executors

2023-04-15 Thread via GitHub

mridulm commented on code in PR #40730: URL: https://github.com/apache/spark/pull/40730#discussion_r1166174167 ## core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: ## @@ -401,17 +403,24 @@ private[spark] class TaskSchedulerImpl( val host = shuffledOf

[GitHub] [spark] ivoson commented on pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-04-15 Thread via GitHub

ivoson commented on PR #40610: URL: https://github.com/apache/spark/pull/40610#issuecomment-1509728135 Latest commits addressed the comments above. cc @hvanhovell @LuciferYang please take a look when you have time. Thanks. -- This is an automated message from the Apache Git Service. To re

[GitHub] [spark] ivoson commented on a diff in pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-04-15 Thread via GitHub

ivoson commented on code in PR #40610: URL: https://github.com/apache/spark/pull/40610#discussion_r1167475549 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -134,24 +134,41 @@ private[sql] class SparkResult[T]( /**

[GitHub] [spark] ivoson commented on a diff in pull request #40610: [SPARK-42626][CONNECT] Add Destructive Iterator for SparkResult

2023-04-15 Thread via GitHub

ivoson commented on code in PR #40610: URL: https://github.com/apache/spark/pull/40610#discussion_r1167475609 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -45,7 +45,7 @@ private[sql] class SparkResult[T]( private[

[GitHub] [spark] wangyum opened a new pull request, #40805: [SPARK-40609][SQL] Unwrap cast in the join condition to unlock bucketed read

2023-04-15 Thread via GitHub

wangyum opened a new pull request, #40805: URL: https://github.com/apache/spark/pull/40805 ### What changes were proposed in this pull request? Adding a cast on bucket keys invalidates the bucketed read: ```sql set spark.sql.autoBroadcastJoinThreshold=-1; CREATE TABLE t

[GitHub] [spark] wangyum commented on a diff in pull request #38047: [SPARK-40609][SQL] Casts types according to bucket info for Equality expressions

2023-04-15 Thread via GitHub

wangyum commented on code in PR #38047: URL: https://github.com/apache/spark/pull/38047#discussion_r1167555329 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -751,6 +753,49 @@ abstract class TypeCoercionBase { } } + /**

[GitHub] [spark] sunchao commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader

2023-04-15 Thread via GitHub

sunchao commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1167616335 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetFooterReader.java: ## @@ -17,23 +17,57 @@ package org.apache.spark.sql.execution.

[GitHub] [spark] WweiL commented on a diff in pull request #40797: [SPARK-43042] [SS] [Connect] Add table() API support for DataStreamReader

2023-04-15 Thread via GitHub

WweiL commented on code in PR #40797: URL: https://github.com/apache/spark/pull/40797#discussion_r1167626517 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -874,6 +874,14 @@ class SparkConnectPlanner(val session:

[GitHub] [spark] amaliujia commented on pull request #40797: [SPARK-43042] [SS] [Connect] Add table() API support for DataStreamReader

2023-04-15 Thread via GitHub

amaliujia commented on PR #40797: URL: https://github.com/apache/spark/pull/40797#issuecomment-1509937727 Why do you need the change in `dev/tox.ini`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] ueshin opened a new pull request, #40806: [SPARK-43153][CONNECT] Skip Spark execution when the dataframe is local

2023-04-15 Thread via GitHub

ueshin opened a new pull request, #40806: URL: https://github.com/apache/spark/pull/40806 ### What changes were proposed in this pull request? Skips Spark execution when the dataframe is local. ### Why are the changes needed? When the built DataFrame in Spark Connect is l

[GitHub] [spark] github-actions[bot] commented on pull request #39187: [SPARK-41670] WIP builtin schema

2023-04-15 Thread via GitHub

github-actions[bot] commented on PR #39187: URL: https://github.com/apache/spark/pull/39187#issuecomment-1510007101 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #38660: [SPARK-40199][SQL][WIP] Provide useful error when encountering null values in non-null fields

2023-04-15 Thread via GitHub

github-actions[bot] commented on PR #38660: URL: https://github.com/apache/spark/pull/38660#issuecomment-1510007113 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] wangyum opened a new pull request, #40807: [SPARK-43139][SQL][DOCS] Fix incorrect column names in sql-ref-syntax-dml-insert-table.md

2023-04-15 Thread via GitHub

wangyum opened a new pull request, #40807: URL: https://github.com/apache/spark/pull/40807 ### What changes were proposed in this pull request? This PR fixes incorrect column names in [sql-ref-syntax-dml-insert-table.md](https://spark.apache.org/docs/3.4.0/sql-ref-syntax-dml-insert-ta

[GitHub] [spark] wangyum closed pull request #40803: [MINOR][CONNECT][PYTHON] Typo fixes

2023-04-15 Thread via GitHub

wangyum closed pull request #40803: [MINOR][CONNECT][PYTHON] Typo fixes URL: https://github.com/apache/spark/pull/40803 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

[GitHub] [spark] wangyum commented on pull request #40803: [MINOR][CONNECT][PYTHON] Typo fixes

2023-04-15 Thread via GitHub

wangyum commented on PR #40803: URL: https://github.com/apache/spark/pull/40803#issuecomment-1510009044 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] wangyum commented on a diff in pull request #40790: [SPARK-43116][SQL] Fix Cast.forceNullable

2023-04-15 Thread via GitHub

wangyum commented on code in PR #40790: URL: https://github.com/apache/spark/pull/40790#discussion_r1167672641 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -396,6 +396,22 @@ object Cast extends QueryErrorsBase { case (_, to: D

[GitHub] [spark] yabola commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader

2023-04-15 Thread via GitHub

yabola commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1167676059 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetFooterReader.java: ## @@ -17,23 +17,57 @@ package org.apache.spark.sql.execution.d

[GitHub] [spark] yabola commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader

2023-04-15 Thread via GitHub

yabola commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1167676059 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetFooterReader.java: ## @@ -17,23 +17,57 @@ package org.apache.spark.sql.execution.d

[GitHub] [spark] amaliujia commented on a diff in pull request #40804: [SPARK-43151][DOC] Update the prerequisites for generating Python API docs

2023-04-15 Thread via GitHub

amaliujia commented on code in PR #40804: URL: https://github.com/apache/spark/pull/40804#discussion_r1167680125 ## docs/README.md: ## @@ -61,7 +61,7 @@ See also https://issues.apache.org/jira/browse/SPARK-35375. --> Run the following command from $SPARK_HOME: ```sh -$ sudo p

[GitHub] [spark] gengliangwang commented on a diff in pull request #40804: [SPARK-43151][DOC] Update the prerequisites for generating Python API docs

2023-04-15 Thread via GitHub

gengliangwang commented on code in PR #40804: URL: https://github.com/apache/spark/pull/40804#discussion_r1167688843 ## docs/README.md: ## @@ -61,7 +61,7 @@ See also https://issues.apache.org/jira/browse/SPARK-35375. --> Run the following command from $SPARK_HOME: ```sh -$ su

[GitHub] [spark] sunchao closed pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader

2023-04-15 Thread via GitHub

sunchao closed pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader URL: https://github.com/apache/spark/pull/39950 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] sunchao commented on pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader

2023-04-15 Thread via GitHub

sunchao commented on PR #39950: URL: https://github.com/apache/spark/pull/39950#issuecomment-1510042955 Merged to master, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

[GitHub] [spark] amaliujia commented on pull request #40804: [SPARK-43151][DOC] Update the prerequisites for generating Python API docs

2023-04-15 Thread via GitHub

amaliujia commented on PR #40804: URL: https://github.com/apache/spark/pull/40804#issuecomment-1510097573 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscrib

[GitHub] [spark] yabola commented on pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice in vectorized reader

2023-04-15 Thread via GitHub

yabola commented on PR #39950: URL: https://github.com/apache/spark/pull/39950#issuecomment-1510120727 @sunchao Thank you for your detailed review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] wangyum closed pull request #40807: [SPARK-43139][SQL][DOCS] Fix incorrect column names in sql-ref-syntax-dml-insert-table.md

2023-04-15 Thread via GitHub

wangyum closed pull request #40807: [SPARK-43139][SQL][DOCS] Fix incorrect column names in sql-ref-syntax-dml-insert-table.md URL: https://github.com/apache/spark/pull/40807 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

26 matches
