[GitHub] [spark] zhengruifeng commented on pull request #42809: [SPARK-45074][PYTHON][CONNECT] `DataFrame.{sort, sortWithinPartitions}` support column ordinals

2023-09-05 Thread via GitHub
zhengruifeng commented on PR #42809: URL: https://github.com/apache/spark/pull/42809#issuecomment-1706063313 @dongjoon-hyun @HyukjinKwon thanks for review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

[GitHub] [spark] wangyum commented on pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub
wangyum commented on PR #42804: URL: https://github.com/apache/spark/pull/42804#issuecomment-1706074040 cc @cloud-fan

[GitHub] [spark] cloud-fan commented on a diff in pull request #42797: [SPARK-45068][SQL] Make function output column name consistent in case

2023-09-05 Thread via GitHub
cloud-fan commented on code in PR #42797: URL: https://github.com/apache/spark/pull/42797#discussion_r1315476814 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala: ## @@ -67,7 +67,8 @@ abstract class UnaryMathExpression(val f: Double

[GitHub] [spark] cloud-fan commented on a diff in pull request #42759: [SPARK-45039][SQL] Include full identifier in Storage tab

2023-09-05 Thread via GitHub
cloud-fan commented on code in PR #42759: URL: https://github.com/apache/spark/pull/42759#discussion_r1315479553 ## sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala: ## @@ -117,12 +117,20 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelpe

[GitHub] [spark] panbingkun commented on a diff in pull request #42797: [SPARK-45068][SQL] Make function output column name consistent in case

2023-09-05 Thread via GitHub
panbingkun commented on code in PR #42797: URL: https://github.com/apache/spark/pull/42797#discussion_r1315497780 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala: ## @@ -67,7 +67,8 @@ abstract class UnaryMathExpression(val f: Double

[GitHub] [spark] cloud-fan commented on a diff in pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub
cloud-fan commented on code in PR #42804: URL: https://github.com/apache/spark/pull/42804#discussion_r1315502363 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -234,11 +240,7 @@ abstract class BinaryArithmetic extends BinaryOpera

[GitHub] [spark] yaooqinn opened a new pull request, #42815: [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4

2023-09-05 Thread via GitHub
yaooqinn opened a new pull request, #42815: URL: https://github.com/apache/spark/pull/42815 ### What changes were proposed in this pull request? This PR upgrades dagre-d3.js from 0.4.3 to 0.6.4, a.k.a. changes the dist build by https://github.com/andrewor14/dagre-d3 to ht

[GitHub] [spark] MaxGekk opened a new pull request, #42816: [WIP][SPARK-45022][SQL] Provide context for dataset API errors

2023-09-05 Thread via GitHub
MaxGekk opened a new pull request, #42816: URL: https://github.com/apache/spark/pull/42816 ### What changes were proposed in this pull request? This PR captures the dataset APIs used by the user code and the call site in the user code and provides better error messages. E.g. consid

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
Hisoka-X commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1315586396 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala: ## @@ -213,7 +213,9 @@ private[sql] object CatalogV2Util { // e

[GitHub] [spark] zhengruifeng commented on pull request #42811: [SPARK-43241][PS][FOLLOWUP] Add migration guide for behavior change

2023-09-05 Thread via GitHub
zhengruifeng commented on PR #42811: URL: https://github.com/apache/spark/pull/42811#issuecomment-1706252804 merged to master

[GitHub] [spark] zhengruifeng closed pull request #42811: [SPARK-43241][PS][FOLLOWUP] Add migration guide for behavior change

2023-09-05 Thread via GitHub
zhengruifeng closed pull request #42811: [SPARK-43241][PS][FOLLOWUP] Add migration guide for behavior change URL: https://github.com/apache/spark/pull/42811

[GitHub] [spark] whcdjj commented on pull request #28280: [SPARK-31438][CORE][SQL] Support JobCleaned Status in SparkListener

2023-09-05 Thread via GitHub
whcdjj commented on PR #28280: URL: https://github.com/apache/spark/pull/28280#issuecomment-1706282059 Is there any progress on this issue? When I inserted into a Hive table with speculation enabled, I encountered the same problem.

[GitHub] [spark] MaxGekk commented on a diff in pull request #42801: [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar`

2023-09-05 Thread via GitHub
MaxGekk commented on code in PR #42801: URL: https://github.com/apache/spark/pull/42801#discussion_r1315744531 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -4284,12 +4285,22 @@ object functions { * prints '+' for positive value

[GitHub] [spark] panbingkun commented on a diff in pull request #42109: [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]

2023-09-05 Thread via GitHub
panbingkun commented on code in PR #42109: URL: https://github.com/apache/spark/pull/42109#discussion_r1315778389 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala: ## @@ -44,7 +44,7 @@ case class UnresolvedNamespace(multipartIdentifie

[GitHub] [spark] panbingkun commented on a diff in pull request #42109: [SPARK-44404][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]

2023-09-05 Thread via GitHub
panbingkun commented on code in PR #42109: URL: https://github.com/apache/spark/pull/42109#discussion_r1315780224 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1134,20 +1134,21 @@ class Analyzer(override val catalogManager: Catalog

[GitHub] [spark] MaxGekk opened a new pull request, #42817: [SPARK-45079][SQL] Fix an internal error from `percentile_approx()` on `NULL` accuracy

2023-09-05 Thread via GitHub
MaxGekk opened a new pull request, #42817: URL: https://github.com/apache/spark/pull/42817 ### What changes were proposed in this pull request? In the PR, I propose to check the `accuracy` argument is not a NULL in `ApproximatePercentile`. And if it is, throw an `AnalysisException` with n

[GitHub] [spark] zzzzming95 commented on pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub
zzzzming95 commented on PR #42804: URL: https://github.com/apache/spark/pull/42804#issuecomment-1706594949 @cloud-fan @wangyum Please merge it to master, thanks

[GitHub] [spark] wangyum commented on a diff in pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub
wangyum commented on code in PR #42804: URL: https://github.com/apache/spark/pull/42804#discussion_r1315905477 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -234,11 +240,7 @@ abstract class BinaryArithmetic extends BinaryOperato

[GitHub] [spark] zzzzming95 commented on a diff in pull request #42574: [SPARK-43149][SQL] `CreateDataSourceTableCommand` should create metadata first

2023-09-05 Thread via GitHub
zzzzming95 commented on code in PR #42574: URL: https://github.com/apache/spark/pull/42574#discussion_r1315907482 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala: ## @@ -191,16 +193,26 @@ case class CreateDataSourceTableAsSelectComm

[GitHub] [spark] hvanhovell closed pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

2023-09-05 Thread via GitHub
hvanhovell closed pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes URL: https://github.com/apache/spark/pull/42807

[GitHub] [spark] juliuszsompolski commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

2023-09-05 Thread via GitHub
juliuszsompolski commented on PR #42806: URL: https://github.com/apache/spark/pull/42806#issuecomment-1706640901 https://github.com/juliuszsompolski/apache-spark/actions/runs/6076122424/job/16483638602 This module timed out. All connect related tests finished successfully.

[GitHub] [spark] cloud-fan commented on pull request #41683: [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL

2023-09-05 Thread via GitHub
cloud-fan commented on PR #41683: URL: https://github.com/apache/spark/pull/41683#issuecomment-1706680230 Let's spend more time on the API design first, as different people may have different opinions and we should collect as much feedback as possible. Taking a step back, I think what

[GitHub] [spark] cloud-fan commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
cloud-fan commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1315957135 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -230,6 +230,15 @@ case class AlterColumn( val defaul

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
Hisoka-X commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1315970129 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -230,6 +230,15 @@ case class AlterColumn( val default

[GitHub] [spark] cloud-fan commented on a diff in pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-09-05 Thread via GitHub
cloud-fan commented on code in PR #42481: URL: https://github.com/apache/spark/pull/42481#discussion_r1315972356 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -124,7 +136,7 @@ object SQLExecution { physicalPlanDescription = que

[GitHub] [spark] MaxGekk closed pull request #42816: [WIP][SPARK-45022][SQL] Provide context for dataset API errors

2023-09-05 Thread via GitHub
MaxGekk closed pull request #42816: [WIP][SPARK-45022][SQL] Provide context for dataset API errors URL: https://github.com/apache/spark/pull/42816

[GitHub] [spark] cloud-fan commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
cloud-fan commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1315981102 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn( TableCha

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
Hisoka-X commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1316039979 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn( TableChan

[GitHub] [spark] dongjoon-hyun commented on pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

2023-09-05 Thread via GitHub
dongjoon-hyun commented on PR #42807: URL: https://github.com/apache/spark/pull/42807#issuecomment-1706851416 Let me fix that for you, @hvanhovell . If you don't think this is not a bug, please let me know, @hvanhovell .

[GitHub] [spark] zzzzming95 commented on a diff in pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub
zzzzming95 commented on code in PR #42804: URL: https://github.com/apache/spark/pull/42804#discussion_r1316079522 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -234,11 +240,7 @@ abstract class BinaryArithmetic extends BinaryOper

[GitHub] [spark] hvanhovell commented on pull request #42807: [SPARK-45072][CONNECT] Fix outer scopes for ammonite classes

2023-09-05 Thread via GitHub
hvanhovell commented on PR #42807: URL: https://github.com/apache/spark/pull/42807#issuecomment-1706856355 I think it is a bug.

[GitHub] [spark] dtenedor commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
dtenedor commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1316137847 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn( TableChan

[GitHub] [spark] sunchao commented on pull request #42757: [SPARK-45036][SQL] SPJ: Simplify the logic to handle partially clustered distribution

2023-09-05 Thread via GitHub
sunchao commented on PR #42757: URL: https://github.com/apache/spark/pull/42757#issuecomment-1706951663 Thanks all!

[GitHub] [spark] juliuszsompolski opened a new pull request, #42818: [SPARK-44835] Make INVALID_CURSOR.DISCONNECTED a retriable error

2023-09-05 Thread via GitHub
juliuszsompolski opened a new pull request, #42818: URL: https://github.com/apache/spark/pull/42818 ### What changes were proposed in this pull request? Make INVALID_CURSOR.DISCONNECTED a retriable error. ### Why are the changes needed? This error can happen if two RPCs a

[GitHub] [spark] xuanyuanking opened a new pull request, #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking opened a new pull request, #42819: URL: https://github.com/apache/spark/pull/42819 ### What changes were proposed in this pull request? Compare the 3.4 API doc with the 3.5 RC3 cut. Fix the following issues: - Remove the leaking class/object in API doc ### Wh

[GitHub] [spark] srielau commented on pull request #41683: [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL

2023-09-05 Thread via GitHub
srielau commented on PR #41683: URL: https://github.com/apache/spark/pull/41683#issuecomment-1706980076 +1 on using a WITH clause. For UPDATE: > WITH (OPTIONS ( Why the nesting?

[GitHub] [spark] juliuszsompolski commented on pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub
juliuszsompolski commented on PR #42772: URL: https://github.com/apache/spark/pull/42772#issuecomment-170760 We maintain backwards compatibility, where older clients can connect to newer server. These older clients will not provide such UUIDs. What will happen then? Does it break any

[GitHub] [spark] agubichev commented on a diff in pull request #42778: [SPARK-45055] [SQL] Do not transpose windows if they conflict on ORDER BY / PROJECT clauses

2023-09-05 Thread via GitHub
agubichev commented on code in PR #42778: URL: https://github.com/apache/spark/pull/42778#discussion_r1316176643 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala: ## @@ -160,4 +160,18 @@ class TransposeWindowSuite extends PlanTest

[GitHub] [spark] juliuszsompolski commented on pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub
juliuszsompolski commented on PR #42772: URL: https://github.com/apache/spark/pull/42772#issuecomment-1707008850 ... although, the currently existing (Spark 3.4) clients never generate operationId client side, so we can get away with adding an assertion that the client side id is UUID7 in `

[GitHub] [spark] srowen commented on pull request #42815: [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4

2023-09-05 Thread via GitHub
srowen commented on PR #42815: URL: https://github.com/apache/spark/pull/42815#issuecomment-1707031614 Are you saying 0.6.4 doesn't work well? is this just for your testing then and not to merge?

[GitHub] [spark] srowen commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
srowen commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707033259 Seems OK in principle, just need tests to pass

[GitHub] [spark] dillitz commented on pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub
dillitz commented on PR #42772: URL: https://github.com/apache/spark/pull/42772#issuecomment-1707033407 We agreed that the benefits of adding this are not big enough because we can not rely on the operation ID being UUIDv7 and need to sort by startDate anyway. Closing this PR.
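
The reasoning in this comment can be illustrated with a small sketch. It is not code from the PR: `uuid7_like` is a hypothetical helper that builds a UUIDv7-style value by hand, and the `start_ms` field is an assumed stand-in for the start date mentioned above. It only shows that time-prefixed IDs sort chronologically among themselves, while a list that may also contain other kinds of IDs still needs the start time as its sort key.

```python
import os
import uuid


def uuid7_like(unix_ms: int) -> uuid.UUID:
    """Hypothetical helper: 48-bit millisecond timestamp followed by random bits."""
    rand = int.from_bytes(os.urandom(10), "big")          # 80 random bits
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand    # timestamp in the top 48 bits
    value = (value & ~(0xF << 76)) | (0x7 << 76)          # version nibble = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)          # RFC variant bits = 0b10
    return uuid.UUID(int=value)


# Illustrative operations; the field names are assumptions, not Spark Connect's schema.
ops = [
    {"id": uuid7_like(3000), "start_ms": 3000},
    {"id": uuid.uuid4(), "start_ms": 2000},   # an ID that is not UUIDv7
    {"id": uuid7_like(1000), "start_ms": 1000},
]

# IDs alone order correctly only when every ID is UUIDv7-style ...
assert uuid7_like(1000) < uuid7_like(2000)

# ... so with mixed IDs the start time is the reliable sort key.
by_start = sorted(ops, key=lambda op: op["start_ms"])
print([op["start_ms"] for op in by_start])
```

Because the random UUIDv4 entry can land anywhere in an ID-based ordering, sorting on the ID would only be safe if every producer were guaranteed to emit UUIDv7, which is the guarantee the comment says is not available.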

[GitHub] [spark] dillitz closed pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable

2023-09-05 Thread via GitHub
dillitz closed pull request #42772: [SPARK-45051][CONNECT] Use UUIDv7 by default for operation IDs to make operations chronologically sortable URL: https://github.com/apache/spark/pull/42772

[GitHub] [spark] mridulm commented on a diff in pull request #42529: [SPARK-44845][YARN][DEPLOY] Fix file system uri comparison function

2023-09-05 Thread via GitHub
mridulm commented on code in PR #42529: URL: https://github.com/apache/spark/pull/42529#discussion_r1316224064 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -1618,9 +1618,9 @@ private[spark] object Client extends Logging { retur

[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707142988 The test `On pull request update / Notify test workflow (pull_request_target)` that failed has passed for the initial commit. I think it's good to go. @srowen, could you give your ap

[GitHub] [spark] srowen commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
srowen commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707147907 I'm concerned that this may not pass mima tests and I don't see that this test was run because of other errors. Have you checked that it passes in both branches?

[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707160026 good point, let me run Mima test manually on both master and 3.5

[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707175717 @srowen Checked manually for both master and branch-3.5, the Mima test passed.

[GitHub] [spark] WweiL commented on pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener

2023-09-05 Thread via GitHub
WweiL commented on PR #42664: URL: https://github.com/apache/spark/pull/42664#issuecomment-1707179668 fixed in https://github.com/apache/spark/commit/7be69bf7da036282c2c7c0b62c32e7666fa1b579

[GitHub] [spark] WweiL closed pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener

2023-09-05 Thread via GitHub
WweiL closed pull request #42664: [SPARK-44435][SPARK-44484][3.5][SS][CONNECT] Tests for foreachBatch and Listener URL: https://github.com/apache/spark/pull/42664

[GitHub] [spark] pan3793 opened a new pull request, #42820: [WIP][TEST] Remove obsolete repo of DB2 JDBC driver

2023-09-05 Thread via GitHub
pan3793 opened a new pull request, #42820: URL: https://github.com/apache/spark/pull/42820 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] xuanyuanking closed pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking closed pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0 URL: https://github.com/apache/spark/pull/42819

[GitHub] [spark] xuanyuanking commented on pull request #42819: [SPARK-45082][DOC] Review and fix issues in API docs for 3.5.0

2023-09-05 Thread via GitHub
xuanyuanking commented on PR #42819: URL: https://github.com/apache/spark/pull/42819#issuecomment-1707217892 Thanks, merged in master and branch-3.5

[GitHub] [spark] planga82 commented on a diff in pull request #42759: [SPARK-45039][SQL] Include full identifier in Storage tab

2023-09-05 Thread via GitHub
planga82 commented on code in PR #42759: URL: https://github.com/apache/spark/pull/42759#discussion_r1316342993 ## sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala: ## @@ -117,12 +117,20 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper

[GitHub] [spark] andylam-db commented on a diff in pull request #42725: [SPARK-45009][SQL] Decorrelate predicate subqueries in join condition

2023-09-05 Thread via GitHub
andylam-db commented on code in PR #42725: URL: https://github.com/apache/spark/pull/42725#discussion_r1316384305 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3751,4 +3751,14 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] dtenedor commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
dtenedor commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1316435058 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn( TableChan

[GitHub] [spark] allisonwang-db opened a new pull request, #42821: [SPARK-45083][PYTHON][DOCS] Refine the docstring of function `min`

2023-09-05 Thread via GitHub
allisonwang-db opened a new pull request, #42821: URL: https://github.com/apache/spark/pull/42821 ### What changes were proposed in this pull request? This PR refines the function `min` docstring by adding more examples. ### Why are the changes needed? To improve

[GitHub] [spark] itholic commented on a diff in pull request #42798: [SPARK-43295][PS] Support string type columns for `DataFrameGroupBy.sum`

2023-09-05 Thread via GitHub
itholic commented on code in PR #42798: URL: https://github.com/apache/spark/pull/42798#discussion_r1316539576 ## python/pyspark/pandas/groupby.py: ## @@ -3534,7 +3534,12 @@ def _reduce_for_stat_function( for label in psdf._internal.column_labels: p

[GitHub] [spark] itholic commented on a diff in pull request #42798: [SPARK-43295][PS] Support string type columns for `DataFrameGroupBy.sum`

2023-09-05 Thread via GitHub
itholic commented on code in PR #42798: URL: https://github.com/apache/spark/pull/42798#discussion_r1316539867 ## python/pyspark/pandas/groupby.py: ## @@ -3534,7 +3534,12 @@ def _reduce_for_stat_function( for label in psdf._internal.column_labels: p

[GitHub] [spark] zhengruifeng commented on pull request #42815: [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4

2023-09-05 Thread via GitHub
zhengruifeng commented on PR #42815: URL: https://github.com/apache/spark/pull/42815#issuecomment-1707476622 cc @gengliangwang @jasonli-db

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
Hisoka-X commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1316557650 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn( TableChan

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
Hisoka-X commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1316557800 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn( TableChan

[GitHub] [spark] rangadi commented on a diff in pull request #42779: [SPARK-45056][PYTHON][SS][CONNECT] Termination tests for streamingQueryListener and foreachBatch

2023-09-05 Thread via GitHub
rangadi commented on code in PR #42779: URL: https://github.com/apache/spark/pull/42779#discussion_r1316564483 ## python/pyspark/sql/tests/connect/streaming/worker_for_testing.py: ## @@ -0,0 +1,63 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# con

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42798: [SPARK-43295][PS] Support string type columns for `DataFrameGroupBy.sum`

2023-09-05 Thread via GitHub
zhengruifeng commented on code in PR #42798: URL: https://github.com/apache/spark/pull/42798#discussion_r1316572356 ## python/pyspark/pandas/groupby.py: ## @@ -3534,7 +3534,12 @@ def _reduce_for_stat_function( for label in psdf._internal.column_labels:

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42798: [SPARK-43295][PS] Support string type columns for `DataFrameGroupBy.sum`

2023-09-05 Thread via GitHub
zhengruifeng commented on code in PR #42798: URL: https://github.com/apache/spark/pull/42798#discussion_r1316574489 ## python/pyspark/pandas/groupby.py: ## @@ -3537,14 +3537,14 @@ def _reduce_for_stat_function( if sfun.__name__ == "sum" and isinstance(

[GitHub] [spark] zhengruifeng closed pull request #42821: [SPARK-45083][PYTHON][DOCS] Refine the docstring of function `min`

2023-09-05 Thread via GitHub
zhengruifeng closed pull request #42821: [SPARK-45083][PYTHON][DOCS] Refine the docstring of function `min` URL: https://github.com/apache/spark/pull/42821

[GitHub] [spark] zhengruifeng commented on pull request #42821: [SPARK-45083][PYTHON][DOCS] Refine the docstring of function `min`

2023-09-05 Thread via GitHub
zhengruifeng commented on PR #42821: URL: https://github.com/apache/spark/pull/42821#issuecomment-1707515898 merged to master

[GitHub] [spark] itholic commented on a diff in pull request #42798: [SPARK-43295][PS] Support string type columns for `DataFrameGroupBy.sum`

2023-09-05 Thread via GitHub
itholic commented on code in PR #42798: URL: https://github.com/apache/spark/pull/42798#discussion_r1316586840 ## python/pyspark/pandas/groupby.py: ## @@ -3537,14 +3537,14 @@ def _reduce_for_stat_function( if sfun.__name__ == "sum" and isinstance(

[GitHub] [spark] yaooqinn commented on pull request #42815: [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4

2023-09-05 Thread via GitHub
yaooqinn commented on PR #42815: URL: https://github.com/apache/spark/pull/42815#issuecomment-1707530280 also cc @sarutak @HyukjinKwon and @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

[GitHub] [spark] yaooqinn commented on pull request #42815: [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4

2023-09-05 Thread via GitHub
yaooqinn commented on PR #42815: URL: https://github.com/apache/spark/pull/42815#issuecomment-1707529974 > Are you saying 0.6.4 doesn't work well? is this just for your testing then and not to merge? Hi @srowen, I apologize for any confusion caused. This pull request is for merging p

[GitHub] [spark] HeartSaVioR opened a new pull request, #42823: [SPARK-45080][SS] Explicitly call out support for columnar in DSv2 streaming data sources

2023-09-05 Thread via GitHub
HeartSaVioR opened a new pull request, #42823: URL: https://github.com/apache/spark/pull/42823 ### What changes were proposed in this pull request? This PR proposes to override `Scan.columnarSupportMode` for DSv2 streaming data sources. None of them support columnar. Ratio

[GitHub] [spark] yliou commented on a diff in pull request #40502: [SPARK-42829] [UI] add repeat identifier to cached RDD on stage page

2023-09-05 Thread via GitHub
yliou commented on code in PR #40502: URL: https://github.com/apache/spark/pull/40502#discussion_r1316647094 ## core/src/main/scala/org/apache/spark/ui/scope/RDDOperationGraph.scala: ## @@ -221,6 +226,24 @@ private[spark] object RDDOperationGraph extends Logging { RDDOperat

[GitHub] [spark] yaooqinn commented on a diff in pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-09-05 Thread via GitHub
yaooqinn commented on code in PR #42481: URL: https://github.com/apache/spark/pull/42481#discussion_r1316655772 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -124,7 +136,7 @@ object SQLExecution { physicalPlanDescription = quer

[GitHub] [spark] panbingkun opened a new pull request, #42824: [SPARK-45085][SQL] Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic

2023-09-05 Thread via GitHub
panbingkun opened a new pull request, #42824: URL: https://github.com/apache/spark/pull/42824 ### What changes were proposed in this pull request? The pr aims to: - Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION in error classes. - Refactor some code. - Fix

[GitHub] [spark] HeartSaVioR commented on pull request #42823: [SPARK-45080][SS] Explicitly call out support for columnar in DSv2 streaming data sources

2023-09-05 Thread via GitHub
HeartSaVioR commented on PR #42823: URL: https://github.com/apache/spark/pull/42823#issuecomment-1707580885 cc. @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] yaooqinn opened a new pull request, #42825: [SPARK-44801][SQL][FOLLOWUP] Remove overdue comments for generating plan info in SQLExecution

2023-09-05 Thread via GitHub
yaooqinn opened a new pull request, #42825: URL: https://github.com/apache/spark/pull/42825 ### What changes were proposed in this pull request? This is a followup to clean up comment for generating plan info in SQLExecution, which is currently wrapped with try-cat

[GitHub] [spark] yaooqinn commented on pull request #42825: [SPARK-44801][SQL][FOLLOWUP] Remove overdue comments for generating plan info in SQLExecution

2023-09-05 Thread via GitHub
yaooqinn commented on PR #42825: URL: https://github.com/apache/spark/pull/42825#issuecomment-1707584651 cc @cloud-fan and thanks for the post review of https://github.com/apache/spark/pull/42481/files#r1315972356 -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] yaooqinn closed pull request #42825: [SPARK-44801][SQL][FOLLOWUP] Remove overdue comments for generating plan info in SQLExecution

2023-09-05 Thread via GitHub
yaooqinn closed pull request #42825: [SPARK-44801][SQL][FOLLOWUP] Remove overdue comments for generating plan info in SQLExecution URL: https://github.com/apache/spark/pull/42825 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[GitHub] [spark] HeartSaVioR commented on pull request #42823: [SPARK-45080][SS] Explicitly call out support for columnar in DSv2 streaming data sources

2023-09-05 Thread via GitHub
HeartSaVioR commented on PR #42823: URL: https://github.com/apache/spark/pull/42823#issuecomment-1707590753 cc. @zsxwing @viirya @xuanyuanking as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] cloud-fan commented on a diff in pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub
cloud-fan commented on code in PR #42804: URL: https://github.com/apache/spark/pull/42804#discussion_r1316665134 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -234,11 +240,7 @@ abstract class BinaryArithmetic extends BinaryOpera

[GitHub] [spark] cloud-fan commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
cloud-fan commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1316665187 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn( TableCha

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
Hisoka-X commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1316669968 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn( TableChan

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42810: [SPARK-45075][SQL] Fix alter table with invalid default value will not report error

2023-09-05 Thread via GitHub
Hisoka-X commented on code in PR #42810: URL: https://github.com/apache/spark/pull/42810#discussion_r1316673105 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -228,6 +228,15 @@ case class AlterColumn( TableChan

[GitHub] [spark] wangyum closed pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub
wangyum closed pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data URL: https://github.com/apache/spark/pull/42804 -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] wangyum commented on pull request #42804: [SPARK-45071][SQL] Optimize the processing speed of `BinaryArithmetic#dataType` when processing multi-column data

2023-09-05 Thread via GitHub
wangyum commented on PR #42804: URL: https://github.com/apache/spark/pull/42804#issuecomment-1707606993 Merged to master, branch-3.5 and branch-3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [spark] LuciferYang closed pull request #42766: [SPARK-45046][BUILD] Set `shadeTestJar` of `core` module to `false`

2023-09-05 Thread via GitHub
LuciferYang closed pull request #42766: [SPARK-45046][BUILD] Set `shadeTestJar` of `core` module to `false` URL: https://github.com/apache/spark/pull/42766 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] LuciferYang commented on pull request #42766: [SPARK-45046][BUILD] Set `shadeTestJar` of `core` module to `false`

2023-09-05 Thread via GitHub
LuciferYang commented on PR #42766: URL: https://github.com/apache/spark/pull/42766#issuecomment-1707607819 Merged into master. Thanks @gengliangwang @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[GitHub] [spark] LuciferYang commented on pull request #42766: [SPARK-45046][BUILD] Set `shadeTestJar` of `core` module to `false`

2023-09-05 Thread via GitHub
LuciferYang commented on PR #42766: URL: https://github.com/apache/spark/pull/42766#issuecomment-1707613904 @gengliangwang After further investigation, there are three modules that depend on `spark-streaming_2.12:test-jar`: `mllib`, `streaming-kafka-0-10` and `kinesis-asl`. As I lack the ne

[GitHub] [spark] yaooqinn opened a new pull request, #42826: [SPARK-45086][UI] Display hexadecimal for thread lock hash code

2023-09-05 Thread via GitHub
yaooqinn opened a new pull request, #42826: URL: https://github.com/apache/spark/pull/42826 ### What changes were proposed in this pull request? This PR fixes the stringify method for MonitorInfo/LockInfo to use `toString` which contains an extra step of Integer.toHexStrin

[GitHub] [spark] LuciferYang opened a new pull request, #42827: [BUILD] Test build with maven 3.9.4

2023-09-05 Thread via GitHub
LuciferYang opened a new pull request, #42827: URL: https://github.com/apache/spark/pull/42827 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] zhengruifeng opened a new pull request, #42828: [SPARK-45088][PYTHON][CONNECT] Make `getitem` work with duplicated columns

2023-09-05 Thread via GitHub
zhengruifeng opened a new pull request, #42828: URL: https://github.com/apache/spark/pull/42828 ### What changes were proposed in this pull request? - Make `getitem` work with duplicated columns - Disallow bool type index - Disallow negative index ### Why are the chang

[GitHub] [spark] HyukjinKwon commented on pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

2023-09-05 Thread via GitHub
HyukjinKwon commented on PR #42806: URL: https://github.com/apache/spark/pull/42806#issuecomment-1707679262 Merged to master and branch-3.5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [spark] HyukjinKwon closed pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute

2023-09-05 Thread via GitHub
HyukjinKwon closed pull request #42806: [SPARK-44833][CONNECT] Fix sending Reattach too fast after Execute URL: https://github.com/apache/spark/pull/42806 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [spark] MaxGekk commented on pull request #42801: [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar`

2023-09-05 Thread via GitHub
MaxGekk commented on PR #42801: URL: https://github.com/apache/spark/pull/42801#issuecomment-1707683221 Merging to master. Thank you, @dongjoon-hyun and @cloud-fan for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[GitHub] [spark] MaxGekk closed pull request #42801: [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar`

2023-09-05 Thread via GitHub
MaxGekk closed pull request #42801: [SPARK-45070][SQL][DOCS] Describe the binary and datetime formats of `to_char`/`to_varchar` URL: https://github.com/apache/spark/pull/42801 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u
