[PR] [SPARK-46501][INFRA] List the python packages with the correct versions [spark]

2023-12-24 Thread via GitHub
zhengruifeng opened a new pull request, #44479: URL: https://github.com/apache/spark/pull/44479 ### What changes were proposed in this pull request? List the python packages with the correct versions ### Why are the changes needed? the version here should be in `PYTHON_TO_TEST`

Re: [PR] [SPARK-46500][PS][TESTS] Reorganize `FrameParityPivotTests` [spark]

2023-12-24 Thread via GitHub
zhengruifeng commented on PR #44478: URL: https://github.com/apache/spark/pull/44478#issuecomment-1868831406 cc @HyukjinKwon @itholic -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[PR] [SPARK-46500][PS][TESTS] Reorganize `FrameParityPivotTests` [spark]

2023-12-24 Thread via GitHub
zhengruifeng opened a new pull request, #44478: URL: https://github.com/apache/spark/pull/44478 ### What changes were proposed in this pull request? Reorganize `FrameParityPivotTests`: break `test_pivot_table` into multiple tests ### Why are the changes needed? this test is s

Re: [PR] [SPARK-46500][PS][TESTS] Reorganize `FrameParityPivotTests` [spark]

2023-12-24 Thread via GitHub
zhengruifeng commented on PR #44478: URL: https://github.com/apache/spark/pull/44478#issuecomment-1868831321 ci: https://github.com/zhengruifeng/spark/actions/runs/7317799587/job/19933749396

Re: [PR] [SPARK-46487][SQL] Push down part of filter through aggregate with nondeterministic field [spark]

2023-12-24 Thread via GitHub
zml1206 commented on code in PR #44460: URL: https://github.com/apache/spark/pull/44460#discussion_r1435992507 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1713,8 +1713,7 @@ object PushPredicateThroughNonJoin extends Rule[Logica

Re: [PR] [SPARK-40876][SQL] Widening type promotions in Parquet readers [spark]

2023-12-24 Thread via GitHub
LuciferYang commented on PR #44368: URL: https://github.com/apache/spark/pull/44368#issuecomment-1868820991 3 tests failed in `ParquetTypeWideningSuite` with `spark.sql.ansi.enabled=true`, could you take a look? @johanl-db - https://github.com/apache/spark/actions/runs/73180

Re: [PR] [SPARK-46487][SQL] Push down part of filter through aggregate with nondeterministic field [spark]

2023-12-24 Thread via GitHub
zml1206 commented on code in PR #44460: URL: https://github.com/apache/spark/pull/44460#discussion_r1435959826 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1713,8 +1713,7 @@ object PushPredicateThroughNonJoin extends Rule[Logica

Re: [PR] [SPARK-46444][SQL] V2SessionCatalog#createTable should not load the table [spark]

2023-12-24 Thread via GitHub
cloud-fan commented on code in PR #44377: URL: https://github.com/apache/spark/pull/44377#discussion_r1435982927 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -233,21 +244,7 @@ class V2SessionCatalog(catalog: SessionCatalo

Re: [PR] [MINOR][INFRA] Comments in GitHub scripts should start with # [spark]

2023-12-24 Thread via GitHub
panbingkun commented on PR #44473: URL: https://github.com/apache/spark/pull/44473#issuecomment-1868800078 > Is this the only case? Yes, I ran `grep` and have only found this one place so far.

Re: [PR] [MINOR][INFRA] Comments in GitHub scripts should start with # [spark]

2023-12-24 Thread via GitHub
LuciferYang commented on PR #44473: URL: https://github.com/apache/spark/pull/44473#issuecomment-1868795736 Is this the only case?

Re: [PR] [MINOR][INFRA] Comments in GitHub scripts should start with # [spark]

2023-12-24 Thread via GitHub
panbingkun commented on PR #44473: URL: https://github.com/apache/spark/pull/44473#issuecomment-1868793018 cc @LuciferYang

[PR] Test Ivy 2.5.2 [spark]

2023-12-24 Thread via GitHub
LuciferYang opened a new pull request, #44477: URL: https://github.com/apache/spark/pull/44477 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [SPARK-46499][BUILD] Bump sbt-eclipse 6.2.0 [spark]

2023-12-24 Thread via GitHub
pan3793 opened a new pull request, #44476: URL: https://github.com/apache/spark/pull/44476 ### What changes were proposed in this pull request? Bump SBT plugin `sbt-eclipse` from 6.0.0 to 6.2.0 ### Why are the changes needed? Which brings the Java 21 suppo

Re: [PR] [SPARK-46478][SQL] Revert SPARK-43049 to use oracle varchar(255) for string [spark]

2023-12-24 Thread via GitHub
yaooqinn commented on PR #2: URL: https://github.com/apache/spark/pull/2#issuecomment-1868749982 Thank you, @dongjoon-hyun. I'm OOO now. I will raise a backport PR tomorrow.

Re: [PR] [SPARK-46487][SQL] Push down part of filter through aggregate with nondeterministic field [spark]

2023-12-24 Thread via GitHub
beliefer commented on code in PR #44460: URL: https://github.com/apache/spark/pull/44460#discussion_r1435953022 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1713,8 +1713,7 @@ object PushPredicateThroughNonJoin extends Rule[Logic

[PR] [SPARK-46498][CORE] Clean up unused methods and local variables in `o.a.spark.util.Utils` [spark]

2023-12-24 Thread via GitHub
LuciferYang opened a new pull request, #44475: URL: https://github.com/apache/spark/pull/44475 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [SPARK-46497][SQL][TESTS] Re-enable the test cases that were ignored in SPARK-45309 [spark]

2023-12-24 Thread via GitHub
LuciferYang opened a new pull request, #44474: URL: https://github.com/apache/spark/pull/44474 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-45893][HIVE] Support drop multiple partitions in batch for hive [spark]

2023-12-24 Thread via GitHub
pan3793 commented on PR #43766: URL: https://github.com/apache/spark/pull/43766#issuecomment-1868711153 LGTM. cc @wangyum @yaooqinn @sunchao

Re: [PR] [SPARK-45893][HIVE] Support drop multiple partitions in batch for hive [spark]

2023-12-24 Thread via GitHub
pan3793 commented on code in PR #43766: URL: https://github.com/apache/spark/pull/43766#discussion_r1435936724 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala: ## @@ -501,46 +501,52 @@ class HiveClientSuite(version: String, allVersions: Seq[Str

Re: [PR] [SPARK-46487][SQL] Push down part of filter through aggregate with nondeterministic field [spark]

2023-12-24 Thread via GitHub
zml1206 commented on PR #44460: URL: https://github.com/apache/spark/pull/44460#issuecomment-1868684321 A filter after a project can also have part of its conditions pushed down. If there is no problem with this, I will handle the filter-after-project case in the next PR.
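
As a rough illustration of what "pushing down part of a filter" can mean, the sketch below splits a conjunctive filter into the terms that may move below an operator and the terms that must stay above it. It is not the PR's implementation: `Conjunct`, `split_pushable`, and `pushable_columns` are hypothetical names, and the rule assumed here (a term can move only if it is deterministic and refers solely to columns that pass through the operator unchanged) is a simplification.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Conjunct:
    """One AND-ed term of a filter condition (hypothetical model, not a Spark class)."""
    sql: str
    references: FrozenSet[str]
    deterministic: bool = True


def split_pushable(
    conjuncts: Iterable[Conjunct], pushable_columns: FrozenSet[str]
) -> Tuple[List[Conjunct], List[Conjunct]]:
    """Partition conjuncts into (pushed_down, kept_above).

    Assumption: a term may be pushed only if it is deterministic and every
    column it references is in `pushable_columns`.
    """
    pushed: List[Conjunct] = []
    kept: List[Conjunct] = []
    for term in conjuncts:
        if term.deterministic and term.references <= pushable_columns:
            pushed.append(term)
        else:
            kept.append(term)
    return pushed, kept


# Hypothetical filter: a > 1 AND r < 0.5 AND rand() > 0.1
terms = [
    Conjunct("a > 1", frozenset({"a"})),
    Conjunct("r < 0.5", frozenset({"r"})),
    Conjunct("rand() > 0.1", frozenset(), deterministic=False),
]
pushed, kept = split_pushable(terms, frozenset({"a"}))
```

With only `a` treated as pushable, the term on `a` moves down while the term on `r` and the nondeterministic term stay in the original filter.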

[PR] test [spark]

2023-12-24 Thread via GitHub
panbingkun opened a new pull request, #44473: URL: https://github.com/apache/spark/pull/44473 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-46487][SQL] Push down part of filter through aggregate with nondeterministic field [spark]

2023-12-24 Thread via GitHub
zml1206 commented on code in PR #44460: URL: https://github.com/apache/spark/pull/44460#discussion_r1435915034 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1713,8 +1713,7 @@ object PushPredicateThroughNonJoin extends Rule[Logica

Re: [PR] [SPARK-46487][SQL] Push down part of filter through aggregate with nondeterministic field [spark]

2023-12-24 Thread via GitHub
beliefer commented on code in PR #44460: URL: https://github.com/apache/spark/pull/44460#discussion_r1435912503 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1713,8 +1713,7 @@ object PushPredicateThroughNonJoin extends Rule[Logic

Re: [PR] [SPARK-46492][BUILD] Simplify the Java version check in `SparkBuild.scala` [spark]

2023-12-24 Thread via GitHub
LuciferYang commented on PR #44465: URL: https://github.com/apache/spark/pull/44465#issuecomment-1868662204 Thanks @dongjoon-hyun

Re: [PR] [SPARK-46493][INFRA] Upgrade `apache-rat` in the `dev/check-license` to 0.15 [spark]

2023-12-24 Thread via GitHub
LuciferYang commented on PR #44466: URL: https://github.com/apache/spark/pull/44466#issuecomment-1868662115 Thanks @dongjoon-hyun

Re: [PR] [SPARK-46455][CORE][SQL][SS][CONNECT][PYTHON] Remove redundant type conversion [spark]

2023-12-24 Thread via GitHub
LuciferYang commented on PR #44412: URL: https://github.com/apache/spark/pull/44412#issuecomment-1868661922 Thanks @dongjoon-hyun and @ShreyeshArangath

Re: [PR] [SPARK-46487][SQL] Push down part of filter through aggregate with nondeterministic field [spark]

2023-12-24 Thread via GitHub
beliefer commented on code in PR #44460: URL: https://github.com/apache/spark/pull/44460#discussion_r1435908665 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala: ## @@ -191,15 +191,21 @@ class FilterPushdownSuite extends PlanTest {

Re: [PR] [SPARK-46494][SQL] Remove the parse rule of `First`, `Last` and `any_value`. [spark]

2023-12-24 Thread via GitHub
beliefer commented on PR #44467: URL: https://github.com/apache/spark/pull/44467#issuecomment-1868649086 > Even with `IGNORE NULLS`? I missed that syntax. Let's close this PR.

Re: [PR] [SPARK-46494][SQL] Remove the parse rule of `First`, `Last` and `any_value`. [spark]

2023-12-24 Thread via GitHub
beliefer closed pull request #44467: [SPARK-46494][SQL] Remove the parse rule of `First`, `Last` and `any_value`. URL: https://github.com/apache/spark/pull/44467

Re: [PR] [SPARK-46475][BUILD] Upgrade RoaringBitmap to 1.0.1 [spark]

2023-12-24 Thread via GitHub
panbingkun commented on PR #44439: URL: https://github.com/apache/spark/pull/44439#issuecomment-1868648795 > Is this still `Draft`, @panbingkun ? Let me add the results of `org.apache.spark.MapStatusesConvertBenchmark`. JDK 17: JDK 21:

Re: [PR] [SPARK-46496][BUILD] Upgrade Arrow to 14.0.2 [spark]

2023-12-24 Thread via GitHub
dongjoon-hyun commented on PR #44472: URL: https://github.com/apache/spark/pull/44472#issuecomment-1868648392 Merged to master for Apache Spark 4.

Re: [PR] [SPARK-46496][BUILD] Upgrade Arrow to 14.0.2 [spark]

2023-12-24 Thread via GitHub
dongjoon-hyun closed pull request #44472: [SPARK-46496][BUILD] Upgrade Arrow to 14.0.2 URL: https://github.com/apache/spark/pull/44472

Re: [PR] [SPARK-46371][BUILD] Clean up outdated items in `.rat-excludes` [spark]

2023-12-24 Thread via GitHub
panbingkun commented on PR #44293: URL: https://github.com/apache/spark/pull/44293#issuecomment-1868643987 > I agree with @yaooqinn . > > To @panbingkun , could you elaborate a little more about this? I don't understand what you meant by this. > > > To avoid misunderstandings c

Re: [PR] [SPARK-46487][SQL] Push down part of filter through aggregate with nondeterministic field [spark]

2023-12-24 Thread via GitHub
zml1206 closed pull request #44460: [SPARK-46487][SQL] Push down part of filter through aggregate with nondeterministic field URL: https://github.com/apache/spark/pull/44460

Re: [PR] [SPARK-45102] Support keyword columns on filters that interact with HMS [spark]

2023-12-24 Thread via GitHub
github-actions[bot] closed pull request #42868: [SPARK-45102] Support keyword columns on filters that interact with HMS URL: https://github.com/apache/spark/pull/42868

Re: [PR] [SPARK-43752][SQL] Support default column value on DataSource V2 [spark]

2023-12-24 Thread via GitHub
github-actions[bot] commented on PR #42802: URL: https://github.com/apache/spark/pull/42802#issuecomment-1868621620 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-45129] Add pyspark "ml-connect" extras dependencies [spark]

2023-12-24 Thread via GitHub
github-actions[bot] commented on PR #42886: URL: https://github.com/apache/spark/pull/42886#issuecomment-1868621609 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] Update KryoIteratorBenchmark.scala [spark]

2023-12-24 Thread via GitHub
github-actions[bot] closed pull request #42922: Update KryoIteratorBenchmark.scala URL: https://github.com/apache/spark/pull/42922

Re: [PR] [SPARK-46496][BUILD] Upgrade Arrow to 14.0.2 [spark]

2023-12-24 Thread via GitHub
dongjoon-hyun commented on PR #44472: URL: https://github.com/apache/spark/pull/44472#issuecomment-1868613565 Thank you, @zhengruifeng .

Re: [PR] [SPARK-46495][INFRA] Merge pyspark-error to pyspark-core [spark]

2023-12-24 Thread via GitHub
zhengruifeng commented on PR #44470: URL: https://github.com/apache/spark/pull/44470#issuecomment-1868610207 thanks @dongjoon-hyun

[PR] [SPARK-46496][BUILD] Upgrade Arrow to 14.0.2 [spark]

2023-12-24 Thread via GitHub
dongjoon-hyun opened a new pull request, #44472: URL: https://github.com/apache/spark/pull/44472 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-46455][CORE][SQL][SS][CONNECT][PYTHON] Remove redundant type conversion [spark]

2023-12-24 Thread via GitHub
dongjoon-hyun closed pull request #44412: [SPARK-46455][CORE][SQL][SS][CONNECT][PYTHON] Remove redundant type conversion URL: https://github.com/apache/spark/pull/44412

Re: [PR] [SPARK-46475][BUILD] Upgrade RoaringBitmap to 1.0.1 [spark]

2023-12-24 Thread via GitHub
dongjoon-hyun commented on PR #44439: URL: https://github.com/apache/spark/pull/44439#issuecomment-1868605821 Is this still `Draft`, @panbingkun ?

Re: [PR] [SPARK-46478][SQL] Revert SPARK-43049 to use oracle varchar(255) for string [spark]

2023-12-24 Thread via GitHub
dongjoon-hyun commented on PR #2: URL: https://github.com/apache/spark/pull/2#issuecomment-1868605180 Merged to master. Could you make a backport PR for branch-3.5, @yaooqinn ?

Re: [PR] [SPARK-46478][SQL] Revert SPARK-43049 to use oracle varchar(255) for string [spark]

2023-12-24 Thread via GitHub
dongjoon-hyun closed pull request #2: [SPARK-46478][SQL] Revert SPARK-43049 to use oracle varchar(255) for string URL: https://github.com/apache/spark/pull/2

Re: [PR] [WIP][SPARK-46484][SQL][CONNECT] Make `resolveOperators*` functions keep the plan id [spark]

2023-12-24 Thread via GitHub
dongjoon-hyun commented on PR #44462: URL: https://github.com/apache/spark/pull/44462#issuecomment-1868604893 I converted it to `Draft` because of `[WIP]` tag, @zhengruifeng .

Re: [PR] [SPARK-46471][PS][TESTS][FOLLOWUPS] Move `OpsOnDiffFramesEnabledTests` to `pyspark.pandas.tests.diff_frames_ops.*`` [spark]

2023-12-24 Thread via GitHub
dongjoon-hyun closed pull request #44471: [SPARK-46471][PS][TESTS][FOLLOWUPS] Move `OpsOnDiffFramesEnabledTests` to `pyspark.pandas.tests.diff_frames_ops.*`` URL: https://github.com/apache/spark/pull/44471

Re: [PR] [SPARK-46495][INFRA] Merge pyspark-error to pyspark-core [spark]

2023-12-24 Thread via GitHub
dongjoon-hyun closed pull request #44470: [SPARK-46495][INFRA] Merge pyspark-error to pyspark-core URL: https://github.com/apache/spark/pull/44470

Re: [PR] [SPARK-46471][PS][TESTS][FOLLOWUPS] Move `OpsOnDiffFramesEnabledTests` to `pyspark.pandas.tests.diff_frames_ops.*`` [spark]

2023-12-24 Thread via GitHub
zhengruifeng commented on PR #44471: URL: https://github.com/apache/spark/pull/44471#issuecomment-1868491530 ci: https://github.com/zhengruifeng/spark/actions/runs/7313871476

[PR] [SPARK-46471][PS][TESTS][FOLLOWUPS] Move `OpsOnDiffFramesEnabledTests` to `pyspark.pandas.tests.diff_frames_ops.*`` [spark]

2023-12-24 Thread via GitHub
zhengruifeng opened a new pull request, #44471: URL: https://github.com/apache/spark/pull/44471 ### What changes were proposed in this pull request? Move `OpsOnDiffFramesEnabledTests` to `pyspark.pandas.tests.diff_frames_ops.*`` ### Why are the changes needed? test code clea

Re: [PR] [SPARK-46471][PS][TESTS][FOLLOWUPS] Reorganize `OpsOnDiffFramesEnabledTests`: Factor out more tests [spark]

2023-12-24 Thread via GitHub
zhengruifeng closed pull request #44469: [SPARK-46471][PS][TESTS][FOLLOWUPS] Reorganize `OpsOnDiffFramesEnabledTests`: Factor out more tests URL: https://github.com/apache/spark/pull/44469

Re: [PR] [SPARK-46471][PS][TESTS][FOLLOWUPS] Reorganize `OpsOnDiffFramesEnabledTests`: Factor out more tests [spark]

2023-12-24 Thread via GitHub
zhengruifeng commented on PR #44469: URL: https://github.com/apache/spark/pull/44469#issuecomment-1868476013 merged to master