[GitHub] [spark] goutam-git commented on a diff in pull request #37065: [SPARK-38699][SQL] Use error classes in the execution errors of dictionary encoding

2022-07-08 Thread GitBox
goutam-git commented on code in PR #37065: URL: https://github.com/apache/spark/pull/37065#discussion_r917225777 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -878,9 +878,10 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] goutam-git commented on a diff in pull request #37065: [SPARK-38699][SQL] Use error classes in the execution errors of dictionary encoding

2022-07-08 Thread GitBox
goutam-git commented on code in PR #37065: URL: https://github.com/apache/spark/pull/37065#discussion_r915497502 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -878,9 +878,10 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37140: [SPARK-39726][SQL] Change the default value of spark.sql.execution.topKSortFallbackThreshold to 800000

2022-07-08 Thread GitBox
HyukjinKwon commented on code in PR #37140: URL: https://github.com/apache/spark/pull/37140#discussion_r917222344 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3029,7 +3029,10 @@ object SQLConf { " in memory, otherwise do a global

[GitHub] [spark] HyukjinKwon closed pull request #37142: [SPARK-39725][BUILD] Upgrade `jetty-http` from 9.4.46.v20220331 to 9.4.48.v20220622

2022-07-08 Thread GitBox
HyukjinKwon closed pull request #37142: [SPARK-39725][BUILD] Upgrade `jetty-http` from 9.4.46.v20220331 to 9.4.48.v20220622 URL: https://github.com/apache/spark/pull/37142 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] HyukjinKwon commented on pull request #37142: [SPARK-39725][BUILD] Upgrade `jetty-http` from 9.4.46.v20220331 to 9.4.48.v20220622

2022-07-08 Thread GitBox
HyukjinKwon commented on PR #37142: URL: https://github.com/apache/spark/pull/37142#issuecomment-1179480434 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #37137: [SPARK-37623][SPARK-39230][SQL][FOLLOW-UP] Make regr_slope and regr_intercept safe with ANSI mode

2022-07-08 Thread GitBox
HyukjinKwon closed pull request #37137: [SPARK-37623][SPARK-39230][SQL][FOLLOW-UP] Make regr_slope and regr_intercept safe with ANSI mode URL: https://github.com/apache/spark/pull/37137

[GitHub] [spark] HyukjinKwon commented on pull request #37137: [SPARK-37623][SPARK-39230][SQL][FOLLOW-UP] Make regr_slope and regr_intercept safe with ANSI mode

2022-07-08 Thread GitBox
HyukjinKwon commented on PR #37137: URL: https://github.com/apache/spark/pull/37137#issuecomment-1179480405 Merged to master.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37133: [SPARK-39720][R] Implement tableExists in SparkR for 3L namespace

2022-07-08 Thread GitBox
zhengruifeng commented on code in PR #37133: URL: https://github.com/apache/spark/pull/37133#discussion_r917217960 ## R/pkg/R/catalog.R: ## @@ -403,21 +406,95 @@ listTables <- function(databaseName = NULL) { dataFrame(callJMethod(jdst, "toDF")) } +#' Checks if the table

[GitHub] [spark] AmplabJenkins commented on pull request #37117: [WIP][SPARK-39714][python] Try to fix the mypy annotation tests

2022-07-08 Thread GitBox
AmplabJenkins commented on PR #37117: URL: https://github.com/apache/spark/pull/37117#issuecomment-1179467008 Can one of the admins verify this patch?

[GitHub] [spark] beliefer commented on a diff in pull request #37040: [SPARK-39651][SQL] Prune filter condition if compare with rand is deterministic

2022-07-08 Thread GitBox
beliefer commented on code in PR #37040: URL: https://github.com/apache/spark/pull/37040#discussion_r917211765 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -470,6 +470,52 @@ object BooleanSimplification extends

[GitHub] [spark] zhengruifeng commented on pull request #37133: [SPARK-39720][R] Implement tableExists in SparkR for 3L namespace

2022-07-08 Thread GitBox
zhengruifeng commented on PR #37133: URL: https://github.com/apache/spark/pull/37133#issuecomment-1179463484 cc @HyukjinKwon mind taking another look? Thanks!

[GitHub] [spark] beliefer commented on pull request #37126: [SPARK-39385][SQL] Supports push down `REGR_AVGX` and `REGR_AVGY`

2022-07-08 Thread GitBox
beliefer commented on PR #37126: URL: https://github.com/apache/spark/pull/37126#issuecomment-1179458987 ping @huaxingao cc @cloud-fan

[GitHub] [spark] beliefer commented on pull request #37137: [SPARK-37623][SPARK-39230][SQL][FOLLOW-UP] Make regr_slope and regr_intercept safe with ANSI mode

2022-07-08 Thread GitBox
beliefer commented on PR #37137: URL: https://github.com/apache/spark/pull/37137#issuecomment-1179457241 LGTM. @HyukjinKwon

[GitHub] [spark] beliefer commented on a diff in pull request #37137: [SPARK-37623][SPARK-39230][SQL][FOLLOW-UP] Make regr_slope and regr_intercept safe with ANSI mode

2022-07-08 Thread GitBox
beliefer commented on code in PR #37137: URL: https://github.com/apache/spark/pull/37137#discussion_r917205859 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/linearRegression.scala: ## @@ -278,7 +278,8 @@ case class RegrSlope(left:

[GitHub] [spark] allisonwang-db commented on a diff in pull request #37099: [SPARK-37287][SQL] Pull out dynamic partition and bucket sort from FileFormatWriter

2022-07-08 Thread GitBox
allisonwang-db commented on code in PR #37099: URL: https://github.com/apache/spark/pull/37099#discussion_r917189373 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala: ## @@ -211,6 +180,7 @@ object FileFormatWriter extends Logging {

[GitHub] [spark] allisonwang-db commented on a diff in pull request #37099: [SPARK-37287][SQL] Pull out dynamic partition and bucket sort from FileFormatWriter

2022-07-08 Thread GitBox
allisonwang-db commented on code in PR #37099: URL: https://github.com/apache/spark/pull/37099#discussion_r917189063 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala: ## @@ -141,7 +143,19 @@ case class

[GitHub] [spark] AmplabJenkins commented on pull request #37123: [SPARK-39711][TESTS] Remove redundant trait: BeforeAndAfterAll & BeforeAndAfterEach & Logging

2022-07-08 Thread GitBox
AmplabJenkins commented on PR #37123: URL: https://github.com/apache/spark/pull/37123#issuecomment-1179431058 Can one of the admins verify this patch?

[GitHub] [spark] allisonwang-db commented on a diff in pull request #37099: [SPARK-37287][SQL] Pull out dynamic partition and bucket sort from FileFormatWriter

2022-07-08 Thread GitBox
allisonwang-db commented on code in PR #37099: URL: https://github.com/apache/spark/pull/37099#discussion_r917188294 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3781,6 +3781,14 @@ object SQLConf { .intConf .createWithDefault(0)

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37137: [SPARK-37623][SPARK-39230][SQL][FOLLOW-UP] Make regr_slope and regr_intercept safe with ANSI mode

2022-07-08 Thread GitBox
HyukjinKwon commented on code in PR #37137: URL: https://github.com/apache/spark/pull/37137#discussion_r917171663 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/linearRegression.scala: ## @@ -278,7 +278,8 @@ case class RegrSlope(left:

[GitHub] [spark] dongjoon-hyun commented on pull request #37140: [SPARK-39726][SQL] Change the default value of spark.sql.execution.topKSortFallbackThreshold to 800000

2022-07-08 Thread GitBox
dongjoon-hyun commented on PR #37140: URL: https://github.com/apache/spark/pull/37140#issuecomment-1179395790 cc @cloud-fan and @JoshRosen based on the review history on the previous PR.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37140: [SPARK-39726][SQL] Change the default value of spark.sql.execution.topKSortFallbackThreshold to 800000

2022-07-08 Thread GitBox
dongjoon-hyun commented on code in PR #37140: URL: https://github.com/apache/spark/pull/37140#discussion_r917161583 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3029,7 +3029,10 @@ object SQLConf { " in memory, otherwise do a

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36893: [SPARK-39494][PYTHON] Support `createDataFrame` from a list of scalars

2022-07-08 Thread GitBox
xinrong-databricks commented on code in PR #36893: URL: https://github.com/apache/spark/pull/36893#discussion_r917160564 ## python/pyspark/sql/session.py: ## @@ -1023,6 +1023,12 @@ def prepare(obj: Any) -> Any: if isinstance(data, RDD): rdd, struct =

[GitHub] [spark] dongjoon-hyun commented on pull request #36696: [SPARK-39312][SQL] Use parquet native In predicate for in filter push down

2022-07-08 Thread GitBox
dongjoon-hyun commented on PR #36696: URL: https://github.com/apache/spark/pull/36696#issuecomment-1179338350 Gentle ping, @huaxingao.

[GitHub] [spark] bjornjorgensen commented on pull request #37142: [SPARK-39725][BUILD] Upgrade `jetty-http` from 9.4.46.v20220331 to 9.4.48.v20220622

2022-07-08 Thread GitBox
bjornjorgensen commented on PR #37142: URL: https://github.com/apache/spark/pull/37142#issuecomment-1179288037 Note: [Jetty 9.4.x is now at End of Community Support.](https://github.com/eclipse/jetty.project/issues/7958) We can't upgrade to Jetty 10.x because we are using

[GitHub] [spark] kamcheungting-db commented on a diff in pull request #37104: [SPARK-39698][SQL] Use `TakeOrderedAndProject` if maxRows below the topKSortFallbackThreshold

2022-07-08 Thread GitBox
kamcheungting-db commented on code in PR #37104: URL: https://github.com/apache/spark/pull/37104#discussion_r917070033 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -283,20 +283,7 @@ case class TakeOrderedAndProjectExec( } override def

[GitHub] [spark] bjornjorgensen opened a new pull request, #37142: [WIP][SPARK-39725][BUILD] Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622

2022-07-08 Thread GitBox
bjornjorgensen opened a new pull request, #37142: URL: https://github.com/apache/spark/pull/37142 ### What changes were proposed in this pull request? Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622 ### Why are the changes needed? [Release note

[GitHub] [spark] srowen commented on pull request #37031: [SPARK-39639][SQL] Fix possible null pointer in MySQLDialect listIndexes

2022-07-08 Thread GitBox
srowen commented on PR #37031: URL: https://github.com/apache/spark/pull/37031#issuecomment-1179169196 @panbingkun can you resolve the conflict?

[GitHub] [spark] abhishekd0907 opened a new pull request, #37141: [SPARK-39024] Notify External Shuffle Service when a Yarn Sends a Node in Decommissioning State

2022-07-08 Thread GitBox
abhishekd0907 opened a new pull request, #37141: URL: https://github.com/apache/spark/pull/37141 ### What changes were proposed in this pull request? Notify the External Shuffle Service when YARN sends a node in decommissioning state and all shuffle data has been migrated. ###

[GitHub] [spark] maryannxue commented on pull request #37098: [SPARK-39690][SQL] Fixes Reuse exchange across subqueries with AQE if subquery side exchange materialized first

2022-07-08 Thread GitBox
maryannxue commented on PR #37098: URL: https://github.com/apache/spark/pull/37098#issuecomment-1179151926 @mskapilks Thank you for the PR! Could you please give a concrete query example? I know it might be hard to write a test, but an example in the PR description would be very helpful.

[GitHub] [spark] wangyum closed pull request #37118: [SPARK-39709][SQL] The result of executeCollect and doExecute of TakeOrderedAndProjectExec should be the same

2022-07-08 Thread GitBox
wangyum closed pull request #37118: [SPARK-39709][SQL] The result of executeCollect and doExecute of TakeOrderedAndProjectExec should be the same URL: https://github.com/apache/spark/pull/37118

[GitHub] [spark] wangyum commented on pull request #37118: [SPARK-39709][SQL] The result of executeCollect and doExecute of TakeOrderedAndProjectExec should be the same

2022-07-08 Thread GitBox
wangyum commented on PR #37118: URL: https://github.com/apache/spark/pull/37118#issuecomment-1179106139 Thank you @JoshRosen. Makes sense. I will close this PR.

[GitHub] [spark] wangyum commented on a diff in pull request #37118: [SPARK-39709][SQL] The result of executeCollect and doExecute of TakeOrderedAndProjectExec should be the same

2022-07-08 Thread GitBox
wangyum commented on code in PR #37118: URL: https://github.com/apache/spark/pull/37118#discussion_r916916958 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -283,20 +283,7 @@ case class TakeOrderedAndProjectExec( } override def

[GitHub] [spark] wangyum commented on a diff in pull request #37104: [SPARK-39698][SQL] Use `TakeOrderedAndProject` if maxRows below the topKSortFallbackThreshold

2022-07-08 Thread GitBox
wangyum commented on code in PR #37104: URL: https://github.com/apache/spark/pull/37104#discussion_r916898129 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -283,20 +283,7 @@ case class TakeOrderedAndProjectExec( } override def

[GitHub] [spark] wangyum commented on a diff in pull request #37104: [SPARK-39698][SQL] Use `TakeOrderedAndProject` if maxRows below the topKSortFallbackThreshold

2022-07-08 Thread GitBox
wangyum commented on code in PR #37104: URL: https://github.com/apache/spark/pull/37104#discussion_r916897153 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3024,12 +3024,15 @@ object SQLConf { val TOP_K_SORT_FALLBACK_THRESHOLD =

[GitHub] [spark] tgravescs commented on pull request #37119: [SPARK-38910][YARN][FOLLOWUP] Unmanaged AM should clean staging dir before unregister

2022-07-08 Thread GitBox
tgravescs commented on PR #37119: URL: https://github.com/apache/spark/pull/37119#issuecomment-1179080739 -1 because the other PR should be reverted.

[GitHub] [spark] tgravescs commented on pull request #36207: [SPARK-38910][YARN] Clean spark staging before `unregister`

2022-07-08 Thread GitBox
tgravescs commented on PR #36207: URL: https://github.com/apache/spark/pull/36207#issuecomment-1179080164 Please put up a PR to revert this. I’m out for a week or so, so I might not be able to review. @srowen @AngersZh

[GitHub] [spark] wangyum opened a new pull request, #37140: [SPARK-39726][SQL] Change the default value of spark.sql.execution.topKSortFallbackThreshold to 800000

2022-07-08 Thread GitBox
wangyum opened a new pull request, #37140: URL: https://github.com/apache/spark/pull/37140 ### What changes were proposed in this pull request? This PR changes the default value of `spark.sql.execution.topKSortFallbackThreshold` to 800000 based on benchmark results. Benchmark code:

[GitHub] [spark] aray commented on pull request #37053: [SPARK-39452][GraphX] Extend EdgePartition1D with Destination based Strategy

2022-07-08 Thread GitBox
aray commented on PR #37053: URL: https://github.com/apache/spark/pull/37053#issuecomment-1179076530 Thanks for the performance discussion, sounds like a good improvement for this use case.

[GitHub] [spark] huaxingao commented on a diff in pull request #37080: [SPARK-35208][SQL][DOCS] Add docs for LATERAL subqueries

2022-07-08 Thread GitBox
huaxingao commented on code in PR #37080: URL: https://github.com/apache/spark/pull/37080#discussion_r916892032 ## docs/sql-ref-syntax-qry-select-lateral-subquery.md: ## @@ -0,0 +1,87 @@ +--- +layout: global +title: LATERAL SUBQUERY +displayTitle: LATERAL SUBQUERY +license: | +

[GitHub] [spark] ericsun95 commented on pull request #37053: [SPARK-39452][GraphX] Extend EdgePartition1D with Destination based Strategy

2022-07-08 Thread GitBox
ericsun95 commented on PR #37053: URL: https://github.com/apache/spark/pull/37053#issuecomment-1179057359 > Thanks for the PR. Can you help justify that this is better than using EdgePartition2D for those use cases? As to the change, I'm personally not in favor of deprecating

[GitHub] [spark] srowen commented on a diff in pull request #37137: [SPARK-37623][SPARK-39230][SQL][FOLLOW-UP] Make regr_slope and regr_intercept safe with ANSI mode

2022-07-08 Thread GitBox
srowen commented on code in PR #37137: URL: https://github.com/apache/spark/pull/37137#discussion_r916811045 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/linearRegression.scala: ## @@ -278,7 +278,8 @@ case class RegrSlope(left: Expression,

[GitHub] [spark] srowen commented on pull request #37008: [SPARK-39620][Web UI] Use same condition in history server page and API to filter applications

2022-07-08 Thread GitBox
srowen commented on PR #37008: URL: https://github.com/apache/spark/pull/37008#issuecomment-1178982400 Merged to master

[GitHub] [spark] srowen closed pull request #37008: [SPARK-39620][Web UI] Use same condition in history server page and API to filter applications

2022-07-08 Thread GitBox
srowen closed pull request #37008: [SPARK-39620][Web UI] Use same condition in history server page and API to filter applications URL: https://github.com/apache/spark/pull/37008

[GitHub] [spark] aray commented on pull request #36806: [SPARK-39398][GRAPHX]message checkpointer support storage level

2022-07-08 Thread GitBox
aray commented on PR #36806: URL: https://github.com/apache/spark/pull/36806#issuecomment-1178981888 This looks reasonable to me

[GitHub] [spark] srowen commented on pull request #37058: [SPARK-39661][SQL] Avoid creating unnecessary SLF4J Logger

2022-07-08 Thread GitBox
srowen commented on PR #37058: URL: https://github.com/apache/spark/pull/37058#issuecomment-1178981149 Merged to master

[GitHub] [spark] srowen closed pull request #37058: [SPARK-39661][SQL] Avoid creating unnecessary SLF4J Logger

2022-07-08 Thread GitBox
srowen closed pull request #37058: [SPARK-39661][SQL] Avoid creating unnecessary SLF4J Logger URL: https://github.com/apache/spark/pull/37058

[GitHub] [spark] izchen commented on pull request #36768: [SPARK-39380][SQL] Ignore comment syntax in dfs command

2022-07-08 Thread GitBox
izchen commented on PR #36768: URL: https://github.com/apache/spark/pull/36768#issuecomment-1178975968 @maropu @yaooqinn, could you help review this PR? :)

[GitHub] [spark] cloud-fan commented on pull request #36505: [SPARK-39131][SQL] Rewrite exists as LeftSemi earlier to allow filters to be inferred

2022-07-08 Thread GitBox
cloud-fan commented on PR #36505: URL: https://github.com/apache/spark/pull/36505#issuecomment-1178972459 I think ultimately we should rewrite correlated subqueries to joins at the very beginning of the optimizer. Also cc @allisonwang-db

[GitHub] [spark] cloud-fan commented on a diff in pull request #37137: [SPARK-37623][SPARK-39230][SQL][FOLLOW-UP] Make regr_slope and regr_intercept safe with ANSI mode

2022-07-08 Thread GitBox
cloud-fan commented on code in PR #37137: URL: https://github.com/apache/spark/pull/37137#discussion_r916796035 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/linearRegression.scala: ## @@ -278,7 +278,8 @@ case class RegrSlope(left:

[GitHub] [spark] aray commented on pull request #37053: [SPARK-39452][GraphX] Extend EdgePartition1D with Destination based Strategy

2022-07-08 Thread GitBox
aray commented on PR #37053: URL: https://github.com/apache/spark/pull/37053#issuecomment-1178966220 Thanks for the PR. Can you help justify that this is better than using EdgePartition2D for those use cases? As to the change, I'm personally not in favor of deprecating EdgePartition1D just

[GitHub] [spark] Yikun commented on pull request #36353: [SPARK-38946][PYTHON][PS] Generates a new dataframe instead of operating inplace in setitem

2022-07-08 Thread GitBox
Yikun commented on PR #36353: URL: https://github.com/apache/spark/pull/36353#issuecomment-1178956945 Let me give a brief summary to help the review: - **Change `setitem` to make a copy** https://github.com/pandas-dev/pandas/commit/03dd698bc1e84c35aba8b51bdd45c472860b9ec3 dataframe.setitem

[GitHub] [spark] zhengruifeng commented on pull request #37133: [SPARK-39720][R] Make createTable/cacheTable/uncacheTable/refreshTable/tableExists in SparkR support 3L namespace

2022-07-08 Thread GitBox
zhengruifeng commented on PR #37133: URL: https://github.com/apache/spark/pull/37133#issuecomment-1178946160 @HyukjinKwon OK, what about 'Make Sure ...' or just 'Implement tableExists in SparkR ...'? Let me also update the methods `recoverPartitions` and `listColumns` in this PR.

[GitHub] [spark] Yikun closed pull request #37139: [DNM] R only job test

2022-07-08 Thread GitBox
Yikun closed pull request #37139: [DNM] R only job test URL: https://github.com/apache/spark/pull/37139

[GitHub] [spark] zhengruifeng closed pull request #37134: [SPARK-39721][R] Make recoverPartitions/listColumns in SparkR support 3L namespace

2022-07-08 Thread GitBox
zhengruifeng closed pull request #37134: [SPARK-39721][R] Make recoverPartitions/listColumns in SparkR support 3L namespace URL: https://github.com/apache/spark/pull/37134

[GitHub] [spark] zhengruifeng commented on pull request #37132: [SPARK-39719][R] Make databaseExists/listTables/tables/tableNames in SparkR support 3L namespace

2022-07-08 Thread GitBox
zhengruifeng commented on PR #37132: URL: https://github.com/apache/spark/pull/37132#issuecomment-1178937651 R test cases are not isolated and may interfere with each other... Moreover, it seems that we cannot test a single UT or module in R.

[GitHub] [spark] Yikun commented on pull request #37139: [DNM] R only job test

2022-07-08 Thread GitBox
Yikun commented on PR #37139: URL: https://github.com/apache/spark/pull/37139#issuecomment-1178937436 https://user-images.githubusercontent.com/1736354/177992999-83d6bcf8-692c-4a45-afbd-7e649f27a29c.png

[GitHub] [spark] HyukjinKwon commented on pull request #37133: [SPARK-39720][R] Make createTable/cacheTable/uncacheTable/refreshTable/tableExists in SparkR support 3L namespace

2022-07-08 Thread GitBox
HyukjinKwon commented on PR #37133: URL: https://github.com/apache/spark/pull/37133#issuecomment-1178935238 This PR seems to only add `tableExists`. Should we update the PR title?

[GitHub] [spark] zhengruifeng commented on pull request #37133: [SPARK-39720][R] Make createTable/cacheTable/uncacheTable/refreshTable/tableExists in SparkR support 3L namespace

2022-07-08 Thread GitBox
zhengruifeng commented on PR #37133: URL: https://github.com/apache/spark/pull/37133#issuecomment-1178934037 cc @HyukjinKwon

[GitHub] [spark] HyukjinKwon commented on pull request #37138: [SPARK-39718][INFRA][FOLLOWUP] Remove redundant wrap in parts of the condition

2022-07-08 Thread GitBox
HyukjinKwon commented on PR #37138: URL: https://github.com/apache/spark/pull/37138#issuecomment-1178930658 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #37138: [SPARK-39718][INFRA][FOLLOWUP] Remove redundant wrap in parts of the condition

2022-07-08 Thread GitBox
HyukjinKwon closed pull request #37138: [SPARK-39718][INFRA][FOLLOWUP] Remove redundant wrap in parts of the condition URL: https://github.com/apache/spark/pull/37138

[GitHub] [spark] Yikun commented on pull request #37138: [SPARK-39718][INFRA][FOLLOWUP] Remove redundant wrap in parts of the condition

2022-07-08 Thread GitBox
Yikun commented on PR #37138: URL: https://github.com/apache/spark/pull/37138#issuecomment-1178925817 cc @HyukjinKwon

[GitHub] [spark] Yikun opened a new pull request, #37138: [SPARK-39718][INFRA][FOLLOWUP] Remove redundant wrap in parts of the condition

2022-07-08 Thread GitBox
Yikun opened a new pull request, #37138: URL: https://github.com/apache/spark/pull/37138 ### What changes were proposed in this pull request? Remove the wrap in parts of the condition so that the infra image job does not always execute. ### Why are the changes needed? Infra image job should

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

2022-07-08 Thread GitBox
HyukjinKwon commented on code in PR #36773: URL: https://github.com/apache/spark/pull/36773#discussion_r916752602 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -1685,6 +1709,42 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] HyukjinKwon closed pull request #37131: [SPARK-39718][INFRA] Add name for infra-image and cleanup nits

2022-07-08 Thread GitBox
HyukjinKwon closed pull request #37131: [SPARK-39718][INFRA] Add name for infra-image and cleanup nits URL: https://github.com/apache/spark/pull/37131

[GitHub] [spark] HyukjinKwon commented on pull request #37131: [SPARK-39718][INFRA] Add name for infra-image and cleanup nits

2022-07-08 Thread GitBox
HyukjinKwon commented on PR #37131: URL: https://github.com/apache/spark/pull/37131#issuecomment-1178912438 Merged to master.

[GitHub] [spark] HyukjinKwon opened a new pull request, #37137: [SPARK-37623][SPARK-39230][SQL][FOLLOW-UP] Make regr_slope and regr_intercept safe with ANSI mode

2022-07-08 Thread GitBox
HyukjinKwon opened a new pull request, #37137: URL: https://github.com/apache/spark/pull/37137 ### What changes were proposed in this pull request? This PR proposes to make `regr_slope` and `regr_intercept` ANSI-safe by checking for zero to avoid divide-by-zero exceptions when ANSI mode

[GitHub] [spark] LuciferYang opened a new pull request, #37136: [SPARK-39724][CORE] Remove duplicate `.setAccessible(true)` call in `kvstore.KVTypeInfo`

2022-07-08 Thread GitBox
LuciferYang opened a new pull request, #37136: URL: https://github.com/apache/spark/pull/37136 ### What changes were proposed in this pull request? This PR just removes the duplicate `.setAccessible(true)` call in `kvstore.KVTypeInfo`. ### Why are the changes needed? Delete

[GitHub] [spark] zhengruifeng opened a new pull request, #37135: [SPARK-39723][R] Make ListFunctions/functionExists in SparkR support 3L namespace

2022-07-08 Thread GitBox
zhengruifeng opened a new pull request, #37135: URL: https://github.com/apache/spark/pull/37135 ### What changes were proposed in this pull request? Make ListFunctions/functionExists in SparkR support 3L namespace ### Why are the changes needed? for 3L namespace

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

2022-07-08 Thread GitBox
HyukjinKwon commented on code in PR #36773: URL: https://github.com/apache/spark/pull/36773#discussion_r916719849 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -1685,6 +1709,42 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] zhengruifeng opened a new pull request, #37134: [SPARK-39721][R] Make recoverPartitions/listColumns in SparkR support 3L namespace

2022-07-08 Thread GitBox
zhengruifeng opened a new pull request, #37134: URL: https://github.com/apache/spark/pull/37134 ### What changes were proposed in this pull request? Make recoverPartitions/listColumns in SparkR support 3L namespace ### Why are the changes needed? for 3L namespace

[GitHub] [spark] zhengruifeng opened a new pull request, #37133: [SPARK-39720][R] Make createTable/cacheTable/uncacheTable/refreshTable/tableExists in SparkR support 3L namespace

2022-07-08 Thread GitBox
zhengruifeng opened a new pull request, #37133: URL: https://github.com/apache/spark/pull/37133 ### What changes were proposed in this pull request? Make createTable/cacheTable/uncacheTable/refreshTable/tableExists in SparkR support 3L namespace ### Why are the changes needed?

[GitHub] [spark] pan3793 commented on a diff in pull request #36995: [SPARK-39607][SQL][DSV2] Distribution and ordering support V2 function in writing

2022-07-08 Thread GitBox
pan3793 commented on code in PR #36995: URL: https://github.com/apache/spark/pull/36995#discussion_r916672796 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala: ## @@ -53,16 +63,27 @@ object DistributionAndOrderingUtils

[GitHub] [spark] zhengruifeng commented on pull request #37127: [SPARK-39716][R] Make currentDatabase/setCurrentDatabase/listCatalogs in SparkR support 3L namespace

2022-07-08 Thread GitBox
zhengruifeng commented on PR #37127: URL: https://github.com/apache/spark/pull/37127#issuecomment-1178802894 merged to master, thank you @HyukjinKwon

[GitHub] [spark] zhengruifeng closed pull request #37127: [SPARK-39716][R] Make currentDatabase/setCurrentDatabase/listCatalogs in SparkR support 3L namespace

2022-07-08 Thread GitBox
zhengruifeng closed pull request #37127: [SPARK-39716][R] Make currentDatabase/setCurrentDatabase/listCatalogs in SparkR support 3L namespace URL: https://github.com/apache/spark/pull/37127

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37132: [SPARK-39719][R] Make databaseExists/listTables/tables/tableNames in SparkR support 3L namespace

2022-07-08 Thread GitBox
zhengruifeng commented on code in PR #37132: URL: https://github.com/apache/spark/pull/37132#discussion_r916647323 ## sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala: ## @@ -217,7 +217,7 @@ private[sql] object SQLUtils extends Logging { case _ =>

[GitHub] [spark] zhengruifeng opened a new pull request, #37132: [SPARK-39719][R] Make databaseExists/listTables/tables/tableNames in SparkR support 3L namespace

2022-07-08 Thread GitBox
zhengruifeng opened a new pull request, #37132: URL: https://github.com/apache/spark/pull/37132 ### What changes were proposed in this pull request? 1, add `databaseExists` 2, make sure `listTables` support 3L namespace 3, modify sparkR-specific catalog method `tables` and

[GitHub] [spark] Yikun commented on pull request #37117: [WIP][SPARK-39714][python] Try to fix the mypy annotation tests

2022-07-08 Thread GitBox
Yikun commented on PR #37117: URL: https://github.com/apache/spark/pull/37117#issuecomment-1178769601 @bzhaoopenstack Thanks for contributions! 1. Rename `[SPARK-39714][python]` to `[SPARK-39714][PYTHON]` : ) 2. It would be good you can fulfill the PR description even this is a

[GitHub] [spark] Yikun commented on a diff in pull request #37131: [SPARK-39718][INFRA] Add name for infra-image and cleanup nits

2022-07-08 Thread GitBox
Yikun commented on code in PR #37131: URL: https://github.com/apache/spark/pull/37131#discussion_r916629103 ## .github/workflows/build_infra_images_cache.yml: ## @@ -31,26 +31,21 @@ jobs: if: github.repository == 'apache/spark' runs-on: ubuntu-latest steps: -

[GitHub] [spark] Yikun commented on a diff in pull request #37131: [SPARK-39718][INFRA] Add name for infra-image and cleanup nits

2022-07-08 Thread GitBox
Yikun commented on code in PR #37131: URL: https://github.com/apache/spark/pull/37131#discussion_r916624102 ## .github/workflows/build_infra_images_cache.yml: ## @@ -31,26 +31,21 @@ jobs: if: github.repository == 'apache/spark' runs-on: ubuntu-latest steps: -

[GitHub] [spark] Yikun commented on a diff in pull request #37005: [SPARK-39522][INFRA]Uses Docker image cache over a custom image in pyspark job

2022-07-08 Thread GitBox
Yikun commented on code in PR #37005: URL: https://github.com/apache/spark/pull/37005#discussion_r916623007 ## .github/workflows/build_and_test.yml: ## @@ -251,13 +251,73 @@ jobs: name: unit-tests-log-${{ matrix.modules }}-${{ matrix.comment }}-${{ matrix.java }}-${{

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

2022-07-08 Thread GitBox
HyukjinKwon commented on code in PR #36773: URL: https://github.com/apache/spark/pull/36773#discussion_r916608215 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -1685,6 +1709,42 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36773: [SPARK-39385][SQL] Translate linear regression aggregate functions for pushdown

2022-07-08 Thread GitBox
HyukjinKwon commented on code in PR #36773: URL: https://github.com/apache/spark/pull/36773#discussion_r916607535 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -1685,6 +1709,42 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] Yikun opened a new pull request, #37131: [SPARK-39718][INFRA] Add name for infra-image and cleanup nits and fix the inline format

2022-07-08 Thread GitBox
Yikun opened a new pull request, #37131: URL: https://github.com/apache/spark/pull/37131 ### What changes were proposed in this pull request? Add name for infra-image and cleanup nits and fix the inline format ### Why are the changes needed? Address comments in

[GitHub] [spark] cloud-fan commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-08 Thread GitBox
cloud-fan commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1178723347 OK I think the idea makes sense. With CBO off, the optimizer/planner only needs size in bytes, but row count is also an important statistic to estimate size in bytes, and should be

[GitHub] [spark] HyukjinKwon commented on pull request #37130: [SPARK-39522][INFRA][FOLLOWUP] Rename infra cache image job to `Build / Cache base image`

2022-07-08 Thread GitBox
HyukjinKwon commented on PR #37130: URL: https://github.com/apache/spark/pull/37130#issuecomment-1178711948 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #37130: [SPARK-39522][INFRA][FOLLOWUP] Rename infra cache image job to `Build / Cache base image`

2022-07-08 Thread GitBox
HyukjinKwon closed pull request #37130: [SPARK-39522][INFRA][FOLLOWUP] Rename infra cache image job to `Build / Cache base image` URL: https://github.com/apache/spark/pull/37130

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37005: [SPARK-39522][INFRA]Uses Docker image cache over a custom image in pyspark job

2022-07-08 Thread GitBox
HyukjinKwon commented on code in PR #37005: URL: https://github.com/apache/spark/pull/37005#discussion_r916590637 ## .github/workflows/build_and_test.yml: ## @@ -251,13 +251,73 @@ jobs: name: unit-tests-log-${{ matrix.modules }}-${{ matrix.comment }}-${{ matrix.java

[GitHub] [spark] Yikun commented on a diff in pull request #37003: [SPARK-39522][INFRA] Add Apache Spark infra GA image cache

2022-07-08 Thread GitBox
Yikun commented on code in PR #37003: URL: https://github.com/apache/spark/pull/37003#discussion_r916590142 ## .github/workflows/build_infra_images_cache.yml: ## @@ -0,0 +1,63 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

[GitHub] [spark] Yikun opened a new pull request, #37130: [SPARK-39522][INFRA][FOLLOWUP] Rename infra cache image job to `Build / Cache base image`

2022-07-08 Thread GitBox
Yikun opened a new pull request, #37130: URL: https://github.com/apache/spark/pull/37130 ### What changes were proposed in this pull request? Rename infra cache image job to `Build / Cache base image` https://github.com/apache/spark/pull/37003#discussion_r916576433 ### Why

[GitHub] [spark] Yikun commented on a diff in pull request #37005: [SPARK-39522][INFRA]Uses Docker image cache over a custom image in pyspark job

2022-07-08 Thread GitBox
Yikun commented on code in PR #37005: URL: https://github.com/apache/spark/pull/37005#discussion_r916579584 ## .github/workflows/build_and_test.yml: ## @@ -251,13 +251,73 @@ jobs: name: unit-tests-log-${{ matrix.modules }}-${{ matrix.comment }}-${{ matrix.java }}-${{

[GitHub] [spark] cloud-fan commented on a diff in pull request #37014: [SPARK-39624][SQL] Support coalesce partition through CartesianProduct

2022-07-08 Thread GitBox
cloud-fan commented on code in PR #37014: URL: https://github.com/apache/spark/pull/37014#discussion_r916577151 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -2602,6 +2602,56 @@ class AdaptiveQueryExecSuite

[GitHub] [spark] singhpk234 commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-08 Thread GitBox
singhpk234 commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1178695254 > BTW, with CBO off, where do we use row count? we use it in places like :

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37003: [SPARK-39522][INFRA] Add Apache Spark infra GA image cache

2022-07-08 Thread GitBox
HyukjinKwon commented on code in PR #37003: URL: https://github.com/apache/spark/pull/37003#discussion_r916576433 ## .github/workflows/build_infra_images_cache.yml: ## @@ -0,0 +1,63 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

[GitHub] [spark] cloud-fan commented on a diff in pull request #37014: [SPARK-39624][SQL] Support coalesce partition through CartesianProduct

2022-07-08 Thread GitBox
cloud-fan commented on code in PR #37014: URL: https://github.com/apache/spark/pull/37014#discussion_r916576320 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -2602,6 +2602,56 @@ class AdaptiveQueryExecSuite

[GitHub] [spark] HyukjinKwon commented on pull request #37005: [SPARK-39522][INFRA]Uses Docker image cache over a custom image in pyspark job

2022-07-08 Thread GitBox
HyukjinKwon commented on PR #37005: URL: https://github.com/apache/spark/pull/37005#issuecomment-1178692367 Feel free to make a followup or a separate PR with a separate JIRA

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37005: [SPARK-39522][INFRA]Uses Docker image cache over a custom image in pyspark job

2022-07-08 Thread GitBox
HyukjinKwon commented on code in PR #37005: URL: https://github.com/apache/spark/pull/37005#discussion_r916573997 ## .github/workflows/build_and_test.yml: ## @@ -251,13 +251,73 @@ jobs: name: unit-tests-log-${{ matrix.modules }}-${{ matrix.comment }}-${{ matrix.java

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37005: [SPARK-39522][INFRA]Uses Docker image cache over a custom image in pyspark job

2022-07-08 Thread GitBox
HyukjinKwon commented on code in PR #37005: URL: https://github.com/apache/spark/pull/37005#discussion_r916573418 ## .github/workflows/build_and_test.yml: ## @@ -251,13 +251,73 @@ jobs: name: unit-tests-log-${{ matrix.modules }}-${{ matrix.comment }}-${{ matrix.java

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37005: [SPARK-39522][INFRA]Uses Docker image cache over a custom image in pyspark job

2022-07-08 Thread GitBox
HyukjinKwon commented on code in PR #37005: URL: https://github.com/apache/spark/pull/37005#discussion_r916573072 ## .github/workflows/build_and_test.yml: ## @@ -251,13 +251,73 @@ jobs: name: unit-tests-log-${{ matrix.modules }}-${{ matrix.comment }}-${{ matrix.java
