[GitHub] [spark] xwu99 commented on pull request #33941: [SPARK-36699][Core] Reuse compatible executors for stage-level scheduling

2022-04-19 Thread GitBox
xwu99 commented on PR #33941: URL: https://github.com/apache/spark/pull/33941#issuecomment-1103533768 > Support a basic reuse policy (perhaps EXEC_CORES_EQUAL since I think that was your original use case) and allow user to specify their own. ie config that perhaps load all policies, like t

[GitHub] [spark] LuciferYang commented on a diff in pull request #36237: [SPARK-38896][CORE][SQL] Use `tryWithResource` release `LevelDB/RocksDBIterator` resources earlier

2022-04-19 Thread GitBox
LuciferYang commented on code in PR #36237: URL: https://github.com/apache/spark/pull/36237#discussion_r853779042 ## core/src/main/scala/org/apache/spark/status/KVUtils.scala: ## @@ -100,6 +100,33 @@ private[spark] object KVUtils extends Logging { } } + /** Counts the

[GitHub] [spark] yaooqinn commented on pull request #36222: [SPARK-38922][Core] TaskLocation.apply throw NullPointerException

2022-04-19 Thread GitBox
yaooqinn commented on PR #36222: URL: https://github.com/apache/spark/pull/36222#issuecomment-1103527220 thanks, @srowen @mridulm @HyukjinKwon for the check, merged to master/3.3/3.2/3.1/3.0 -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] bkosaraju commented on pull request #24801: [SPARK-27950][DSTREAMS][Kinesis] dynamoDBEndpointUrl and cloudWatchMetricsLevel for Kinesis

2022-04-19 Thread GitBox
bkosaraju commented on PR #24801: URL: https://github.com/apache/spark/pull/24801#issuecomment-1103526132 @etspaceman Can this PR be merged ? Thanks, as this can unblock some of local and integration testing pieces for kinesis streaming. -- This is an automated message from the Apache Git

[GitHub] [spark] yaooqinn closed pull request #36222: [SPARK-38922][Core] TaskLocation.apply throw NullPointerException

2022-04-19 Thread GitBox
yaooqinn closed pull request #36222: [SPARK-38922][Core] TaskLocation.apply throw NullPointerException URL: https://github.com/apache/spark/pull/36222 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] xwu99 commented on a diff in pull request #33941: [SPARK-36699][Core] Reuse compatible executors for stage-level scheduling

2022-04-19 Thread GitBox
xwu99 commented on code in PR #33941: URL: https://github.com/apache/spark/pull/33941#discussion_r853770991 ## core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala: ## @@ -283,8 +283,9 @@ case class SparkListenerApplicationEnd(time: Long) extends SparkListenerEven

[GitHub] [spark] PavithraRamachandran opened a new pull request, #36278: [SPARK-38963][WEBUI]Make stage navigable to stage Page from max metrics displayed in UI

2022-04-19 Thread GitBox
PavithraRamachandran opened a new pull request, #36278: URL: https://github.com/apache/spark/pull/36278 ### What changes were proposed in this pull request? Making stage ID which is currently displayed as a static text in DAG into a navigable link to the particular stage Page.

[GitHub] [spark] cloud-fan commented on a diff in pull request #36238: [SPARK-38916][CORE] Tasks not killed caused by race conditions between killTask() and launchTask()

2022-04-19 Thread GitBox
cloud-fan commented on code in PR #36238: URL: https://github.com/apache/spark/pull/36238#discussion_r853768026 ## core/src/main/scala/org/apache/spark/executor/Executor.scala: ## @@ -264,16 +290,26 @@ private[spark] class Executor( decommissioned = true } + private[e

[GitHub] [spark] xwu99 commented on a diff in pull request #33941: [SPARK-36699][Core] Reuse compatible executors for stage-level scheduling

2022-04-19 Thread GitBox
xwu99 commented on code in PR #33941: URL: https://github.com/apache/spark/pull/33941#discussion_r853761272 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -518,11 +540,25 @@ private[spark] class ExecutorAllocationManager( numExecutorsTarget +

[GitHub] [spark] cloud-fan closed pull request #36276: [SPARK-38962][SQL] Fix wrong computeStats at DataSourceV2Relation

2022-04-19 Thread GitBox
cloud-fan closed pull request #36276: [SPARK-38962][SQL] Fix wrong computeStats at DataSourceV2Relation URL: https://github.com/apache/spark/pull/36276 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [spark] cloud-fan commented on pull request #36276: [SPARK-38962][SQL] Fix wrong computeStats at DataSourceV2Relation

2022-04-19 Thread GitBox
cloud-fan commented on PR #36276: URL: https://github.com/apache/spark/pull/36276#issuecomment-1103506965 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

[GitHub] [spark] cloud-fan commented on a diff in pull request #36117: [SPARK-38832][SQL] Remove unnecessary distinct in aggregate expression by distinctKeys

2022-04-19 Thread GitBox
cloud-fan commented on code in PR #36117: URL: https://github.com/apache/spark/pull/36117#discussion_r853755920 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlanDistinctKeys.scala: ## @@ -29,6 +29,12 @@ import org.apache.spark.sql.internal.S

[GitHub] [spark] AmplabJenkins commented on pull request #36252: [SPARK-38939][SQL] Support DROP COLUMN [IF EXISTS] syntax

2022-04-19 Thread GitBox
AmplabJenkins commented on PR #36252: URL: https://github.com/apache/spark/pull/36252#issuecomment-1103504970 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] MaxGekk commented on a diff in pull request #36259: [SPARK-38949][SQL] Wrap SQL statements by double quotes in error messages

2022-04-19 Thread GitBox
MaxGekk commented on code in PR #36259: URL: https://github.com/apache/spark/pull/36259#discussion_r853747055 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -416,13 +437,20 @@ object QueryParsingErrors extends QueryErrorsBase { }

[GitHub] [spark] ivoson commented on a diff in pull request #36259: [SPARK-38949][SQL] Wrap SQL statements by double quotes in error messages

2022-04-19 Thread GitBox
ivoson commented on code in PR #36259: URL: https://github.com/apache/spark/pull/36259#discussion_r853727398 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -416,13 +437,20 @@ object QueryParsingErrors extends QueryErrorsBase { }

[GitHub] [spark] beliefer opened a new pull request, #36277: [SPARK-38219][SPARK-37691][branch-3.3] Support ANSI Aggregation Function: percentile_cont and percentile_disc

2022-04-19 Thread GitBox
beliefer opened a new pull request, #36277: URL: https://github.com/apache/spark/pull/36277 ### What changes were proposed in this pull request? This PR backport https://github.com/apache/spark/pull/35531 and https://github.com/apache/spark/pull/35041 to branch-3.3 ### Why are

[GitHub] [spark] cloud-fan commented on a diff in pull request #32298: [SPARK-34079][SQL] Merge non-correlated scalar subqueries

2022-04-19 Thread GitBox
cloud-fan commented on code in PR #32298: URL: https://github.com/apache/spark/pull/32298#discussion_r853713177 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -663,11 +663,13 @@ case class UnresolvedWith( *

[GitHub] [spark] cloud-fan commented on a diff in pull request #32298: [SPARK-34079][SQL] Merge non-correlated scalar subqueries

2022-04-19 Thread GitBox
cloud-fan commented on code in PR #32298: URL: https://github.com/apache/spark/pull/32298#discussion_r853712433 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala: ## @@ -0,0 +1,382 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [spark] ulysses-you commented on a diff in pull request #36276: [SPARK-38962][SQL] Fix wrong computeStats at DataSourceV2Relation

2022-04-19 Thread GitBox
ulysses-you commented on code in PR #36276: URL: https://github.com/apache/spark/pull/36276#discussion_r853710938 ## sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala: ## @@ -80,7 +80,7 @@ case class DataSourceV2Relation(

[GitHub] [spark] ulysses-you commented on pull request #36276: [SPARK-38962][SQL] Fix wrong computeStats at DataSourceV2Relation

2022-04-19 Thread GitBox
ulysses-you commented on PR #36276: URL: https://github.com/apache/spark/pull/36276#issuecomment-1103451109 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

[GitHub] [spark] ulysses-you opened a new pull request, #36276: [SPARK-38962][SQL] Fix wrong computeStats at DataSourceV2Relation

2022-04-19 Thread GitBox
ulysses-you opened a new pull request, #36276: URL: https://github.com/apache/spark/pull/36276 ### What changes were proposed in this pull request? Use `Scan` to match `SupportsReportStatistics`. ### Why are the changes needed? The interface `SupportsReportStatist

[GitHub] [spark] ulysses-you commented on a diff in pull request #36253: [SPARK-38932][SQL] Datasource v2 support report unique keys

2022-04-19 Thread GitBox
ulysses-you commented on code in PR #36253: URL: https://github.com/apache/spark/pull/36253#discussion_r853708410 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsReportUniqueKeys.java: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] ulysses-you commented on a diff in pull request #36117: [SPARK-38832][SQL] Remove unnecessary distinct in aggregate expression by distinctKeys

2022-04-19 Thread GitBox
ulysses-you commented on code in PR #36117: URL: https://github.com/apache/spark/pull/36117#discussion_r853706929 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlanDistinctKeys.scala: ## @@ -29,6 +29,12 @@ import org.apache.spark.sql.internal

[GitHub] [spark] gengliangwang commented on a diff in pull request #36274: [SPARK-38550][DOCS][FOLLOWUP] Improve the documentation of Diagnostic disk store

2022-04-19 Thread GitBox
gengliangwang commented on code in PR #36274: URL: https://github.com/apache/spark/pull/36274#discussion_r853703104 ## core/src/main/scala/org/apache/spark/internal/config/Status.scala: ## @@ -73,8 +73,8 @@ private[spark] object Status { val DISK_STORE_DIR_FOR_STATUS =

[GitHub] [spark] weixiuli commented on pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-04-19 Thread GitBox
weixiuli commented on PR #36162: URL: https://github.com/apache/spark/pull/36162#issuecomment-1103442022 cc @Ngone51 @mridulm Could you please help take a look when you have time? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] anchovYu commented on pull request #36241: [SPARK-38929][SQL] Improve error messages for cast failures in ANSI

2022-04-19 Thread GitBox
anchovYu commented on PR #36241: URL: https://github.com/apache/spark/pull/36241#issuecomment-1103441494 The cherrypick PR to 3.3: https://github.com/apache/spark/pull/36275 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] anchovYu commented on pull request #36275: [SPARK-38929][SQL][3.3] Improve error messages for cast failures in ANSI

2022-04-19 Thread GitBox
anchovYu commented on PR #36275: URL: https://github.com/apache/spark/pull/36275#issuecomment-1103440641 Hi @MaxGekk , this is the cherry-picked PR. Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[GitHub] [spark] xwu99 commented on a diff in pull request #33941: [SPARK-36699][Core] Reuse compatible executors for stage-level scheduling

2022-04-19 Thread GitBox
xwu99 commented on code in PR #33941: URL: https://github.com/apache/spark/pull/33941#discussion_r853701291 ## core/src/main/scala/org/apache/spark/resource/ResourceProfileManager.scala: ## @@ -77,13 +85,27 @@ private[spark] class ResourceProfileManager(sparkConf: SparkConf,

[GitHub] [spark] linhongliu-db commented on a diff in pull request #36274: [SPARK-38550][DOCS][FOLLOWUP] Improve the documentation of Diagnostic disk store

2022-04-19 Thread GitBox
linhongliu-db commented on code in PR #36274: URL: https://github.com/apache/spark/pull/36274#discussion_r853701285 ## core/src/main/scala/org/apache/spark/internal/config/Status.scala: ## @@ -73,8 +73,8 @@ private[spark] object Status { val DISK_STORE_DIR_FOR_STATUS =

[GitHub] [spark] anchovYu opened a new pull request, #36275: [SPARK-38929][SQL][3.3] Improve error messages for cast failures in ANSI

2022-04-19 Thread GitBox
anchovYu opened a new pull request, #36275: URL: https://github.com/apache/spark/pull/36275 Backport to 3.3: Closes #36241 from anchovYu/ansi-error-improve. Authored-by: Xinyi Yu Signed-off-by: Max Gekk (cherry picked from commit f76b3e766f79b4c2d4f1ecffaad25aeb962336b7)

[GitHub] [spark] beliefer commented on pull request #36258: [SPARK-37613][SQL][FOLLOWUP] Supplement docs for regr_count

2022-04-19 Thread GitBox
beliefer commented on PR #36258: URL: https://github.com/apache/spark/pull/36258#issuecomment-1103439332 @HyukjinKwon @cloud-fan Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [spark] gengliangwang commented on a diff in pull request #36274: [SPARK-38550][DOCS][FOLLOWUP] Improve the documentation of Diagnostic disk store

2022-04-19 Thread GitBox
gengliangwang commented on code in PR #36274: URL: https://github.com/apache/spark/pull/36274#discussion_r853699656 ## core/src/main/scala/org/apache/spark/internal/config/Status.scala: ## @@ -73,8 +73,8 @@ private[spark] object Status { val DISK_STORE_DIR_FOR_STATUS =

[GitHub] [spark] gengliangwang commented on pull request #36274: [SPARK-38550][DOCS][FOLLOWUP] Improve the documentation of Diagnostic disk store

2022-04-19 Thread GitBox
gengliangwang commented on PR #36274: URL: https://github.com/apache/spark/pull/36274#issuecomment-1103437742 cc @LinhongLiu @cloud-fan @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [spark] gengliangwang opened a new pull request, #36274: [SPARK-38550][DOCS][FOLLOWUP] Improve the documentation of Diagnostic disk store

2022-04-19 Thread GitBox
gengliangwang opened a new pull request, #36274: URL: https://github.com/apache/spark/pull/36274 ### What changes were proposed in this pull request? * Add the conf `spark.appStatusStore.diskStoreDir` in configuration.md * This diagnostic API requires setting `spark.appStatu

[GitHub] [spark] xwu99 commented on a diff in pull request #33941: [SPARK-36699][Core] Reuse compatible executors for stage-level scheduling

2022-04-19 Thread GitBox
xwu99 commented on code in PR #33941: URL: https://github.com/apache/spark/pull/33941#discussion_r853695878 ## core/src/main/scala/org/apache/spark/resource/ResourceProfileManager.scala: ## @@ -77,13 +85,27 @@ private[spark] class ResourceProfileManager(sparkConf: SparkConf,

[GitHub] [spark] cloud-fan commented on a diff in pull request #32298: [SPARK-34079][SQL] Merge non-correlated scalar subqueries

2022-04-19 Thread GitBox
cloud-fan commented on code in PR #32298: URL: https://github.com/apache/spark/pull/32298#discussion_r853685821 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala: ## @@ -0,0 +1,382 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [spark] zhengruifeng commented on pull request #36181: Implement `skipna`s of statistical functions of Series and DataFrame

2022-04-19 Thread GitBox
zhengruifeng commented on PR #36181: URL: https://github.com/apache/spark/pull/36181#issuecomment-1103419993 @xinrong-databricks I think we should keep in line with the pandas on dealing with NAs, and end users do not need to know the internal details about converting NAs to nulls.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36245: [SPARK-38936][SQL] Script transform feed thread should have name

2022-04-19 Thread GitBox
HyukjinKwon commented on code in PR #36245: URL: https://github.com/apache/spark/pull/36245#discussion_r853683619 ## sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala: ## @@ -262,7 +262,8 @@ trait BaseScriptTransformationExec extends Unary

[GitHub] [spark] HyukjinKwon commented on pull request #36245: [SPARK-38936][SQL] Script transform feed thread should have name

2022-04-19 Thread GitBox
HyukjinKwon commented on PR #36245: URL: https://github.com/apache/spark/pull/36245#issuecomment-1103417894 cc @AngersZh FYI . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36253: [SPARK-38932][SQL] Datasource v2 support report unique keys

2022-04-19 Thread GitBox
HyukjinKwon commented on code in PR #36253: URL: https://github.com/apache/spark/pull/36253#discussion_r853683206 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsReportUniqueKeys.java: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] cloud-fan closed pull request #36268: [SPARK-37575][SQL][FOLLOWUP] Update the migration guide for added legacy flag for the breaking change of write null value in csv to unquoted empt

2022-04-19 Thread GitBox
cloud-fan closed pull request #36268: [SPARK-37575][SQL][FOLLOWUP] Update the migration guide for added legacy flag for the breaking change of write null value in csv to unquoted empty string URL: https://github.com/apache/spark/pull/36268 -- This is an automated message from the Apache Git

[GitHub] [spark] cloud-fan commented on pull request #36268: [SPARK-37575][SQL][FOLLOWUP] Update the migration guide for added legacy flag for the breaking change of write null value in csv to unquote

2022-04-19 Thread GitBox
cloud-fan commented on PR #36268: URL: https://github.com/apache/spark/pull/36268#issuecomment-1103415783 thanks, merging to master/3.3! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [spark] cloud-fan commented on a diff in pull request #36117: [SPARK-38832][SQL] Remove unnecessary distinct in aggregate expression by distinctKeys

2022-04-19 Thread GitBox
cloud-fan commented on code in PR #36117: URL: https://github.com/apache/spark/pull/36117#discussion_r853680161 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlanDistinctKeys.scala: ## @@ -29,6 +29,12 @@ import org.apache.spark.sql.internal.S

[GitHub] [spark] LuciferYang commented on a diff in pull request #36261: [SPARK-38948][TESTS] Fix `DiskRowQueue` leak in `PythonForeachWriterSuite`

2022-04-19 Thread GitBox
LuciferYang commented on code in PR #36261: URL: https://github.com/apache/spark/pull/36261#discussion_r853680010 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonForeachWriterSuite.scala: ## @@ -102,11 +102,15 @@ class PythonForeachWriterSuite extends Spar

[GitHub] [spark] dongjoon-hyun commented on pull request #36271: [SPARK-38844][PYTHON][TESTS][FOLLOW-UP] Test pyspark.pandas.tests.test_generic_functions

2022-04-19 Thread GitBox
dongjoon-hyun commented on PR #36271: URL: https://github.com/apache/spark/pull/36271#issuecomment-1103391957 +1, LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

[GitHub] [spark] zhengruifeng commented on a diff in pull request #36246: [SPARK-38937][PYTHON] interpolate support param `limit_direction`

2022-04-19 Thread GitBox
zhengruifeng commented on code in PR #36246: URL: https://github.com/apache/spark/pull/36246#discussion_r853669412 ## python/pyspark/pandas/series.py: ## @@ -2209,15 +2219,43 @@ def _interpolate( ) * null_index_forward + last_non_null_forward fill_cond = ~F.i

[GitHub] [spark] zhengruifeng commented on a diff in pull request #36246: [SPARK-38937][PYTHON] interpolate support param `limit_direction`

2022-04-19 Thread GitBox
zhengruifeng commented on code in PR #36246: URL: https://github.com/apache/spark/pull/36246#discussion_r853668488 ## python/pyspark/pandas/generic.py: ## @@ -3259,6 +3260,10 @@ def interpolate( Maximum number of consecutive NaNs to fill. Must be greater than

[GitHub] [spark] panbingkun opened a new pull request, #36273: [SPARK-38960][Core]Spark should fail fast if initial memory too large…

2022-04-19 Thread GitBox
panbingkun opened a new pull request, #36273: URL: https://github.com/apache/spark/pull/36273 ### What changes were proposed in this pull request? Added an exception to be thrown in SparkConf.validateSettings if set initial memory(set by "spark.executor.extraJavaOptions=-Xms{XXX}G" )

[GitHub] [spark] HyukjinKwon closed pull request #36271: [SPARK-38844][PYTHON][TESTS][FOLLOW-UP] Test pyspark.pandas.tests.test_generic_functions

2022-04-19 Thread GitBox
HyukjinKwon closed pull request #36271: [SPARK-38844][PYTHON][TESTS][FOLLOW-UP] Test pyspark.pandas.tests.test_generic_functions URL: https://github.com/apache/spark/pull/36271 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] HyukjinKwon commented on pull request #36271: [SPARK-38844][PYTHON][TESTS][FOLLOW-UP] Test pyspark.pandas.tests.test_generic_functions

2022-04-19 Thread GitBox
HyukjinKwon commented on PR #36271: URL: https://github.com/apache/spark/pull/36271#issuecomment-1103375342 Thanks all! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

[GitHub] [spark] HyukjinKwon commented on pull request #36271: [SPARK-38844][PYTHON][TESTS][FOLLOW-UP] Test pyspark.pandas.tests.test_generic_functions

2022-04-19 Thread GitBox
HyukjinKwon commented on PR #36271: URL: https://github.com/apache/spark/pull/36271#issuecomment-1103375136 All pyspark tests passed. Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [spark] HyukjinKwon closed pull request #36270: [SPARK-38956][TESTS] Fix FAILED_EXECUTE_UDF test case on Java 17

2022-04-19 Thread GitBox
HyukjinKwon closed pull request #36270: [SPARK-38956][TESTS] Fix FAILED_EXECUTE_UDF test case on Java 17 URL: https://github.com/apache/spark/pull/36270 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] HyukjinKwon commented on pull request #36270: [SPARK-38956][TESTS] Fix FAILED_EXECUTE_UDF test case on Java 17

2022-04-19 Thread GitBox
HyukjinKwon commented on PR #36270: URL: https://github.com/apache/spark/pull/36270#issuecomment-1103372759 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HyukjinKwon closed pull request #36258: [SPARK-37613][SQL][FOLLOWUP] Supplement docs for regr_count

2022-04-19 Thread GitBox
HyukjinKwon closed pull request #36258: [SPARK-37613][SQL][FOLLOWUP] Supplement docs for regr_count URL: https://github.com/apache/spark/pull/36258 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [spark] HyukjinKwon commented on pull request #36258: [SPARK-37613][SQL][FOLLOWUP] Supplement docs for regr_count

2022-04-19 Thread GitBox
HyukjinKwon commented on PR #36258: URL: https://github.com/apache/spark/pull/36258#issuecomment-1103372145 Merged to master and branch-3.3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [spark] ulysses-you commented on a diff in pull request #36117: [SPARK-38832][SQL] Remove unnecessary distinct in aggregate expression by distinctKeys

2022-04-19 Thread GitBox
ulysses-you commented on code in PR #36117: URL: https://github.com/apache/spark/pull/36117#discussion_r853662688 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlanDistinctKeys.scala: ## @@ -29,6 +29,12 @@ import org.apache.spark.sql.internal

[GitHub] [spark] zhengruifeng commented on pull request #36257: [SPARK-38943][PYTHON] EWM support ignore_na

2022-04-19 Thread GitBox
zhengruifeng commented on PR #36257: URL: https://github.com/apache/spark/pull/36257#issuecomment-1103367501 Thanks @HyukjinKwon for reviewing! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon closed pull request #36255: [SPARK-38828][PYTHON] Remove TimestampNTZ type Python support in Spark 3.3

2022-04-19 Thread GitBox
HyukjinKwon closed pull request #36255: [SPARK-38828][PYTHON] Remove TimestampNTZ type Python support in Spark 3.3 URL: https://github.com/apache/spark/pull/36255 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[GitHub] [spark] HyukjinKwon commented on pull request #36255: [SPARK-38828][PYTHON] Remove TimestampNTZ type Python support in Spark 3.3

2022-04-19 Thread GitBox
HyukjinKwon commented on PR #36255: URL: https://github.com/apache/spark/pull/36255#issuecomment-1103357946 Merged to master and branch-3.3. cc @MaxGekk @gengliangwang Python side is ready for hiding timestamp ntz. -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] HyukjinKwon closed pull request #36257: [SPARK-38943][PYTHON] EWM support ignore_na

2022-04-19 Thread GitBox
HyukjinKwon closed pull request #36257: [SPARK-38943][PYTHON] EWM support ignore_na URL: https://github.com/apache/spark/pull/36257 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[GitHub] [spark] HyukjinKwon commented on pull request #36257: [SPARK-38943][PYTHON] EWM support ignore_na

2022-04-19 Thread GitBox
HyukjinKwon commented on PR #36257: URL: https://github.com/apache/spark/pull/36257#issuecomment-1103355723 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36261: [SPARK-38948][TESTS] Fix `DiskRowQueue` leak in `PythonForeachWriterSuite`

2022-04-19 Thread GitBox
HyukjinKwon commented on code in PR #36261: URL: https://github.com/apache/spark/pull/36261#discussion_r853658325 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonForeachWriterSuite.scala: ## @@ -102,11 +102,15 @@ class PythonForeachWriterSuite extends Spar

[GitHub] [spark] zhengruifeng commented on pull request #36246: [SPARK-38937][PYTHON] interpolate support param `limit_direction`

2022-04-19 Thread GitBox
zhengruifeng commented on PR #36246: URL: https://github.com/apache/spark/pull/36246#issuecomment-1103354556 @itholic Sure! after https://github.com/apache/spark/pull/36271#pullrequestreview-946491209 get merged, I will rebase this PR to trigger the build. Thanks! -- This is an automate

[GitHub] [spark] itholic commented on a diff in pull request #36215: [SPARK-38938][PYTHON] Implement `inplace` and `columns` parameters of `Series.drop`

2022-04-19 Thread GitBox
itholic commented on code in PR #36215: URL: https://github.com/apache/spark/pull/36215#discussion_r853657776 ## python/pyspark/pandas/tests/test_series.py: ## @@ -1749,14 +1759,22 @@ def test_drop(self): with self.assertRaisesRegex(ValueError, msg): psser.

[GitHub] [spark] morvenhuang commented on pull request #36212: [SPARK-38914][SQL] Allow user to insert specified columns into insertable view

2022-04-19 Thread GitBox
morvenhuang commented on PR #36212: URL: https://github.com/apache/spark/pull/36212#issuecomment-1103345856 @dtenedor Thank you for the patience and the help. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [spark] beliefer commented on pull request #35041: [SPARK-37691][SQL] Support ANSI Aggregation Function: `percentile_disc`

2022-04-19 Thread GitBox
beliefer commented on PR #35041: URL: https://github.com/apache/spark/pull/35041#issuecomment-1103345609 @cloud-fan Thank you very much! @MaxGekk @jiangxb1987 Thank you too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] itholic commented on a diff in pull request #36246: [SPARK-38937][PYTHON] interpolate support param `limit_direction`

2022-04-19 Thread GitBox
itholic commented on code in PR #36246: URL: https://github.com/apache/spark/pull/36246#discussion_r853654254 ## python/pyspark/pandas/generic.py: ## @@ -3259,6 +3260,10 @@ def interpolate( Maximum number of consecutive NaNs to fill. Must be greater than

[GitHub] [spark] itholic commented on pull request #36246: [SPARK-38937][PYTHON] interpolate support param `limit_direction`

2022-04-19 Thread GitBox
itholic commented on PR #36246: URL: https://github.com/apache/spark/pull/36246#issuecomment-1103341881 Seems like the CI builder is fixed now. Could you rebase to master? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[GitHub] [spark] allisonwang-db opened a new pull request, #36272: [SPARK-38957][SQL] Use multipartIdentifier for parsing table-valued functions

2022-04-19 Thread GitBox
allisonwang-db opened a new pull request, #36272: URL: https://github.com/apache/spark/pull/36272 ### What changes were proposed in this pull request? This PR uses multipart identifiers when parsing table-valued functions. ### Why are the changes needed? To make table-valued

[GitHub] [spark] itholic commented on pull request #36083: [SPARK-38581][PYTHON][DOCS] List of supported pandas APIs for pandas-on-Spark docs.

2022-04-19 Thread GitBox
itholic commented on PR #36083: URL: https://github.com/apache/spark/pull/36083#issuecomment-1103322576 Thanks! Let me create the ticket right after completing this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

[GitHub] [spark] zhengruifeng commented on pull request #36271: [SPARK-38844][PYTHON][TESTS][FOLLOW-UP] Test pyspark.pandas.tests.test_generic_functions

2022-04-19 Thread GitBox
zhengruifeng commented on PR #36271: URL: https://github.com/apache/spark/pull/36271#issuecomment-1103321116 great catch! thanks @HyukjinKwon ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] Yikun commented on pull request #36083: [SPARK-38581][PYTHON][DOCS] List of supported pandas APIs for pandas-on-Spark docs.

2022-04-19 Thread GitBox
Yikun commented on PR #36083: URL: https://github.com/apache/spark/pull/36083#issuecomment-1103318782 @itholic Fine with me. Actually, I didn't mean to use the above scripts to generate the doc completely (inline code), because we had some specific notes based on the doc. You could just regard it

[GitHub] [spark] itholic commented on pull request #36083: [SPARK-38581][PYTHON][DOCS] List of supported pandas APIs for pandas-on-Spark docs.

2022-04-19 Thread GitBox
itholic commented on PR #36083: URL: https://github.com/apache/spark/pull/36083#issuecomment-1103309286 @Yikun Yeah, I tried it at first, but it was a bit tricky since the rules for the RST format that Sphinx checks when building a document are way stricter than I thought, so I decided to man

[GitHub] [spark] github-actions[bot] closed pull request #34062: [SPARK-36819][SQL] Don't insert redundant filters in case static partition pruning can be done

2022-04-19 Thread GitBox
github-actions[bot] closed pull request #34062: [SPARK-36819][SQL] Don't insert redundant filters in case static partition pruning can be done URL: https://github.com/apache/spark/pull/34062 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] github-actions[bot] commented on pull request #33446: [SPARK-36215][SHUFFLE] Add logging for slow fetches to diagnose external shuffle service issues

2022-04-19 Thread GitBox
github-actions[bot] commented on PR #33446: URL: https://github.com/apache/spark/pull/33446#issuecomment-1103285936 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35083: [WIP][SPARK-37798] PySpark Pandas API: Cross and conditional merging

2022-04-19 Thread GitBox
github-actions[bot] commented on PR #35083: URL: https://github.com/apache/spark/pull/35083#issuecomment-1103285907 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] HyukjinKwon commented on pull request #36271: [SPARK-38844][PYTHON][TESTS][FOLLOW-UP] Test pyspark.pandas.tests.test_generic_functions

2022-04-19 Thread GitBox
HyukjinKwon commented on PR #36271: URL: https://github.com/apache/spark/pull/36271#issuecomment-1103264269 Build link: https://github.com/HyukjinKwon/spark/runs/6086874477 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36127: [SPARK-38844][PYTHON][SQL] Implement linear interpolate

2022-04-19 Thread GitBox
HyukjinKwon commented on code in PR #36127: URL: https://github.com/apache/spark/pull/36127#discussion_r853600923 ## python/pyspark/pandas/tests/test_generic_functions.py: ## @@ -0,0 +1,124 @@ +# Review Comment: https://github.com/apache/spark/pull/36271 -- This is an au

[GitHub] [spark] HyukjinKwon opened a new pull request, #36271: [SPARK-38844][PYTHON][TESTS][FOLLOW-UP] Test pyspark.pandas.tests.test_generic_functions

2022-04-19 Thread GitBox
HyukjinKwon opened a new pull request, #36271: URL: https://github.com/apache/spark/pull/36271 ### What changes were proposed in this pull request? This is a minor followup of https://github.com/apache/spark/pull/36127 that actually activates the tests added. ### Why are the ch

[GitHub] [spark] huaxingao commented on pull request #36264: [SPARK-38950][SQL] Return Array of Predicate for SupportsPushDownCatalystFilters.pushedFilters

2022-04-19 Thread GitBox
huaxingao commented on PR #36264: URL: https://github.com/apache/spark/pull/36264#issuecomment-1103262457 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] huaxingao commented on a diff in pull request #36264: [SPARK-38950][SQL] Return Array of Predicate for SupportsPushDownCatalystFilters.pushedFilters

2022-04-19 Thread GitBox
huaxingao commented on code in PR #36264: URL: https://github.com/apache/spark/pull/36264#discussion_r853597550 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala: ## @@ -35,7 +35,7 @@ trait SupportsPushDownCatalystFilter

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36127: [SPARK-38844][PYTHON][SQL] Implement linear interpolate

2022-04-19 Thread GitBox
HyukjinKwon commented on code in PR #36127: URL: https://github.com/apache/spark/pull/36127#discussion_r853597396 ## python/pyspark/pandas/tests/test_generic_functions.py: ## @@ -0,0 +1,124 @@ +# Review Comment: Oh, actually we should add this file into https://github.com/a

[GitHub] [spark] dongjoon-hyun commented on pull request #36270: [SPARK-38956][TESTS] Fix FAILED_EXECUTE_UDF test case on Java 17

2022-04-19 Thread GitBox
dongjoon-hyun commented on PR #36270: URL: https://github.com/apache/spark/pull/36270#issuecomment-1103253160 cc @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36220: [SPARK-38727][SQL][TESTS] Test the error class: FAILED_EXECUTE_UDF

2022-04-19 Thread GitBox
dongjoon-hyun commented on code in PR #36220: URL: https://github.com/apache/spark/pull/36220#discussion_r853570987 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -418,4 +418,20 @@ class QueryExecutionErrorsSuite extends QueryTest

[GitHub] [spark] williamhyun opened a new pull request, #36270: [SPARK-38956][TESTS] Fix FAILED_EXECUTE_UDF test case on Java 17

2022-04-19 Thread GitBox
williamhyun opened a new pull request, #36270: URL: https://github.com/apache/spark/pull/36270 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36269: Test anchor frame for in-place `Series.rename_axis`

2022-04-19 Thread GitBox
xinrong-databricks commented on code in PR #36269: URL: https://github.com/apache/spark/pull/36269#discussion_r853557693 ## python/pyspark/pandas/tests/test_series.py: ## @@ -260,6 +255,12 @@ def test_rename_axis(self): psser.rename_axis(index=str.upper).sort_index(

[GitHub] [spark] AmplabJenkins commented on pull request #36260: [SPARK-38945][K8S] Simply KEYTAB and PRINCIPAL in KerberosConfDriverFeatureStep

2022-04-19 Thread GitBox
AmplabJenkins commented on PR #36260: URL: https://github.com/apache/spark/pull/36260#issuecomment-1103244085 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36269: Test anchor frame for in-place `Series.rename_axis`

2022-04-19 Thread GitBox
xinrong-databricks commented on code in PR #36269: URL: https://github.com/apache/spark/pull/36269#discussion_r853557693 ## python/pyspark/pandas/tests/test_series.py: ## @@ -260,6 +255,12 @@ def test_rename_axis(self): psser.rename_axis(index=str.upper).sort_index(

[GitHub] [spark] xinrong-databricks opened a new pull request, #36269: Test anchor frame for in-place `Series.rename_axis`

2022-04-19 Thread GitBox
xinrong-databricks opened a new pull request, #36269: URL: https://github.com/apache/spark/pull/36269 ### What changes were proposed in this pull request? Test anchor frame for in-place `Series.rename_axis`. ### Why are the changes needed? As a follow-up for https://github.com/ap

[GitHub] [spark] dtenedor commented on pull request #36212: [SPARK-38914][SQL] Allow user to insert specified columns into insertable view

2022-04-19 Thread GitBox
dtenedor commented on PR #36212: URL: https://github.com/apache/spark/pull/36212#issuecomment-1103230094 @morvenhuang Note, I made a small update to `ResolveDefaultColumns` to fix the two tests you mentioned in [1]. Thanks for pointing that out, it gives us a chance to improve that code.

[GitHub] [spark] maryannxue commented on a diff in pull request #36238: [SPARK-38916][CORE] Tasks not killed caused by race conditions between killTask() and launchTask()

2022-04-19 Thread GitBox
maryannxue commented on code in PR #36238: URL: https://github.com/apache/spark/pull/36238#discussion_r853536305 ## core/src/main/scala/org/apache/spark/executor/Executor.scala: ## @@ -264,16 +290,26 @@ private[spark] class Executor( decommissioned = true } + private[

[GitHub] [spark] anchovYu commented on a diff in pull request #36241: [SPARK-38929][SQL] Improve error messages for cast failures in ANSI

2022-04-19 Thread GitBox
anchovYu commented on code in PR #36241: URL: https://github.com/apache/spark/pull/36241#discussion_r853530179 ## sql/core/src/test/resources/sql-tests/results/string-functions.sql.out: ## @@ -1,5 +1,5 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 14

[GitHub] [spark] mridulm commented on pull request #35683: [SPARK-30835][CORE][YARN] Add support for YARN decommissioning & pre-emption

2022-04-19 Thread GitBox
mridulm commented on PR #35683: URL: https://github.com/apache/spark/pull/35683#issuecomment-1103207562 Let us add shuffle service enabled == false as well, until this is supported in this context. -- This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] mridulm commented on pull request #36222: [SPARK-38922][Core] TaskLocation.apply throw NullPointerException

2022-04-19 Thread GitBox
mridulm commented on PR #36222: URL: https://github.com/apache/spark/pull/36222#issuecomment-1103205833 Thanks for clarifying, looks good to me. +CC @tgravescs - you might be interested in this behavior. -- This is an automated message from the Apache Git Service. To respond to the mess

[GitHub] [spark] bersprockets commented on a diff in pull request #36230: [SPARK-38868][SQL] Avoid `RaiseError` exceptions while optimizing outer joins

2022-04-19 Thread GitBox
bersprockets commented on code in PR #36230: URL: https://github.com/apache/spark/pull/36230#discussion_r853513127 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala: ## @@ -144,6 +144,8 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with

[GitHub] [spark] anchovYu commented on pull request #36268: [SPARK-37575][SQL][FOLLOWUP] Update the migration guide for added legacy flag for the breaking change of write null value in csv to unquoted

2022-04-19 Thread GitBox
anchovYu commented on PR #36268: URL: https://github.com/apache/spark/pull/36268#issuecomment-1103202176 Hi @cloud-fan , this is the migration guide update follow-up. Could you review? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please l
