[GitHub] [spark] AmplabJenkins removed a comment on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33174: URL: https://github.com/apache/spark/pull/33174#issuecomment-879557067 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45505/

[GitHub] [spark] AmplabJenkins commented on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #33174: URL: https://github.com/apache/spark/pull/33174#issuecomment-879557067 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45505/ --

[GitHub] [spark] sarutak opened a new pull request #33333: [SPARK-36129][BUILD] Upgrade commons-compress to 1.21 to deal with CVEs

2021-07-13 Thread GitBox
sarutak opened a new pull request #33333: URL: https://github.com/apache/spark/pull/33333 ### What changes were proposed in this pull request? This PR upgrades `commons-compress` from `1.20` to `1.21` to deal with CVEs. ### Why are the changes needed? Some CVEs which

[GitHub] [spark] SparkQA commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox
SparkQA commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-879555269 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45506/ -- This is an automated message from the Apache

[GitHub] [spark] cfmcgrady commented on pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate

2021-07-13 Thread GitBox
cfmcgrady commented on pull request #32488: URL: https://github.com/apache/spark/pull/32488#issuecomment-879554154 > @allisonwang-db good catch! can you open a JIRA ticket to track this bug? Open a JIRA ticket

[GitHub] [spark] HyukjinKwon commented on pull request #30869: [SPARK-33865][SQL] When HiveDDL, we need check avro schema too

2021-07-13 Thread GitBox
HyukjinKwon commented on pull request #30869: URL: https://github.com/apache/spark/pull/30869#issuecomment-879551742 cc @xkrogen FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HeartSaVioR commented on a change in pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox
HeartSaVioR commented on a change in pull request #32401: URL: https://github.com/apache/spark/pull/32401#discussion_r669248537 ## File path: core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java ## @@ -0,0 +1,83 @@ +package

[GitHub] [spark] SparkQA removed a comment on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox
SparkQA removed a comment on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879449174 **[Test build #140985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140985/testReport)** for PR 33286 at commit

[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-13 Thread GitBox
SparkQA commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-879551344 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45504/ -- This is an automated message from the Apache

[GitHub] [spark] SparkQA commented on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox
SparkQA commented on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879551139 **[Test build #140985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140985/testReport)** for PR 33286 at commit

[GitHub] [spark] SparkQA commented on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-07-13 Thread GitBox
SparkQA commented on pull request #33174: URL: https://github.com/apache/spark/pull/33174#issuecomment-879549740 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45505/ --

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33309: [SPARK-36106][SQL][CORE] Label error classes for subset of QueryCompilationErrors

2021-07-13 Thread GitBox
HyukjinKwon commented on a change in pull request #33309: URL: https://github.com/apache/spark/pull/33309#discussion_r669246375 ## File path: core/src/main/resources/error/error-classes.json ## @@ -11,6 +11,25 @@ "message" : [ "Found duplicate keys '%s'" ],

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33263: [SPARK-35027][CORE] Close the inputStream in FileAppender when writin…

2021-07-13 Thread GitBox
HyukjinKwon commented on a change in pull request #33263: URL: https://github.com/apache/spark/pull/33263#discussion_r669246103 ## File path: core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala ## @@ -185,11 +185,11 @@ private[deploy] class ExecutorRunner(

[GitHub] [spark] cloud-fan commented on pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate

2021-07-13 Thread GitBox
cloud-fan commented on pull request #32488: URL: https://github.com/apache/spark/pull/32488#issuecomment-879547677 @allisonwang-db good catch! can you open a JIRA ticket to track this bug? -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] cloud-fan commented on a change in pull request #24595: [SPARK-20774][SPARK-27036][SQL] Cancel the running broadcast execution on BroadcastTimeout

2021-07-13 Thread GitBox
cloud-fan commented on a change in pull request #24595: URL: https://github.com/apache/spark/pull/24595#discussion_r669244466 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ## @@ -67,68 +70,74 @@ case class

[GitHub] [spark] Ngone51 commented on pull request #33116: [SPARK-35259][SHUFFLE] Rename ExternalBlockHandler Timer variables to remove incorrect millis suffix

2021-07-13 Thread GitBox
Ngone51 commented on pull request #33116: URL: https://github.com/apache/spark/pull/33116#issuecomment-87950 > My only concern with this approach is if some other metrics reporter (besides YarnShuffleService) may try to use these custom timers as if they still had nanosecond units.

[GitHub] [spark] HyukjinKwon commented on pull request #33314: [SPARK-36118][SQL] Add bitmap functions for Spark SQL

2021-07-13 Thread GitBox
HyukjinKwon commented on pull request #33314: URL: https://github.com/apache/spark/pull/33314#issuecomment-879544141 are there more references? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] Shockang commented on a change in pull request #24595: [SPARK-20774][SPARK-27036][SQL] Cancel the running broadcast execution on BroadcastTimeout

2021-07-13 Thread GitBox
Shockang commented on a change in pull request #24595: URL: https://github.com/apache/spark/pull/24595#discussion_r669236306 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ## @@ -67,68 +70,74 @@ case class

[GitHub] [spark] Shockang edited a comment on pull request #24595: [SPARK-20774][SPARK-27036][SQL] Cancel the running broadcast execution on BroadcastTimeout

2021-07-13 Thread GitBox
Shockang edited a comment on pull request #24595: URL: https://github.com/apache/spark/pull/24595#issuecomment-879541147 > @Shockang how do you think about this proposal? https://github.com/apache/spark/pull/24595/files#r667590820 Sorry, I'm busy in the past several days. You can

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-879540980 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140992/

[GitHub] [spark] SparkQA removed a comment on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox
SparkQA removed a comment on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-879540799 **[Test build #140992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140992/testReport)** for PR 32401 at commit

[GitHub] [spark] Shockang commented on pull request #24595: [SPARK-20774][SPARK-27036][SQL] Cancel the running broadcast execution on BroadcastTimeout

2021-07-13 Thread GitBox
Shockang commented on pull request #24595: URL: https://github.com/apache/spark/pull/24595#issuecomment-879541147 > @Shockang how do you think about this proposal? https://github.com/apache/spark/pull/24595/files#r667590820 Sorry, I'm busy these two days. You can take a look at my

[GitHub] [spark] SparkQA commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox
SparkQA commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-879540969 **[Test build #140992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140992/testReport)** for PR 32401 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-879540980 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140992/ -- This

[GitHub] [spark] Shockang commented on a change in pull request #24595: [SPARK-20774][SPARK-27036][SQL] Cancel the running broadcast execution on BroadcastTimeout

2021-07-13 Thread GitBox
Shockang commented on a change in pull request #24595: URL: https://github.com/apache/spark/pull/24595#discussion_r669236306 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ## @@ -67,68 +70,74 @@ case class

[GitHub] [spark] SparkQA commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox
SparkQA commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-879540799 **[Test build #140992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140992/testReport)** for PR 32401 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #33332: [SQL] Warn if less files visible after stats write

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #33332: URL: https://github.com/apache/spark/pull/33332#issuecomment-879540304 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] ulysses-you commented on a change in pull request #32872: [SPARK-35639][SQL] Make hasCoalescedPartition return true if something was actually coalesced

2021-07-13 Thread GitBox
ulysses-you commented on a change in pull request #32872: URL: https://github.com/apache/spark/pull/32872#discussion_r669237731 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala ## @@ -87,8 +87,15 @@ case class

[GitHub] [spark] tooptoop4 opened a new pull request #33332: [SQL] Warn if less files visible after stats write

2021-07-13 Thread GitBox
tooptoop4 opened a new pull request #33332: URL: https://github.com/apache/spark/pull/33332 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-879200836 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45489/

[GitHub] [spark] SparkQA commented on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-07-13 Thread GitBox
SparkQA commented on pull request #33174: URL: https://github.com/apache/spark/pull/33174#issuecomment-879537921 **[Test build #140991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140991/testReport)** for PR 33174 at commit

[GitHub] [spark] SparkQA commented on pull request #33258: [SPARK-36037][SQL] Support ANSI SQL LOCALTIMESTAMP datetime value function

2021-07-13 Thread GitBox
SparkQA commented on pull request #33258: URL: https://github.com/apache/spark/pull/33258#issuecomment-879537873 **[Test build #140990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140990/testReport)** for PR 33258 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879536574 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45503/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-879536575 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140984/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33331: [SPARK-36125][PYTHON] Implement non-equality comparison operators between two Categoricals

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33331: URL: https://github.com/apache/spark/pull/33331#issuecomment-879536572 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45502/

[GitHub] [spark] AmplabJenkins commented on pull request #33331: [SPARK-36125][PYTHON] Implement non-equality comparison operators between two Categoricals

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #33331: URL: https://github.com/apache/spark/pull/33331#issuecomment-879536572 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45502/ --

[GitHub] [spark] AmplabJenkins commented on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-879536575 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140984/ -- This

[GitHub] [spark] AmplabJenkins commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879536574 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45503/ --

[GitHub] [spark] ekoifman commented on a change in pull request #32872: [SPARK-35639][SQL] Make hasCoalescedPartition return true if something was actually coalesced

2021-07-13 Thread GitBox
ekoifman commented on a change in pull request #32872: URL: https://github.com/apache/spark/pull/32872#discussion_r669233485 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala ## @@ -87,8 +87,15 @@ case class

[GitHub] [spark] Yikun commented on a change in pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-07-13 Thread GitBox
Yikun commented on a change in pull request #33174: URL: https://github.com/apache/spark/pull/33174#discussion_r669228900 ## File path: python/run-tests.py ## @@ -40,6 +44,111 @@ from sparktestsupport.shellutils import which, subprocess_check_output # noqa from

[GitHub] [spark] SparkQA commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox
SparkQA commented on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879529600 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45503/ --

[GitHub] [spark] otterc commented on a change in pull request #33329: [WIP][SPARK-35917][SHUFFLE][CORE][3.2] Disable push-based shuffle feature to prevent it from being used

2021-07-13 Thread GitBox
otterc commented on a change in pull request #33329: URL: https://github.com/apache/spark/pull/33329#discussion_r669189297 ## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ## @@ -2079,7 +2079,7 @@ package object config { "conjunction

[GitHub] [spark] SparkQA removed a comment on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-07-13 Thread GitBox
SparkQA removed a comment on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-879419291 **[Test build #140984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140984/testReport)** for PR 32049 at commit

[GitHub] [spark] SparkQA commented on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-07-13 Thread GitBox
SparkQA commented on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-879528075 **[Test build #140984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140984/testReport)** for PR 32049 at commit

[GitHub] [spark] SparkQA commented on pull request #33331: [SPARK-36125][PYTHON] Implement non-equality comparison operators between two Categoricals

2021-07-13 Thread GitBox
SparkQA commented on pull request #33331: URL: https://github.com/apache/spark/pull/33331#issuecomment-879522682 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45502/ -- This is an automated message from the

[GitHub] [spark] wangyum commented on pull request #33324: [SPARK-36093][SQL] RemoveRedundantAliases should not change Command's parameter's expression's name

2021-07-13 Thread GitBox
wangyum commented on pull request #33324: URL: https://github.com/apache/spark/pull/33324#issuecomment-879521634 cc @wangshisan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] SparkQA commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox
SparkQA commented on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879512560 **[Test build #140989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140989/testReport)** for PR 33077 at commit

[GitHub] [spark] sunchao commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox
sunchao commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879512049 thanks @dongjoon-hyun - what do you think if I open a separate PR to do the refactoring on the test suite first? it will make the changes easier. -- This is an automated

[GitHub] [spark] sunchao commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox
sunchao commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669215145 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala ## @@ -31,24 +31,25 @@ class

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879511386 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45500/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33331: [SPARK-36125][PYTHON] Implement non-equality comparison operators between two Categoricals

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33331: URL: https://github.com/apache/spark/pull/33331#issuecomment-879511385 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140988/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879511382 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45501/

[GitHub] [spark] AmplabJenkins commented on pull request #33331: [SPARK-36125][PYTHON] Implement non-equality comparison operators between two Categoricals

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #33331: URL: https://github.com/apache/spark/pull/33331#issuecomment-879511385 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140988/ -- This

[GitHub] [spark] AmplabJenkins commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879511386 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45500/ --

[GitHub] [spark] AmplabJenkins commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879511382 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45501/ --

[GitHub] [spark] ulysses-you commented on a change in pull request #32872: [SPARK-35639][SQL] Make hasCoalescedPartition return true if something was actually coalesced

2021-07-13 Thread GitBox
ulysses-you commented on a change in pull request #32872: URL: https://github.com/apache/spark/pull/32872#discussion_r669214397 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala ## @@ -87,8 +87,15 @@ case class

[GitHub] [spark] SparkQA commented on pull request #33331: [SPARK-36125][PYTHON] Implement non-equality comparison operators between two Categoricals

2021-07-13 Thread GitBox
SparkQA commented on pull request #33331: URL: https://github.com/apache/spark/pull/33331#issuecomment-879508116 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45502/ -- This is an automated message from the Apache

[GitHub] [spark] SparkQA commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox
SparkQA commented on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879502187 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45501/ -- This is an automated message from the

[GitHub] [spark] SparkQA removed a comment on pull request #33331: [SPARK-36125][PYTHON] Implement non-equality comparison operators between two Categoricals

2021-07-13 Thread GitBox
SparkQA removed a comment on pull request #33331: URL: https://github.com/apache/spark/pull/33331#issuecomment-879491612 **[Test build #140988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140988/testReport)** for PR 33331 at commit

[GitHub] [spark] SparkQA commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox
SparkQA commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879501811 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45500/ -- This is an automated message from the

[GitHub] [spark] SparkQA commented on pull request #33331: [SPARK-36125][PYTHON] Implement non-equality comparison operators between two Categoricals

2021-07-13 Thread GitBox
SparkQA commented on pull request #33331: URL: https://github.com/apache/spark/pull/33331#issuecomment-879501280 **[Test build #140988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140988/testReport)** for PR 33331 at commit

[GitHub] [spark] SparkQA commented on pull request #33331: [SPARK-36125][PYTHON] Implement non-equality comparison operators between two Categoricals

2021-07-13 Thread GitBox
SparkQA commented on pull request #33331: URL: https://github.com/apache/spark/pull/33331#issuecomment-879491612 **[Test build #140988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140988/testReport)** for PR 33331 at commit

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox
dongjoon-hyun commented on a change in pull request #33330: URL: https://github.com/apache/spark/pull/33330#discussion_r669196510 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnIndexSuite.scala ## @@ -31,24 +31,25 @@

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879490705 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45499/

[GitHub] [spark] AmplabJenkins commented on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879490705 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45499/ --

[GitHub] [spark] SparkQA commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox
SparkQA commented on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879490397 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45501/ -- This is an automated message from the Apache

[GitHub] [spark] SparkQA commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox
SparkQA commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879489845 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45500/ -- This is an automated message from the Apache

[GitHub] [spark] dongjoon-hyun commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox
dongjoon-hyun commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879489131 Thank you for pinging me, @sunchao . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] github-actions[bot] closed pull request #31771: [SPARK-34652][AVRO] Support SchemaRegistry in from_avro method

2021-07-13 Thread GitBox
github-actions[bot] closed pull request #31771: URL: https://github.com/apache/spark/pull/31771 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] otterc commented on a change in pull request #33329: [WIP][SPARK-35917][SHUFFLE][CORE][3.2] Disable push-based shuffle feature to prevent it from being used

2021-07-13 Thread GitBox
otterc commented on a change in pull request #33329: URL: https://github.com/apache/spark/pull/33329#discussion_r669189297 ## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ## @@ -2079,7 +2079,7 @@ package object config { "conjunction

[GitHub] [spark] github-actions[bot] closed pull request #31937: [SPARK-10816][SS] Support session window natively

2021-07-13 Thread GitBox
github-actions[bot] closed pull request #31937: URL: https://github.com/apache/spark/pull/31937 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] github-actions[bot] commented on pull request #31926: [SPARK-34775][SQL] Push down limit through window when partitionSpec is not empty

2021-07-13 Thread GitBox
github-actions[bot] commented on pull request #31926: URL: https://github.com/apache/spark/pull/31926#issuecomment-879483610 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue

[GitHub] [spark] SparkQA commented on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox
SparkQA commented on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879482756 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45499/ -- This is an automated message from the

[GitHub] [spark] xinrong-databricks commented on a change in pull request #33331: [SPARK-36125][PYTHON] Implement non-equality comparison operators between two Categoricals

2021-07-13 Thread GitBox
xinrong-databricks commented on a change in pull request #33331: URL: https://github.com/apache/spark/pull/33331#discussion_r669186589 ## File path: python/pyspark/pandas/data_type_ops/categorical_ops.py ## @@ -64,15 +66,28 @@ def astype(self, index_ops: IndexOpsLike, dtype:

[GitHub] [spark] xinrong-databricks opened a new pull request #33331: [SPARK-36125][PYTHON] Implement non-equality comparison operators of Categoricals

2021-07-13 Thread GitBox
xinrong-databricks opened a new pull request #33331: URL: https://github.com/apache/spark/pull/33331 ### What changes were proposed in this pull request? Implement non-equality comparison operators between two Categoricals. ### Why are the changes needed? pandas supports

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879391296 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45495/

[GitHub] [spark] SparkQA commented on pull request #33077: [SPARK-34892][SS] Introduce MergingSortWithSessionWindowStateIterator sorting input rows and rows in state efficiently

2021-07-13 Thread GitBox
SparkQA commented on pull request #33077: URL: https://github.com/apache/spark/pull/33077#issuecomment-879471127 **[Test build #140987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140987/testReport)** for PR 33077 at commit

[GitHub] [spark] SparkQA commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox
SparkQA commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879470999 **[Test build #140986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140986/testReport)** for PR 33330 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33327: [SPARK-36109][SS][TEST][3.1] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33327: URL: https://github.com/apache/spark/pull/33327#issuecomment-879470604 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140980/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-879470602 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45498/

[GitHub] [spark] AmplabJenkins commented on pull request #33327: [SPARK-36109][SS][TEST][3.1] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #33327: URL: https://github.com/apache/spark/pull/33327#issuecomment-879470604 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140980/ -- This

[GitHub] [spark] AmplabJenkins commented on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-879470602 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45498/ --

[GitHub] [spark] SparkQA commented on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox
SparkQA commented on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879470445 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45499/ -- This is an automated message from the Apache

[GitHub] [spark] SparkQA commented on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-07-13 Thread GitBox
SparkQA commented on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-879459700 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45498/ -- This is an automated message from the

[GitHub] [spark] SparkQA removed a comment on pull request #33327: [SPARK-36109][SS][TEST][3.1] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-13 Thread GitBox
SparkQA removed a comment on pull request #33327: URL: https://github.com/apache/spark/pull/33327#issuecomment-879316709 **[Test build #140980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140980/testReport)** for PR 33327 at commit

[GitHub] [spark] SparkQA commented on pull request #33327: [SPARK-36109][SS][TEST][3.1] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-13 Thread GitBox
SparkQA commented on pull request #33327: URL: https://github.com/apache/spark/pull/33327#issuecomment-879458752 **[Test build #140980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140980/testReport)** for PR 33327 at commit

[GitHub] [spark] sunchao commented on pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox
sunchao commented on pull request #33330: URL: https://github.com/apache/spark/pull/33330#issuecomment-879455576 cc @dongjoon-hyun @viirya @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] sunchao opened a new pull request #33330: [SPARK-36123][SQL] Parquet vectorized reader doesn't skip null values correctly

2021-07-13 Thread GitBox
sunchao opened a new pull request #33330: URL: https://github.com/apache/spark/pull/33330 ### What changes were proposed in this pull request? Fix the skipping values logic in Parquet vectorized reader when column index is effective, by considering nulls. Also refactored

[GitHub] [spark] dongjoon-hyun commented on pull request #33329: [WIP][SPARK-35917][SHUFFLE][CORE][3.2] Disable push-based shuffle feature to prevent it from being used

2021-07-13 Thread GitBox
dongjoon-hyun commented on pull request #33329: URL: https://github.com/apache/spark/pull/33329#issuecomment-879452177 I'm okay with disabling this, but I'm wondering what the difference is from branch-3.1, because branch-3.1 also has the same configuration. If we want to disable this

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33329: [WIP][SPARK-35917][SHUFFLE][CORE]Disable push-based shuffle feature to prevent it from being used

2021-07-13 Thread GitBox
dongjoon-hyun commented on a change in pull request #33329: URL: https://github.com/apache/spark/pull/33329#discussion_r669153013 ## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ## @@ -2598,11 +2598,16 @@ private[spark] object Utils extends Logging { *

[GitHub] [spark] SparkQA commented on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox
SparkQA commented on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879449174 **[Test build #140985 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140985/testReport)** for PR 33286 at commit

[GitHub] [spark] viirya commented on pull request #33326: [SPARK-36109][SS][TEST][3.0] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-13 Thread GitBox
viirya commented on pull request #33326: URL: https://github.com/apache/spark/pull/33326#issuecomment-879449209 Thank you @dongjoon-hyun and @HeartSaVioR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33329: [WIP][SPARK-35917][SHUFFLE][CORE]Disable push-based shuffle feature to prevent it from being used

2021-07-13 Thread GitBox
dongjoon-hyun commented on a change in pull request #33329: URL: https://github.com/apache/spark/pull/33329#discussion_r669151977 ## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ## @@ -2079,7 +2079,7 @@ package object config {

[GitHub] [spark] SparkQA commented on pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-07-13 Thread GitBox
SparkQA commented on pull request #32049: URL: https://github.com/apache/spark/pull/32049#issuecomment-879446170 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45498/ -- This is an automated message from the Apache

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879420335 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140983/

[GitHub] [spark] SparkQA removed a comment on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox
SparkQA removed a comment on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879416458 **[Test build #140983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140983/testReport)** for PR 33286 at commit

[GitHub] [spark] SparkQA commented on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox
SparkQA commented on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879420312 **[Test build #140983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140983/testReport)** for PR 33286 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-13 Thread GitBox
AmplabJenkins commented on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-879420335 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140983/ -- This

[GitHub] [spark] SparkQA commented on pull request #33326: [SPARK-36109][SS][TEST][3.0] Check data after adding data to topic in KafkaSourceStressSuite

2021-07-13 Thread GitBox
SparkQA commented on pull request #33326: URL: https://github.com/apache/spark/pull/33326#issuecomment-879420392 **[Test build #140979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140979/testReport)** for PR 33326 at commit
