[GitHub] [spark] cloud-fan closed pull request #33418: [SPARK-36093][SQL][3.0] RemoveRedundantAliases should not change Command's parameter's expression's name

2021-07-19 Thread GitBox
cloud-fan closed pull request #33418: URL: https://github.com/apache/spark/pull/33418 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] cloud-fan commented on pull request #33418: [SPARK-36093][SQL][3.0] RemoveRedundantAliases should not change Command's parameter's expression's name

2021-07-19 Thread GitBox
cloud-fan commented on pull request #33418: URL: https://github.com/apache/spark/pull/33418#issuecomment-883018708 thanks, merging to 3.0!

[GitHub] [spark] cloud-fan commented on a change in pull request #33212: [SPARK-35912][SQL] Fix nullability of `spark.read.json`

2021-07-19 Thread GitBox
cloud-fan commented on a change in pull request #33212: URL: https://github.com/apache/spark/pull/33212#discussion_r672775023 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ## @@ -405,10 +405,18 @@ class JacksonParser(

[GitHub] [spark] tobiasedwards commented on pull request #33428: [PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox
tobiasedwards commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883015705 Will do, thanks for that!

[GitHub] [spark] HyukjinKwon commented on pull request #33428: [PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox
HyukjinKwon commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883014506 yeah, looks like we should fix. Would you mind filing a JIRA and link it to the PR title please? (see also https://spark.apache.org/contributing.html). Also please enable

[GitHub] [spark] Ngone51 commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-19 Thread GitBox
Ngone51 commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-883014024 Thanks, @mridulm @cloud-fan I'll try my best to push the validation PR first (I'm working on it right now). We could revert this later if we can't get the validation PR in.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33400: [SPARK-36186][PYTHON] Add as_ordered/as_unordered to CategoricalAccessor and CategoricalIndex

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33400: URL: https://github.com/apache/spark/pull/33400#issuecomment-883010803 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45801/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33402: [SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33402: URL: https://github.com/apache/spark/pull/33402#issuecomment-883010800

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33424: URL: https://github.com/apache/spark/pull/33424#issuecomment-883010804 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45802/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-883010801 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45799/

[GitHub] [spark] SparkQA removed a comment on pull request #33402: [SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins

2021-07-19 Thread GitBox
SparkQA removed a comment on pull request #33402: URL: https://github.com/apache/spark/pull/33402#issuecomment-882986419 **[Test build #141289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141289/testReport)** for PR 33402 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #33428: [PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883011701 Can one of the admins verify this patch?

[GitHub] [spark] tobiasedwards opened a new pull request #33428: [PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox
tobiasedwards opened a new pull request #33428: URL: https://github.com/apache/spark/pull/33428 ### What changes were proposed in this pull request? This change changes the type annotations for `pyspark.sql.types.Row`'s `__new__` and `__init__` methods when invoked without

[GitHub] [spark] SparkQA commented on pull request #33352: [SPARK-34952][SQL] DSv2 Aggregate push down APIs

2021-07-19 Thread GitBox
SparkQA commented on pull request #33352: URL: https://github.com/apache/spark/pull/33352#issuecomment-883011276 **[Test build #141292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141292/testReport)** for PR 33352 at commit

[GitHub] [spark] SparkQA commented on pull request #33409: [SPARK-36201][SQL] Schema check should check inner field too

2021-07-19 Thread GitBox
SparkQA commented on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883011287 **[Test build #141291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141291/testReport)** for PR 33409 at commit

[GitHub] [spark] SparkQA commented on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence

2021-07-19 Thread GitBox
SparkQA commented on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883011237 **[Test build #141290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141290/testReport)** for PR 33427 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-883010801 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45799/

[GitHub] [spark] AmplabJenkins commented on pull request #33402: [SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33402: URL: https://github.com/apache/spark/pull/33402#issuecomment-883010800

[GitHub] [spark] AmplabJenkins commented on pull request #33400: [SPARK-36186][PYTHON] Add as_ordered/as_unordered to CategoricalAccessor and CategoricalIndex

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33400: URL: https://github.com/apache/spark/pull/33400#issuecomment-883010803 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45801/

[GitHub] [spark] AmplabJenkins commented on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33424: URL: https://github.com/apache/spark/pull/33424#issuecomment-883010804 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45802/

[GitHub] [spark] huaxingao commented on a change in pull request #33352: [SPARK-34952][SQL] DSv2 Aggregate push down APIs

2021-07-19 Thread GitBox
huaxingao commented on a change in pull request #33352: URL: https://github.com/apache/spark/pull/33352#discussion_r672767768 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Aggregation.java ## @@ -0,0 +1,46 @@ +/* + * Licensed to the

[GitHub] [spark] cloud-fan commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-19 Thread GitBox
cloud-fan commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-883008646 I think there are around 2 weeks left and it seems promising to merge the verification PR before RC. +1 to have this feature in 3.2 to improve stability.

[GitHub] [spark] SparkQA commented on pull request #33402: [SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins

2021-07-19 Thread GitBox
SparkQA commented on pull request #33402: URL: https://github.com/apache/spark/pull/33402#issuecomment-883004083 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45803/

[GitHub] [spark] SparkQA commented on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec

2021-07-19 Thread GitBox
SparkQA commented on pull request #33424: URL: https://github.com/apache/spark/pull/33424#issuecomment-883002904 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45802/

[GitHub] [spark] AngersZhuuuu commented on pull request #33409: [SPARK-36201][SQL] Schema check should check inner field too

2021-07-19 Thread GitBox
AngersZhuuuu commented on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883001709 > hm, GA test seems failing. Seems all my UT failed like this, can't find why..

[GitHub] [spark] SparkQA commented on pull request #33402: [SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins

2021-07-19 Thread GitBox
SparkQA commented on pull request #33402: URL: https://github.com/apache/spark/pull/33402#issuecomment-883001069 **[Test build #141289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141289/testReport)** for PR 33402 at commit

[GitHub] [spark] SparkQA commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox
SparkQA commented on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-883000887 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45799/

[GitHub] [spark] HyukjinKwon opened a new pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence

2021-07-19 Thread GitBox
HyukjinKwon opened a new pull request #33427: URL: https://github.com/apache/spark/pull/33427 ### What changes were proposed in this pull request? Test is flaky (https://github.com/apache/spark/runs/3109815586): ``` Traceback (most recent call last): File

[GitHub] [spark] SparkQA commented on pull request #33400: [SPARK-36186][PYTHON] Add as_ordered/as_unordered to CategoricalAccessor and CategoricalIndex

2021-07-19 Thread GitBox
SparkQA commented on pull request #33400: URL: https://github.com/apache/spark/pull/33400#issuecomment-882999414 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45801/

[GitHub] [spark] beliefer commented on a change in pull request #33299: [SPARK-36046][SQL] Support new functions make_timestamp_ntz and make_timestamp_ltz

2021-07-19 Thread GitBox
beliefer commented on a change in pull request #33299: URL: https://github.com/apache/spark/pull/33299#discussion_r672756749 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ## @@ -552,6 +552,8 @@ object FunctionRegistry

[GitHub] [spark] beliefer commented on pull request #33413: [SPARK-36175][SQL] Support TimestampNTZ in Avro data source

2021-07-19 Thread GitBox
beliefer commented on pull request #33413: URL: https://github.com/apache/spark/pull/33413#issuecomment-882997044 ping @gengliangwang @cloud-fan @MaxGekk

[GitHub] [spark] ueshin closed pull request #33384: [SPARK-36167][PYTHON][3.2] Revisit more InternalField managements

2021-07-19 Thread GitBox
ueshin closed pull request #33384: URL: https://github.com/apache/spark/pull/33384

[GitHub] [spark] SparkQA removed a comment on pull request #33284: [SPARK-36063][SQL] Optimize OneRowRelation subqueries

2021-07-19 Thread GitBox
SparkQA removed a comment on pull request #33284: URL: https://github.com/apache/spark/pull/33284#issuecomment-882861686 **[Test build #141280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141280/testReport)** for PR 33284 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #33400: [SPARK-36186][PYTHON] Add as_ordered/as_unordered to CategoricalAccessor and CategoricalIndex

2021-07-19 Thread GitBox
SparkQA removed a comment on pull request #33400: URL: https://github.com/apache/spark/pull/33400#issuecomment-882963673 **[Test build #141287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141287/testReport)** for PR 33400 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in

2021-07-19 Thread GitBox
SparkQA removed a comment on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-882912823 **[Test build #141284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141284/testReport)** for PR 33078 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the sta

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-882984886

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33400: [SPARK-36186][PYTHON] Add as_ordered/as_unordered to CategoricalAccessor and CategoricalIndex

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33400: URL: https://github.com/apache/spark/pull/33400#issuecomment-882984888 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141287/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33284: [SPARK-36063][SQL] Optimize OneRowRelation subqueries

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33284: URL: https://github.com/apache/spark/pull/33284#issuecomment-882984885 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141280/

[GitHub] [spark] SparkQA commented on pull request #33402: [SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins

2021-07-19 Thread GitBox
SparkQA commented on pull request #33402: URL: https://github.com/apache/spark/pull/33402#issuecomment-882986419 **[Test build #141289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141289/testReport)** for PR 33402 at commit

[GitHub] [spark] SparkQA commented on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec

2021-07-19 Thread GitBox
SparkQA commented on pull request #33424: URL: https://github.com/apache/spark/pull/33424#issuecomment-882985743 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45802/

[GitHub] [spark] AmplabJenkins commented on pull request #33284: [SPARK-36063][SQL] Optimize OneRowRelation subqueries

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33284: URL: https://github.com/apache/spark/pull/33284#issuecomment-882984885 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141280/

[GitHub] [spark] AmplabJenkins commented on pull request #33400: [SPARK-36186][PYTHON] Add as_ordered/as_unordered to CategoricalAccessor and CategoricalIndex

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33400: URL: https://github.com/apache/spark/pull/33400#issuecomment-882984888 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141287/

[GitHub] [spark] AmplabJenkins commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-882984887

[GitHub] [spark] JkSelf commented on a change in pull request #33188: [SPARK-35989][SQL] Only remove redundant shuffle if shuffle origin is REPARTITION_BY_COL in AQE

2021-07-19 Thread GitBox
JkSelf commented on a change in pull request #33188: URL: https://github.com/apache/spark/pull/33188#discussion_r672749686 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ## @@ -250,7 +250,12 @@ object EnsureRequirements

[GitHub] [spark] mridulm edited a comment on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-19 Thread GitBox
mridulm edited a comment on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-882983762 @gengliangwang what sort of timeline do we have before we go into RC ? If the pr is reasonably close to being done @Ngone51, we should be able to focus and prioritize

[GitHub] [spark] mridulm commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-19 Thread GitBox
mridulm commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-882983762 @gengliangwang what sort of timeline do we have before we go into RC ? If the pr is reasonably close to being done @Ngone51, we should be able to focus on the reviews and get

[GitHub] [spark] SparkQA commented on pull request #33284: [SPARK-36063][SQL] Optimize OneRowRelation subqueries

2021-07-19 Thread GitBox
SparkQA commented on pull request #33284: URL: https://github.com/apache/spark/pull/33284#issuecomment-882981842 **[Test build #141280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141280/testReport)** for PR 33284 at commit

[GitHub] [spark] SparkQA commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox
SparkQA commented on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-882981513 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45799/

[GitHub] [spark] SparkQA commented on pull request #33400: [SPARK-36186][PYTHON] Add as_ordered/as_unordered to CategoricalAccessor and CategoricalIndex

2021-07-19 Thread GitBox
SparkQA commented on pull request #33400: URL: https://github.com/apache/spark/pull/33400#issuecomment-882979907 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45801/

[GitHub] [spark] SparkQA commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better

2021-07-19 Thread GitBox
SparkQA commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-882979183 **[Test build #141284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141284/testReport)** for PR 33078 at commit

[GitHub] [spark] SparkQA commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better

2021-07-19 Thread GitBox
SparkQA commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-882976753 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45800/

[GitHub] [spark] SparkQA commented on pull request #33400: [SPARK-36186][PYTHON] Add as_ordered/as_unordered to CategoricalAccessor and CategoricalIndex

2021-07-19 Thread GitBox
SparkQA commented on pull request #33400: URL: https://github.com/apache/spark/pull/33400#issuecomment-882974086 **[Test build #141287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141287/testReport)** for PR 33400 at commit

[GitHub] [spark] shardulm94 commented on a change in pull request #33328: [SPARK-28266][SQL] convertToLogicalRelation should not interpret `path` property when reading Hive tables

2021-07-19 Thread GitBox
shardulm94 commented on a change in pull request #33328: URL: https://github.com/apache/spark/pull/33328#discussion_r672741520 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ## @@ -244,7 +244,11 @@ private[hive] class

[GitHub] [spark] dgd-contributor commented on a change in pull request #33317: [SPARK-36095][CORE] Grouping exception in core/rdd

2021-07-19 Thread GitBox
dgd-contributor commented on a change in pull request #33317: URL: https://github.com/apache/spark/pull/33317#discussion_r672738789 ## File path: core/src/main/scala/org/apache/spark/errors/ExecutionErrors.scala ## @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] SparkQA commented on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec

2021-07-19 Thread GitBox
SparkQA commented on pull request #33424: URL: https://github.com/apache/spark/pull/33424#issuecomment-882968356 **[Test build #141288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141288/testReport)** for PR 33424 at commit

[GitHub] [spark] yaooqinn commented on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec

2021-07-19 Thread GitBox
yaooqinn commented on pull request #33424: URL: https://github.com/apache/spark/pull/33424#issuecomment-882966806 retest this please

[GitHub] [spark] SparkQA commented on pull request #33400: [SPARK-36186][PYTHON] Add as_ordered/as_unordered to CategoricalAccessor and CategoricalIndex

2021-07-19 Thread GitBox
SparkQA commented on pull request #33400: URL: https://github.com/apache/spark/pull/33400#issuecomment-882963673 **[Test build #141287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141287/testReport)** for PR 33400 at commit

[GitHub] [spark] ueshin commented on pull request #33400: [SPARK-36186][PYTHON] Add as_ordered/as_unordered to CategoricalAccessor and CategoricalIndex

2021-07-19 Thread GitBox
ueshin commented on pull request #33400: URL: https://github.com/apache/spark/pull/33400#issuecomment-882962822 @xinrong-databricks I updated a bit. Could you take another look? Thanks.

[GitHub] [spark] SparkQA commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better

2021-07-19 Thread GitBox
SparkQA commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-882962666 **[Test build #141286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141286/testReport)** for PR 33078 at commit

[GitHub] [spark] SparkQA commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox
SparkQA commented on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-882962473 **[Test build #141285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141285/testReport)** for PR 33422 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-882961112 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141275/

[GitHub] [spark] SparkQA removed a comment on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox
SparkQA removed a comment on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-882816254 **[Test build #141275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141275/testReport)** for PR 33239 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the sta

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-882961116 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45798/

[GitHub] [spark] SparkQA removed a comment on pull request #33310: [SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-19 Thread GitBox
SparkQA removed a comment on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-882823807 **[Test build #141277 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141277/testReport)** for PR 33310 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-882628557 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33310: [SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-88296 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141277/

[GitHub] [spark] AmplabJenkins commented on pull request #33426: [SPARK-32920][FOLLOW-UP] Fix shuffleMergeFinalized directly calling rdd.getNumPartitions as RDD is not serialized to executor

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33426: URL: https://github.com/apache/spark/pull/33426#issuecomment-882961638 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-882961112 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141275/

[GitHub] [spark] AmplabJenkins commented on pull request #33310: [SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-88296 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141277/

[GitHub] [spark] AmplabJenkins commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a

2021-07-19 Thread GitBox
AmplabJenkins commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-882961116 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45798/

[GitHub] [spark] HyukjinKwon closed pull request #33393: [SPARK-36179][SQL] Support TimestampNTZType in SparkGetColumnsOperation

2021-07-19 Thread GitBox
HyukjinKwon closed pull request #33393: URL: https://github.com/apache/spark/pull/33393

[GitHub] [spark] HyukjinKwon commented on pull request #33393: [SPARK-36179][SQL] Support TimestampNTZType in SparkGetColumnsOperation

2021-07-19 Thread GitBox
HyukjinKwon commented on pull request #33393: URL: https://github.com/apache/spark/pull/33393#issuecomment-882959937 Merged to master and branch-3.2

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33212: [SPARK-35912][SQL] Fix nullability of `spark.read.json`

2021-07-19 Thread GitBox
HyukjinKwon commented on a change in pull request #33212: URL: https://github.com/apache/spark/pull/33212#discussion_r672726759 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ## @@ -405,10 +405,18 @@ class JacksonParser(

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33393: [SPARK-36179][SQL] Support TimestampNTZType in SparkGetColumnsOperation

2021-07-19 Thread GitBox
HyukjinKwon commented on a change in pull request #33393: URL: https://github.com/apache/spark/pull/33393#discussion_r672725321 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkMetadataOperationSuite.scala ## @@ -660,4 +660,29 @@

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33393: [SPARK-36179][SQL] Support TimestampNTZType in SparkGetColumnsOperation

2021-07-19 Thread GitBox
HyukjinKwon commented on a change in pull request #33393: URL: https://github.com/apache/spark/pull/33393#discussion_r672725140 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkMetadataOperationSuite.scala ## @@ -660,4 +660,29 @@

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33393: [SPARK-36179][SQL] Support TimestampNTZType in SparkGetColumnsOperation

2021-07-19 Thread GitBox
HyukjinKwon commented on a change in pull request #33393: URL: https://github.com/apache/spark/pull/33393#discussion_r672725073 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkMetadataOperationSuite.scala ## @@ -660,4 +660,29 @@

[GitHub] [spark] HyukjinKwon commented on pull request #33393: [SPARK-36179][SQL] Support TimestampNTZType in SparkGetColumnsOperation

2021-07-19 Thread GitBox
HyukjinKwon commented on pull request #33393: URL: https://github.com/apache/spark/pull/33393#issuecomment-882955831 Yeah we can ignore that failure for now.

[GitHub] [spark] HyukjinKwon commented on pull request #33409: [SPARK-36201][SQL] Schema check should check inner field too

2021-07-19 Thread GitBox
HyukjinKwon commented on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-882955387 hm, GA test seems failing.

[GitHub] [spark] HyukjinKwon commented on pull request #33384: [SPARK-36167][PYTHON][3.2] Revisit more InternalField managements

2021-07-19 Thread GitBox
HyukjinKwon commented on pull request #33384: URL: https://github.com/apache/spark/pull/33384#issuecomment-882952778 Merged to branch-3.2.

[GitHub] [spark] HyukjinKwon closed pull request #33388: [SPARK-36176][PYTHON] Expose tableExists in pyspark.sql.catalog

2021-07-19 Thread GitBox
HyukjinKwon closed pull request #33388: URL: https://github.com/apache/spark/pull/33388

[GitHub] [spark] SparkQA commented on pull request #33310: [SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-19 Thread GitBox
SparkQA commented on pull request #33310: URL: https://github.com/apache/spark/pull/33310#issuecomment-882952638 **[Test build #141277 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141277/testReport)** for PR 33310 at commit

[GitHub] [spark] HyukjinKwon commented on pull request #33388: [SPARK-36176][PYTHON] Expose tableExists in pyspark.sql.catalog

2021-07-19 Thread GitBox
HyukjinKwon commented on pull request #33388: URL: https://github.com/apache/spark/pull/33388#issuecomment-882952393 Merged to master.

[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox
SparkQA commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-882952191 **[Test build #141275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141275/testReport)** for PR 33239 at commit

[GitHub] [spark] HyukjinKwon commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox
HyukjinKwon commented on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-882952069 cc @hvanhovell too

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox
HyukjinKwon commented on a change in pull request #33422: URL: https://github.com/apache/spark/pull/33422#discussion_r672721125 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Observation.scala ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] HyukjinKwon commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox
HyukjinKwon commented on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-882951583 ok to test

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33399: [SPARK-36211][PYTHON] Correct typing of `udf` return value

2021-07-19 Thread GitBox
HyukjinKwon commented on a change in pull request #33399: URL: https://github.com/apache/spark/pull/33399#discussion_r672720815 ## File path: python/pyspark/sql/functions.pyi ## @@ -359,13 +360,13 @@ def variance(col: ColumnOrName) -> Column: ... @overload def udf( f:

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33310: [SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-19 Thread GitBox
HyukjinKwon commented on a change in pull request #33310: URL: https://github.com/apache/spark/pull/33310#discussion_r672720220 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala ## @@ -58,6 +58,12 @@ case class

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33310: [SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-19 Thread GitBox
HyukjinKwon commented on a change in pull request #33310: URL: https://github.com/apache/spark/pull/33310#discussion_r672720057 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeLocalShuffleReader.scala ## @@ -72,12 +72,23 @@ object

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33310: [SPARK-36105][SQL] OptimizeLocalShuffleReader support reading data of multiple mappers in one task

2021-07-19 Thread GitBox
HyukjinKwon commented on a change in pull request #33310: URL: https://github.com/apache/spark/pull/33310#discussion_r672719894 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala ## @@ -58,6 +58,12 @@ case class

[GitHub] [spark] HyukjinKwon closed pull request #33398: [SPARK-36167][PYTHON][FOLLOWUP] Fix test failures with older versions of pandas

2021-07-19 Thread GitBox
HyukjinKwon closed pull request #33398: URL: https://github.com/apache/spark/pull/33398

[GitHub] [spark] HyukjinKwon commented on pull request #33398: [SPARK-36167][PYTHON][FOLLOWUP] Fix test failures with older versions of pandas

2021-07-19 Thread GitBox
HyukjinKwon commented on pull request #33398: URL: https://github.com/apache/spark/pull/33398#issuecomment-882949761 Merged to master.

[GitHub] [spark] venkata91 commented on pull request #33426: [SPARK-32920][FOLLOW-UP] Fix shuffleMergeFinalized directly calling rdd.getNumPartitions as RDD is not serialized to executor

2021-07-19 Thread GitBox
venkata91 commented on pull request #33426: URL: https://github.com/apache/spark/pull/33426#issuecomment-882948773 cc @otterc @mridulm Please take a look.

[GitHub] [spark] venkata91 opened a new pull request #33426: [SPARK-32920][FOLLOW-UP] Fix shuffleMergeFinalized directly calling rdd.getNumPartitions as RDD is not serialized to executor

2021-07-19 Thread GitBox
venkata91 opened a new pull request #33426: URL: https://github.com/apache/spark/pull/33426 ### What changes were proposed in this pull request? `ShuffleMapTask` should not push blocks if a shuffle is already merge finalized. Currently block push is disabled for retry

[GitHub] [spark] SparkQA commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better

2021-07-19 Thread GitBox
SparkQA commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-882948349 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45798/

[GitHub] [spark] HyukjinKwon commented on pull request #33410: [WIP][SPARK-36204][INFRA][BUILD] Deduplicate Scala 2.13 daily build

2021-07-19 Thread GitBox
HyukjinKwon commented on pull request #33410: URL: https://github.com/apache/spark/pull/33410#issuecomment-882947075 Let me test the cron job in the master of my fork, and get back to you guys for doubly sure.

[GitHub] [spark] github-actions[bot] closed pull request #32101: [SPARK-35000] [spark-submit] Spark App in container will not exit when exception happen

2021-07-19 Thread GitBox
github-actions[bot] closed pull request #32101: URL: https://github.com/apache/spark/pull/32101

[GitHub] [spark] github-actions[bot] closed pull request #30841: [SPARK-28191][SS] New data source - state - reader part

2021-07-19 Thread GitBox
github-actions[bot] closed pull request #30841: URL: https://github.com/apache/spark/pull/30841

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33286: [SPARK-36079][SQL] Null-based filter estimate should always be in the range [0, 1]

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33286: URL: https://github.com/apache/spark/pull/33286#issuecomment-882938479 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45797/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package

2021-07-19 Thread GitBox
AmplabJenkins removed a comment on pull request #33350: URL: https://github.com/apache/spark/pull/33350#issuecomment-882938480 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45796/
