[GitHub] [spark] allisonwang-db commented on a change in pull request #32958: [SPARK-35065][SQL] Group exception messages in spark/sql (core)

2021-06-22 Thread GitBox
allisonwang-db commented on a change in pull request #32958: URL: https://github.com/apache/spark/pull/32958#discussion_r655930570 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala ## @@ -1647,4 +1643,300 @@ private[spark] objec

[GitHub] [spark] SparkQA commented on pull request #33009: [SPARK-35849][PYTHON] Make `astype` method data-type-based for DecimalOps

2021-06-22 Thread GitBox
SparkQA commented on pull request #33009: URL: https://github.com/apache/spark/pull/33009#issuecomment-865653720 **[Test build #140121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140121/testReport)** for PR 33009 at commit [`3b5c507`](https://github.co

[GitHub] [spark] HeartSaVioR commented on pull request #32928: [WIP][SPARK-35784] Implementation for RocksDB instance

2021-06-22 Thread GitBox
HeartSaVioR commented on pull request #32928: URL: https://github.com/apache/spark/pull/32928#issuecomment-865653998 I'd rather say the version bump would be a no-go till we figure out either it is backward compatible, or quite easy way to migrate the old one to the new one. This applies t

[GitHub] [spark] SparkQA removed a comment on pull request #33009: [SPARK-35849][PYTHON] Make `astype` method data-type-based for DecimalOps

2021-06-22 Thread GitBox
SparkQA removed a comment on pull request #33009: URL: https://github.com/apache/spark/pull/33009#issuecomment-865636289 **[Test build #140121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140121/testReport)** for PR 33009 at commit [`3b5c507`](https://gi

[GitHub] [spark] HeartSaVioR edited a comment on pull request #32928: [WIP][SPARK-35784] Implementation for RocksDB instance

2021-06-22 Thread GitBox
HeartSaVioR edited a comment on pull request #32928: URL: https://github.com/apache/spark/pull/32928#issuecomment-865653998 I'd rather say the version bump would be a no-go till we figure out either it is backward compatible, or quite easy way to migrate the old one to the new one. This ap

[GitHub] [spark] beliefer commented on a change in pull request #32958: [SPARK-35065][SQL] Group exception messages in spark/sql (core)

2021-06-22 Thread GitBox
beliefer commented on a change in pull request #32958: URL: https://github.com/apache/spark/pull/32958#discussion_r655934506 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala ## @@ -1647,4 +1643,300 @@ private[spark] object Quer

[GitHub] [spark] HeartSaVioR edited a comment on pull request #32928: [WIP][SPARK-35784] Implementation for RocksDB instance

2021-06-22 Thread GitBox
HeartSaVioR edited a comment on pull request #32928: URL: https://github.com/apache/spark/pull/32928#issuecomment-865653998 I'd rather say the version bump would be a no-go till we figure out either it is backward compatible (with RocksDB 6.2.2 as it has been used for production), or quite

[GitHub] [spark] sunchao commented on a change in pull request #33006: [SPARK-35846][SQL] Introduce ParquetReadState to track various states while reading a Parquet column chunk

2021-06-22 Thread GitBox
sunchao commented on a change in pull request #33006: URL: https://github.com/apache/spark/pull/33006#discussion_r655914360 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java ## @@ -174,24 +162,29 @@ void readBat

[GitHub] [spark] sunchao commented on a change in pull request #33006: [SPARK-35846][SQL] Introduce ParquetReadState to track various states while reading a Parquet column chunk

2021-06-22 Thread GitBox
sunchao commented on a change in pull request #33006: URL: https://github.com/apache/spark/pull/33006#discussion_r655935518 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java ## @@ -174,24 +162,29 @@ void readBat

[GitHub] [spark] SparkQA commented on pull request #33013: [SPARK-35799][SS] Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec

2021-06-22 Thread GitBox
SparkQA commented on pull request #33013: URL: https://github.com/apache/spark/pull/33013#issuecomment-865660625 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44647/ -- This is an automated message from the Apache

[GitHub] [spark] SparkQA commented on pull request #32963: [SPARK-35378][SQL][FOLLOWUP] isLocal should consider CommandResult

2021-06-22 Thread GitBox
SparkQA commented on pull request #32963: URL: https://github.com/apache/spark/pull/32963#issuecomment-865660759 **[Test build #140115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140115/testReport)** for PR 32963 at commit [`acc2b69`](https://github.co

[GitHub] [spark] SparkQA removed a comment on pull request #32963: [SPARK-35378][SQL][FOLLOWUP] isLocal should consider CommandResult

2021-06-22 Thread GitBox
SparkQA removed a comment on pull request #32963: URL: https://github.com/apache/spark/pull/32963#issuecomment-865484449 **[Test build #140115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140115/testReport)** for PR 32963 at commit [`acc2b69`](https://gi

[GitHub] [spark] EnricoMi commented on a change in pull request #31905: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-06-22 Thread GitBox
EnricoMi commented on a change in pull request #31905: URL: https://github.com/apache/spark/pull/31905#discussion_r655942526 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Observation.scala ## @@ -0,0 +1,170 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [spark] SparkQA commented on pull request #32984: [SPARK-35726][SPARK-35769][SQL][FOLLOWUP] Call periodToMonths and durationToMicros in HiveResult should add endField

2021-06-22 Thread GitBox
SparkQA commented on pull request #32984: URL: https://github.com/apache/spark/pull/32984#issuecomment-865665380 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44649/

[GitHub] [spark] EnricoMi commented on a change in pull request #31905: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-06-22 Thread GitBox
EnricoMi commented on a change in pull request #31905: URL: https://github.com/apache/spark/pull/31905#discussion_r655944693 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Observation.scala ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [spark] SparkQA commented on pull request #33013: [SPARK-35799][SS] Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec

2021-06-22 Thread GitBox
SparkQA commented on pull request #33013: URL: https://github.com/apache/spark/pull/33013#issuecomment-865665796 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44647/

[GitHub] [spark] SparkQA commented on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode

2021-06-22 Thread GitBox
SparkQA commented on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-865666352 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44650/

[GitHub] [spark] viirya commented on a change in pull request #32928: [WIP][SPARK-35784] Implementation for RocksDB instance

2021-06-22 Thread GitBox
viirya commented on a change in pull request #32928: URL: https://github.com/apache/spark/pull/32928#discussion_r655929725 ## File path: sql/core/src/main/scala/org/apache/spark/sql/util/RocksDBLoader.scala ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] AmplabJenkins commented on pull request #32218: [SPARK-35121][SQL] Enhance EliminateOuterJoin to eliminate outer joins if join condition is not defined

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #32218: URL: https://github.com/apache/spark/pull/32218#issuecomment-865668418 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140118/

[GitHub] [spark] AmplabJenkins commented on pull request #33013: [SPARK-35799][SS] Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #33013: URL: https://github.com/apache/spark/pull/33013#issuecomment-865668420 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44647/

[GitHub] [spark] AmplabJenkins commented on pull request #33009: [SPARK-35849][PYTHON] Make `astype` method data-type-based for DecimalOps

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #33009: URL: https://github.com/apache/spark/pull/33009#issuecomment-865668419 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140121/

[GitHub] [spark] AmplabJenkins commented on pull request #32963: [SPARK-35378][SQL][FOLLOWUP] isLocal should consider CommandResult

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #32963: URL: https://github.com/apache/spark/pull/32963#issuecomment-865668417 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140115/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33013: [SPARK-35799][SS] Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec

2021-06-22 Thread GitBox
AmplabJenkins removed a comment on pull request #33013: URL: https://github.com/apache/spark/pull/33013#issuecomment-865668420 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44647/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32218: [SPARK-35121][SQL] Enhance EliminateOuterJoin to eliminate outer joins if join condition is not defined

2021-06-22 Thread GitBox
AmplabJenkins removed a comment on pull request #32218: URL: https://github.com/apache/spark/pull/32218#issuecomment-865668418 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140118/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32963: [SPARK-35378][SQL][FOLLOWUP] isLocal should consider CommandResult

2021-06-22 Thread GitBox
AmplabJenkins removed a comment on pull request #32963: URL: https://github.com/apache/spark/pull/32963#issuecomment-865668417 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140115/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33009: [SPARK-35849][PYTHON] Make `astype` method data-type-based for DecimalOps

2021-06-22 Thread GitBox
AmplabJenkins removed a comment on pull request #33009: URL: https://github.com/apache/spark/pull/33009#issuecomment-865668419 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140121/

[GitHub] [spark] SparkQA commented on pull request #32958: [SPARK-35065][SQL] Group exception messages in spark/sql (core)

2021-06-22 Thread GitBox
SparkQA commented on pull request #32958: URL: https://github.com/apache/spark/pull/32958#issuecomment-865668929 **[Test build #140125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140125/testReport)** for PR 32958 at commit [`f87e24d`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #33006: [SPARK-35846][SQL] Introduce ParquetReadState to track various states while reading a Parquet column chunk

2021-06-22 Thread GitBox
SparkQA commented on pull request #33006: URL: https://github.com/apache/spark/pull/33006#issuecomment-865668910 **[Test build #140124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140124/testReport)** for PR 33006 at commit [`e11d20e`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #33009: [SPARK-35849][PYTHON] Make `astype` method data-type-based for DecimalOps

2021-06-22 Thread GitBox
SparkQA commented on pull request #33009: URL: https://github.com/apache/spark/pull/33009#issuecomment-865669511 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44648/

[GitHub] [spark] cloud-fan commented on a change in pull request #33002: [SPARK-35843][SQL] Unify the file name between batch and streaming file writers

2021-06-22 Thread GitBox
cloud-fan commented on a change in pull request #33002: URL: https://github.com/apache/spark/pull/33002#discussion_r655950782 ## File path: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ## @@ -152,12 +153,21 @@ class HadoopMapReduceCommit

[GitHub] [spark] SparkQA commented on pull request #32984: [SPARK-35726][SPARK-35769][SQL][FOLLOWUP] Call periodToMonths and durationToMicros in HiveResult should add endField

2021-06-22 Thread GitBox
SparkQA commented on pull request #32984: URL: https://github.com/apache/spark/pull/32984#issuecomment-865670900 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44649/

[GitHub] [spark] AmplabJenkins commented on pull request #32984: [SPARK-35726][SPARK-35769][SQL][FOLLOWUP] Call periodToMonths and durationToMicros in HiveResult should add endField

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #32984: URL: https://github.com/apache/spark/pull/32984#issuecomment-865670932 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44649/

[GitHub] [spark] ulysses-you opened a new pull request #33015: [SPARK-35853][SQL] Remark the shuffle origin to ENSURE_REQUIREMENTS as far as possible

2021-06-22 Thread GitBox
ulysses-you opened a new pull request #33015: URL: https://github.com/apache/spark/pull/33015 ### What changes were proposed in this pull request? Add a rule `RemarkShuffleOrigin` in AQE queryStagePreparationRules after EnsureRequirements. ### Why are the changes n

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32984: [SPARK-35726][SPARK-35769][SQL][FOLLOWUP] Call periodToMonths and durationToMicros in HiveResult should add endField

2021-06-22 Thread GitBox
AmplabJenkins removed a comment on pull request #32984: URL: https://github.com/apache/spark/pull/32984#issuecomment-865670932 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44649/

[GitHub] [spark] SparkQA commented on pull request #33015: [SPARK-35853][SQL] Remark the shuffle origin to ENSURE_REQUIREMENTS as far as possible

2021-06-22 Thread GitBox
SparkQA commented on pull request #33015: URL: https://github.com/apache/spark/pull/33015#issuecomment-865671725 **[Test build #140126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140126/testReport)** for PR 33015 at commit [`f1beaf0`](https://github.com

[GitHub] [spark] jerqi opened a new pull request #33016: [SPARK-35318][SQL][FOLLOWUP] Hide internal view properties for

2021-06-22 Thread GitBox
jerqi opened a new pull request #33016: URL: https://github.com/apache/spark/pull/33016 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was t

[GitHub] [spark] SparkQA commented on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode

2021-06-22 Thread GitBox
SparkQA commented on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-865672646 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44650/

[GitHub] [spark] AmplabJenkins commented on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-865672677 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44650/

[GitHub] [spark] AmplabJenkins commented on pull request #33016: [WIP][SPARK-35318][SQL][FOLLOWUP] Hide internal view properties for

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #33016: URL: https://github.com/apache/spark/pull/33016#issuecomment-865672658 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32397: [SPARK-35084][CORE] Spark 3: supporting "--packages" in k8s cluster mode

2021-06-22 Thread GitBox
AmplabJenkins removed a comment on pull request #32397: URL: https://github.com/apache/spark/pull/32397#issuecomment-865672677 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44650/

[GitHub] [spark] HeartSaVioR closed pull request #32952: [SPARK-35799][SS] Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec

2021-06-22 Thread GitBox
HeartSaVioR closed pull request #32952: URL: https://github.com/apache/spark/pull/32952

[GitHub] [spark] viirya closed pull request #32747: [SPARK-35611][SS] Introduce the strategy on mismatched offset for start offset timestamp on Kafka data source

2021-06-22 Thread GitBox
viirya closed pull request #32747: URL: https://github.com/apache/spark/pull/32747

[GitHub] [spark] attilapiros commented on pull request #32790: [SPARK-35543][CORE] Fix memory leak in BlockManagerMasterEndpoint removeRdd

2021-06-22 Thread GitBox
attilapiros commented on pull request #32790: URL: https://github.com/apache/spark/pull/32790#issuecomment-865063482

[GitHub] [spark] dgd-contributor commented on pull request #32916: [SPARK-35064][SQL] Group error in spark-catalyst

2021-06-22 Thread GitBox
dgd-contributor commented on pull request #32916: URL: https://github.com/apache/spark/pull/32916#issuecomment-865530485 @beliefer does sql-slow-test module corrupted? I couldn't know why it keep failing?

[GitHub] [spark] SparkQA commented on pull request #32810: [SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line.

2021-06-22 Thread GitBox
SparkQA commented on pull request #32810: URL: https://github.com/apache/spark/pull/32810#issuecomment-865040773

[GitHub] [spark] beliefer commented on pull request #32958: [SPARK-35065][SQL] Group exception messages in spark/sql (core)

2021-06-22 Thread GitBox
beliefer commented on pull request #32958: URL: https://github.com/apache/spark/pull/32958#issuecomment-865486368 Gentle ping @allisonwang-db

[GitHub] [spark] cloud-fan commented on a change in pull request #32999: [SPARK-35727][SQL] Return INTERVAL DAY from dates subtraction

2021-06-22 Thread GitBox
cloud-fan commented on a change in pull request #32999: URL: https://github.com/apache/spark/pull/32999#discussion_r655546772 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -349,8 +349,18 @@ class Analyzer(override val cat

[GitHub] [spark] SparkQA removed a comment on pull request #32994: [SPARK-35838][TESTS] Ensure kafka-0-10-sql module can be maven test independently in Scala 2.13

2021-06-22 Thread GitBox
SparkQA removed a comment on pull request #32994: URL: https://github.com/apache/spark/pull/32994#issuecomment-864796606 **[Test build #140063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140063/testReport)** for PR 32994 at commit [`2fb4d36`](https://gi

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32998: [SPARK-35842][INFRA] Ignore all .idea folders

2021-06-22 Thread GitBox
HyukjinKwon commented on a change in pull request #32998: URL: https://github.com/apache/spark/pull/32998#discussion_r655242740 ## File path: .gitignore ## @@ -15,8 +15,8 @@ .ensime_cache/ .ensime_lucene .generated-mima* -# The star is required for further !.idea/ to work, s

[GitHub] [spark] MaxGekk closed pull request #32997: [SPARK-35840][SQL] Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`.

2021-06-22 Thread GitBox
MaxGekk closed pull request #32997: URL: https://github.com/apache/spark/pull/32997

[GitHub] [spark] gengliangwang commented on pull request #33003: [SPARK-35844][INFRA] Add hadoop-cloud profile to PUBLISH_PROFILES

2021-06-22 Thread GitBox
gengliangwang commented on pull request #33003: URL: https://github.com/apache/spark/pull/33003#issuecomment-865468105 Late LGTM, thank you @dongjoon-hyun

[GitHub] [spark] srowen commented on pull request #32813: [SPARK-34591][MLLIB][WIP] Add decision tree pruning as a parameter

2021-06-22 Thread GitBox
srowen commented on pull request #32813: URL: https://github.com/apache/spark/pull/32813#issuecomment-865078252 I think you could build the project with your change and just run bin/pyspark to check out the API. `python/run-tests` is the real standard for whether the tests pass and Jenkins

[GitHub] [spark] viirya commented on a change in pull request #32980: [SPARK-35829][SQL] Clean up evaluates subexpressions and add more flexibility to evaluate particular subexpressoin

2021-06-22 Thread GitBox
viirya commented on a change in pull request #32980: URL: https://github.com/apache/spark/pull/32980#discussion_r655164773 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ## @@ -76,24 +76,35 @@ object ExprCode {

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32810: [SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line.

2021-06-22 Thread GitBox
AmplabJenkins removed a comment on pull request #32810: URL: https://github.com/apache/spark/pull/32810#issuecomment-865082438

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32298: [SPARK-34079][SQL] Merge non-correlated scalar subqueries for better reuse

2021-06-22 Thread GitBox
AmplabJenkins removed a comment on pull request #32298: URL: https://github.com/apache/spark/pull/32298#issuecomment-864955350

[GitHub] [spark] allisonwang-db commented on a change in pull request #32958: [SPARK-35065][SQL] Group exception messages in spark/sql (core)

2021-06-22 Thread GitBox
allisonwang-db commented on a change in pull request #32958: URL: https://github.com/apache/spark/pull/32958#discussion_r655928725 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala ## @@ -1485,12 +1485,8 @@ private[spark] object

[GitHub] [spark] SparkQA commented on pull request #32980: [SPARK-35829][SQL] Clean up evaluates subexpressions and add more flexibility to evaluate particular subexpressoin

2021-06-22 Thread GitBox
SparkQA commented on pull request #32980: URL: https://github.com/apache/spark/pull/32980#issuecomment-864844320

[GitHub] [spark] sarutak edited a comment on pull request #32997: [SPARK-35840][SQL] Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`.

2021-06-22 Thread GitBox
sarutak edited a comment on pull request #32997: URL: https://github.com/apache/spark/pull/32997#issuecomment-864858848 Thank you @MaxGekk. It seems useful. Maybe, it's better to merge this change before #32988. After this change is merged, I'll rebase to master in #32988.

[GitHub] [spark] AmplabJenkins commented on pull request #32286: [SPARK-35181][CORE] Use zstd for spark.io.compression.codec by default

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #32286: URL: https://github.com/apache/spark/pull/32286#issuecomment-865173591

[GitHub] [spark] ueshin closed pull request #32910: [SPARK-35614][PYTHON] Make the conversion to pandas data-type-based for ExtensionDtypes

2021-06-22 Thread GitBox
ueshin closed pull request #32910: URL: https://github.com/apache/spark/pull/32910

[GitHub] [spark] gengliangwang commented on pull request #32983: [SPARK-35831][YARN][test-maven] Handle PathOperationException in copyFileToRemote on the same src and dest

2021-06-22 Thread GitBox
gengliangwang commented on pull request #32983: URL: https://github.com/apache/spark/pull/32983#issuecomment-865125747 Merging to master to fix the test builds.

[GitHub] [spark] otterc commented on pull request #32992: [SPARK-35836][SHUFFLE][CORE] Removed the reference to spark.shuffle.push.based.enabled in ShuffleBlockPusherSuite

2021-06-22 Thread GitBox
otterc commented on pull request #32992: URL: https://github.com/apache/spark/pull/32992#issuecomment-864763591 @mridulm Could you please help review this trivial change

[GitHub] [spark] peter-toth commented on a change in pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2021-06-22 Thread GitBox
peter-toth commented on a change in pull request #28885: URL: https://github.com/apache/spark/pull/28885#discussion_r655142981 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/reuse/ReuseExchangeAndSubquery.scala ## @@ -0,0 +1,66 @@ +/* + * Licensed to the

[GitHub] [spark] SparkQA commented on pull request #32997: [SPARK-35840][SQL] Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`.

2021-06-22 Thread GitBox
SparkQA commented on pull request #32997: URL: https://github.com/apache/spark/pull/32997#issuecomment-864881629

[GitHub] [spark] AmplabJenkins commented on pull request #32990: [SPARK-35545][FOLLOW-UP][TEST][SQL] Add a regression test for the SubqueryExpression refactor

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #32990: URL: https://github.com/apache/spark/pull/32990#issuecomment-864794503

[GitHub] [spark] SparkQA removed a comment on pull request #32952: [SPARK-35799][SS] Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec

2021-06-22 Thread GitBox
SparkQA removed a comment on pull request #32952: URL: https://github.com/apache/spark/pull/32952#issuecomment-865433225

[GitHub] [spark] MaxGekk commented on pull request #32984: [SPARK-35726][SPARK-35769][SQL][FOLLOWUP] Call periodToMonths and durationToMicros in HiveResult should add endField

2021-06-22 Thread GitBox
MaxGekk commented on pull request #32984: URL: https://github.com/apache/spark/pull/32984#issuecomment-865621423

[GitHub] [spark] SparkQA commented on pull request #33004: [SPARK-35845][SQL] OuterReference resolution should reject ambiguous column names

2021-06-22 Thread GitBox
SparkQA commented on pull request #33004: URL: https://github.com/apache/spark/pull/33004#issuecomment-865256299

[GitHub] [spark] SparkQA removed a comment on pull request #32963: [SPARK-35378][SQL][FOLLOWUP] isLocal should consider CommandResult

2021-06-22 Thread GitBox
SparkQA removed a comment on pull request #32963: URL: https://github.com/apache/spark/pull/32963#issuecomment-865484449 **[Test build #140115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140115/testReport)** for PR 32963 at commit [`acc2b69`](https://gi

[GitHub] [spark] PavithraRamachandran commented on pull request #32996: [SPARK-35835][SQL] Select filter query with struct complex type should be case insensitive

2021-06-22 Thread GitBox
PavithraRamachandran commented on pull request #32996: URL: https://github.com/apache/spark/pull/32996#issuecomment-865170943 ok i ll check and raise in master

[GitHub] [spark] AmplabJenkins commented on pull request #33016: [WIP][SPARK-35318][SQL][FOLLOWUP] Hide internal view properties for

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #33016: URL: https://github.com/apache/spark/pull/33016#issuecomment-865672658 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #32850: [SPARK-34920][CORE][SQL] Add error classes with SQLSTATE

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #32850: URL: https://github.com/apache/spark/pull/32850#issuecomment-865321108

[GitHub] [spark] AngersZhuuuu commented on pull request #32984: [SPARK-35726][SPARK-35769][SQL][FOLLOWUP] Call periodToMonths and durationToMicros in HiveResult should add endField

2021-06-22 Thread GitBox
AngersZh commented on pull request #32984: URL: https://github.com/apache/spark/pull/32984#issuecomment-865493215

[GitHub] [spark] cloud-fan commented on pull request #32995: [SPARK-35839][SQL] New SQL function: to_timestamp_ntz

2021-06-22 Thread GitBox
cloud-fan commented on pull request #32995: URL: https://github.com/apache/spark/pull/32995#issuecomment-865046653 2 high-level comments: 1. I think this new function should just go with the non-legacy behavior if the legacy conf is enabled, instead of failing. 2. We can make `ToTimes

[GitHub] [spark] cloud-fan commented on a change in pull request #23608: [SPARK-26682][SQL] Use taskAttemptID instead of attemptNumber for Hadoop.

2021-06-22 Thread GitBox
cloud-fan commented on a change in pull request #23608: URL: https://github.com/apache/spark/pull/23608#discussion_r655446703 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ## @@ -170,7 +170,7 @@ object FileFormatWriter

[GitHub] [spark] AmplabJenkins commented on pull request #32810: [SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line.

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #32810: URL: https://github.com/apache/spark/pull/32810#issuecomment-865082438

[GitHub] [spark] AmplabJenkins commented on pull request #32975: [SPARK-35820][SQL] Support Cast between different field DayTimeIntervalType

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #32975: URL: https://github.com/apache/spark/pull/32975#issuecomment-864761630

[GitHub] [spark] otterc commented on a change in pull request #32992: [SPARK-35836][SHUFFLE][CORE] Removed the reference to spark.shuffle.push.based.enabled in ShuffleBlockPusherSuite

2021-06-22 Thread GitBox
otterc commented on a change in pull request #32992: URL: https://github.com/apache/spark/pull/32992#discussion_r655502493 ## File path: core/src/test/scala/org/apache/spark/shuffle/ShuffleBlockPusherSuite.scala ## @@ -56,8 +56,6 @@ class ShuffleBlockPusherSuite extends SparkF

[GitHub] [spark] SparkQA commented on pull request #32993: [SPARK-35776][SQL] Check all year-month interval types in arrow

2021-06-22 Thread GitBox
SparkQA commented on pull request #32993: URL: https://github.com/apache/spark/pull/32993#issuecomment-864770123

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32975: [SPARK-35820][SQL] Support Cast between different field DayTimeIntervalType

2021-06-22 Thread GitBox
AmplabJenkins removed a comment on pull request #32975: URL: https://github.com/apache/spark/pull/32975#issuecomment-864761630

[GitHub] [spark] SparkQA removed a comment on pull request #32928: [WIP][SPARK-35784] Implementation for RocksDB instance

2021-06-22 Thread GitBox
SparkQA removed a comment on pull request #32928: URL: https://github.com/apache/spark/pull/32928#issuecomment-865176623 **[Test build #140087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140087/testReport)** for PR 32928 at commit [`80154b2`](https://gi

[GitHub] [spark] MaxGekk commented on pull request #32995: [SPARK-35839][SQL] New SQL function: to_timestamp_ntz

2021-06-22 Thread GitBox
MaxGekk commented on pull request #32995: URL: https://github.com/apache/spark/pull/32995#issuecomment-864818572 @gengliangwang Thanks for the ping. I will review it today slightly later.

[GitHub] [spark] AmplabJenkins commented on pull request #33013: [SPARK-35799][SS] Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #33013: URL: https://github.com/apache/spark/pull/33013#issuecomment-865668420 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44647/

[GitHub] [spark] AmplabJenkins commented on pull request #32999: [SPARK-35727][SQL] Return INTERVAL DAY from dates subtraction

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #32999: URL: https://github.com/apache/spark/pull/32999#issuecomment-864920372 Can one of the admins verify this patch?

[GitHub] [spark] SparkQA removed a comment on pull request #32998: [SPARK-35842][INFRA] Ignore all .idea folders

2021-06-22 Thread GitBox
SparkQA removed a comment on pull request #32998: URL: https://github.com/apache/spark/pull/32998#issuecomment-864890262

[GitHub] [spark] xkrogen commented on a change in pull request #32969: [SPARK-35817][SQL] Restore performance of queries against wide Avro tables

2021-06-22 Thread GitBox
xkrogen commented on a change in pull request #32969: URL: https://github.com/apache/spark/pull/32969#discussion_r655548950 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala ## @@ -202,34 +203,40 @@ private[sql] object AvroUtils extends Loggi

[GitHub] [spark] Yikun commented on pull request #33009: [SPARK-35849][PYTHON] Make `astype` method data-type-based for DecimalOps

2021-06-22 Thread GitBox
Yikun commented on pull request #33009: URL: https://github.com/apache/spark/pull/33009#issuecomment-865484219 cc @xinrong-databricks @ueshin

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32997: [SPARK-35840][SQL] Add `apply()` for a single field to `YearMonthIntervalType` and `DayTimeIntervalType`.

2021-06-22 Thread GitBox
AmplabJenkins removed a comment on pull request #32997: URL: https://github.com/apache/spark/pull/32997#issuecomment-864955342

[GitHub] [spark] AngersZhuuuu commented on pull request #32993: [SPARK-35776][SQL] Check all year-month interval types in arrow

2021-06-22 Thread GitBox
AngersZh commented on pull request #32993: URL: https://github.com/apache/spark/pull/32993#issuecomment-864770320 @MaxGekk Since ArrowType.Interval only support YEAR_TO_MONTH and DAY_TO_SECOND so all passed in will read out as YearMonthIntervalType

[GitHub] [spark] SparkQA commented on pull request #32952: [SPARK-35799][SS] Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec

2021-06-22 Thread GitBox
SparkQA commented on pull request #32952: URL: https://github.com/apache/spark/pull/32952#issuecomment-865433225

[GitHub] [spark] SparkQA removed a comment on pull request #32810: [SPARK-35672][CORE][YARN] Pass user classpath entries to executors using config instead of command line.

2021-06-22 Thread GitBox
SparkQA removed a comment on pull request #32810: URL: https://github.com/apache/spark/pull/32810#issuecomment-865040773

[GitHub] [spark] viirya commented on a change in pull request #32767: [SPARK-35628][SS] RocksDBFileManager - load checkpoint from DFS

2021-06-22 Thread GitBox
viirya commented on a change in pull request #32767: URL: https://github.com/apache/spark/pull/32767#discussion_r655924518 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala ## @@ -200,6 +246,55 @@ class RocksDBFileMan

[GitHub] [spark] AmplabJenkins commented on pull request #33005: [SPARK-35847][PYTHON] Manage InternalField in DataTypeOps.isnull

2021-06-22 Thread GitBox
AmplabJenkins commented on pull request #33005: URL: https://github.com/apache/spark/pull/33005#issuecomment-865409032

[GitHub] [spark] mridulm commented on pull request #32790: [SPARK-35543][CORE] Fix memory leak in BlockManagerMasterEndpoint removeRdd

2021-06-22 Thread GitBox
mridulm commented on pull request #32790: URL: https://github.com/apache/spark/pull/32790#issuecomment-865263199

[GitHub] [spark] cloud-fan commented on pull request #33004: [SPARK-35845][SQL] OuterReference resolution should reject ambiguous column names

2021-06-22 Thread GitBox
cloud-fan commented on pull request #33004: URL: https://github.com/apache/spark/pull/33004#issuecomment-865255792 cc @allisonwang-db @maropu

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33000: [SPARK-35778][SQL] Check multiply/divide of year-month intervals of any fields by numeric

2021-06-22 Thread GitBox
AmplabJenkins removed a comment on pull request #33000: URL: https://github.com/apache/spark/pull/33000#issuecomment-864923851

[GitHub] [spark] ueshin commented on a change in pull request #33009: [SPARK-35849][PYTHON] Make `astype` method data-type-based for DecimalOps

2021-06-22 Thread GitBox
ueshin commented on a change in pull request #33009: URL: https://github.com/apache/spark/pull/33009#discussion_r655836478 ## File path: python/pyspark/pandas/data_type_ops/num_ops.py ## @@ -416,6 +409,29 @@ def pretty_name(self) -> str: def isnull(self, index_ops: Union["

[GitHub] [spark] dongjoon-hyun closed pull request #33003: [SPARK-35844][INFRA] Add hadoop-cloud profile to PUBLISH_PROFILES

2021-06-22 Thread GitBox
dongjoon-hyun closed pull request #33003: URL: https://github.com/apache/spark/pull/33003
