[GitHub] [spark] ekoifman commented on a change in pull request #33641: [SPARK-36416][SQL] Add SQL metrics to AdaptiveSparkPlanExec for BHJs and Skew joins

2021-08-16 Thread GitBox
ekoifman commented on a change in pull request #33641: URL: https://github.com/apache/spark/pull/33641#discussion_r689693297 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala ## @@ -79,6 +81,13 @@ case class

[GitHub] [spark] xinrong-databricks commented on a change in pull request #33694: [SPARK-36469][PYTHON] Implement Index.map

2021-08-16 Thread GitBox
xinrong-databricks commented on a change in pull request #33694: URL: https://github.com/apache/spark/pull/33694#discussion_r689690796 ## File path: python/pyspark/pandas/indexes/datetimes.py ## @@ -741,6 +741,13 @@ def pandas_at_time(pdf) -> ps.DataFrame[int]:

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899651363 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47015/

[GitHub] [spark] LuciferYang commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
LuciferYang commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899651911 `FileMetaCacheManager` currently holds a cache singleton, which makes the cache-eviction scenario hard to test. Do we need to check these? @dongjoon-hyun

[GitHub] [spark] AmplabJenkins commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899651363 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47015/

[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
SparkQA commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899651320 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47015/

[GitHub] [spark] LuciferYang commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
LuciferYang commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899650191 @dongjoon-hyun 95bae3c adds a configurable `maximumSize` with a default value of 1000. Please help check whether this configuration is reasonable. -- This is an
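
For readers unfamiliar with what a `maximumSize` bound does, here is a minimal Python sketch of a size-bounded cache. It is not the PR's implementation (that code is Scala and is not shown in this listing). The class name `BoundedCache`, the `loader` callback, and the least-recently-used eviction order are all assumptions made for illustration; only the default of 1000 comes from the comment above.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical size-bounded cache; evicts the least recently used entry."""

    def __init__(self, maximum_size=1000):
        if maximum_size <= 0:
            raise ValueError("maximum_size must be positive")
        self.maximum_size = maximum_size
        self._entries = OrderedDict()

    def get(self, key, loader):
        """Return the cached value for key, calling loader(key) on a miss."""
        if key in self._entries:
            # A hit marks the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        value = loader(key)
        self._entries[key] = value
        if len(self._entries) > self.maximum_size:
            # Drop the oldest (least recently used) entry.
            self._entries.popitem(last=False)
        return value

    def __contains__(self, key):
        return key in self._entries

    def __len__(self):
        return len(self._entries)
```

Because the bound is a constructor argument rather than a property of a process-wide singleton, a test can build a tiny cache (say, `maximum_size=2`) and observe eviction directly.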

[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
SparkQA commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899647543 **[Test build #142516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142516/testReport)** for PR 33748 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899613087 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142497/

[GitHub] [spark] MaxGekk commented on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types

2021-08-16 Thread GitBox
MaxGekk commented on pull request #33753: URL: https://github.com/apache/spark/pull/33753#issuecomment-899643740 @gengliangwang @sarutak @cloud-fan @AngersZh @beliefer @Peng-Lei Could you review this PR, please? -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
SparkQA commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899643515 **[Test build #142515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142515/testReport)** for PR 33748 at commit

[GitHub] [spark] SparkQA commented on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types

2021-08-16 Thread GitBox
SparkQA commented on pull request #33753: URL: https://github.com/apache/spark/pull/33753#issuecomment-899643378 **[Test build #142514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142514/testReport)** for PR 33753 at commit

[GitHub] [spark] LuciferYang commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
LuciferYang commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689680487 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala ## @@ -158,6 +169,10 @@ case

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33750: [SPARK-36521][SQL] Disallow comparison between Interval and String

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #33750: URL: https://github.com/apache/spark/pull/33750#issuecomment-899641878 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47013/

[GitHub] [spark] AmplabJenkins commented on pull request #33750: [SPARK-36521][SQL] Disallow comparison between Interval and String

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #33750: URL: https://github.com/apache/spark/pull/33750#issuecomment-899641878 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47013/

[GitHub] [spark] MaxGekk opened a new pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types

2021-08-16 Thread GitBox
MaxGekk opened a new pull request #33753: URL: https://github.com/apache/spark/pull/33753 ### What changes were proposed in this pull request? Add new type `AnsiIntervalType` to `AbstractDataType.scala`, and extend it by `YearMonthIntervalType` and by `DayTimeIntervalType` ###

[GitHub] [spark] dongjoon-hyun closed pull request #33721: [SPARK-32210][CORE] Fix NegativeArraySizeException in MapOutputTracker with large spark.default.parallelism

2021-08-16 Thread GitBox
dongjoon-hyun closed pull request #33721: URL: https://github.com/apache/spark/pull/33721

[GitHub] [spark] SparkQA commented on pull request #33750: [SPARK-36521][SQL] Disallow comparison between Interval and String

2021-08-16 Thread GitBox
SparkQA commented on pull request #33750: URL: https://github.com/apache/spark/pull/33750#issuecomment-899634134 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47013/

[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
SparkQA commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899630916 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47014/

[GitHub] [spark] LuciferYang commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
LuciferYang commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689665023 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileMetaCacheManager.scala ## @@ -0,0 +1,87 @@ +/* + * Licensed to

[GitHub] [spark] LuciferYang commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
LuciferYang commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689660575 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala ## @@ -742,6 +742,43 @@ abstract class

[GitHub] [spark] gengliangwang commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
gengliangwang commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899621243 Got it. I am closing this PR and related JIRAs

[GitHub] [spark] gengliangwang closed pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
gengliangwang closed pull request #33751: URL: https://github.com/apache/spark/pull/33751

[GitHub] [spark] LuciferYang commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
LuciferYang commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689658494 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileMetaCacheManager.scala ## @@ -0,0 +1,90 @@ +/* + * Licensed to

[GitHub] [spark] MaxGekk commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
MaxGekk commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899614009 > Could you provide examples of features implemented as ANSI mode by default? @gengliangwang Look at any expression in `intervalExpressions.scala` that operates over ANSI

[GitHub] [spark] AmplabJenkins commented on pull request #33752: [SPARK-36401][PYTHON][WIP] Implement Series.cov

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #33752: URL: https://github.com/apache/spark/pull/33752#issuecomment-899613284 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899613087 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142497/

[GitHub] [spark] dgd-contributor opened a new pull request #33752: [SPARK-36401][PYTHON][WIP] Implement Series.cov

2021-08-16 Thread GitBox
dgd-contributor opened a new pull request #33752: URL: https://github.com/apache/spark/pull/33752 ### What changes were proposed in this pull request? Implement Series.cov ### Why are the changes needed? That is supported in pandas. We should support that as
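
As a reminder of what a `Series.cov`-style method computes, here is a small pure-Python sketch of sample covariance. It is not the PR's code. The function name `sample_cov` is hypothetical, values are paired by position (an assumption; pandas aligns by index), `None` stands in for a missing value, and the `ddof=1` default is assumed to mirror pandas' default normalization by N - 1.

```python
def sample_cov(xs, ys, ddof=1):
    """Covariance of two equal-length sequences, skipping pairs with a missing value."""
    # Keep only positions where both values are present.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n - ddof <= 0:
        # Not enough observations for the requested normalization.
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return total / (n - ddof)
```

For example, `sample_cov([1, 2, 3], [2, 4, 6])` sums the products of deviations to 4 and divides by 2.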

[GitHub] [spark] mridulm edited a comment on pull request #33617: [SPARK-35548][CORE][SHUFFLE] Handling new attempt has started error message in BlockPushErrorHandler in client

2021-08-16 Thread GitBox
mridulm edited a comment on pull request #33617: URL: https://github.com/apache/spark/pull/33617#issuecomment-899605631 +CC @gengliangwang potential last min PR for merge into 3.2

[GitHub] [spark] mridulm commented on pull request #33617: [SPARK-35548][CORE][SHUFFLE] Handling new attempt has started error message in BlockPushErrorHandler in client

2021-08-16 Thread GitBox
mridulm commented on pull request #33617: URL: https://github.com/apache/spark/pull/33617#issuecomment-899605631 +CC @gengliangwang potential last min PR for 3.2

[GitHub] [spark] cloud-fan commented on pull request #33709: [SPARK-36418][SQL] Use CAST in parsing of dates/timestamps with default pattern

2021-08-16 Thread GitBox
cloud-fan commented on pull request #33709: URL: https://github.com/apache/spark/pull/33709#issuecomment-899603714 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #33709: [SPARK-36418][SQL] Use CAST in parsing of dates/timestamps with default pattern

2021-08-16 Thread GitBox
cloud-fan closed pull request #33709: URL: https://github.com/apache/spark/pull/33709

[GitHub] [spark] mridulm commented on pull request #33615: [SPARK-36374][SHUFFLE][DOC] Push-based shuffle high level user documentation

2021-08-16 Thread GitBox
mridulm commented on pull request #33615: URL: https://github.com/apache/spark/pull/33615#issuecomment-899602206 Merged to master, branch-3.2 +CC @gengliangwang Thanks for working on this @venkata91 ! Thanks for the reviews @Ngone51 , @gengliangwang, @Victsm :-)

[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
SparkQA commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899601040 **[Test build #142513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142513/testReport)** for PR 33748 at commit

[GitHub] [spark] asfgit closed pull request #33615: [SPARK-36374][SHUFFLE][DOC] Push-based shuffle high level user documentation

2021-08-16 Thread GitBox
asfgit closed pull request #33615: URL: https://github.com/apache/spark/pull/33615

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #33736: URL: https://github.com/apache/spark/pull/33736#issuecomment-899446931

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899597438 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142496/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899573122

[GitHub] [spark] cloud-fan commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
cloud-fan commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899597798 Well, if there are a lot of interval operations that follow ANSI semantics now, then it's not realistic to change them all, and we should just polish the migration guide to

[GitHub] [spark] AmplabJenkins commented on pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #33736: URL: https://github.com/apache/spark/pull/33736#issuecomment-899597442 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142508/

[GitHub] [spark] AmplabJenkins commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899597440 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47012/

[GitHub] [spark] AmplabJenkins commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899597438 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142496/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33615: [SPARK-36374][SHUFFLE][DOC] Push-based shuffle high level user documentation

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #33615: URL: https://github.com/apache/spark/pull/33615#issuecomment-898705860 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142446/

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
dongjoon-hyun commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689632152 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala ## @@ -158,6 +169,10 @@

[GitHub] [spark] mridulm commented on pull request #33615: [SPARK-36374][SHUFFLE][DOC] Push-based shuffle high level user documentation

2021-08-16 Thread GitBox
mridulm commented on pull request #33615: URL: https://github.com/apache/spark/pull/33615#issuecomment-899595911 Given RC timelines, will it be possible to take a pass @Ngone51, @gengliangwang ? I want to merge only after you are fine with the latest version. Thanks.

[GitHub] [spark] gengliangwang commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
gengliangwang commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899595580 > As far as I know we implemented ANSI intervals in some kind of strict mode (ANSI mode) everywhere, and ignored the ANSI mode SQL config. @MaxGekk I was testing
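
The two behaviors under discussion in this thread (raise on an invalid string versus return null) can be contrasted with a short Python sketch. This is not Spark's code: the function name `cast_to_year_month`, the `strict` flag, and the simplified `[+|-]years-months` input format are assumptions made for illustration only.

```python
import re

# Simplified, assumed format: optional sign, years, a dash, months (0-11).
_YEAR_MONTH = re.compile(r"^([+-]?)(\d+)-(\d+)$")


def cast_to_year_month(text, strict):
    """Return the interval as a total number of months.

    On invalid input, raise ValueError when strict is True (ANSI-like
    behavior) and return None otherwise (null-on-failure behavior).
    """
    match = _YEAR_MONTH.match(text.strip())
    if match and int(match.group(3)) < 12:
        sign = -1 if match.group(1) == "-" else 1
        return sign * (int(match.group(2)) * 12 + int(match.group(3)))
    if strict:
        raise ValueError(f"invalid year-month interval: {text!r}")
    return None
```

With `strict=True` the caller must handle the exception; with `strict=False` the invalid value silently becomes a missing one, which is the trade-off the reviewers are weighing.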

[GitHub] [spark] LuciferYang commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
LuciferYang commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689630087 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala ## @@ -158,6 +169,10 @@ case

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
dongjoon-hyun commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689628677 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileMetaCacheManager.scala ## @@ -0,0 +1,87 @@ +/* + * Licensed

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
dongjoon-hyun commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689628290 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileMetaCacheManager.scala ## @@ -0,0 +1,87 @@ +/* + * Licensed

[GitHub] [spark] SparkQA commented on pull request #33750: [SPARK-36521][SQL] Disallow comparison between Interval and String

2021-08-16 Thread GitBox
SparkQA commented on pull request #33750: URL: https://github.com/apache/spark/pull/33750#issuecomment-899592377 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47013/

[GitHub] [spark] SparkQA removed a comment on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
SparkQA removed a comment on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899264134 **[Test build #142496 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142496/testReport)** for PR 33748 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH

2021-08-16 Thread GitBox
SparkQA removed a comment on pull request #33736: URL: https://github.com/apache/spark/pull/33736#issuecomment-899403987 **[Test build #142508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142508/testReport)** for PR 33736 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
SparkQA removed a comment on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899552755 **[Test build #142511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142511/testReport)** for PR 33751 at commit

[GitHub] [spark] LuciferYang commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
LuciferYang commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689625999 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala ## @@ -158,6 +169,10 @@ case

[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
SparkQA commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899584712 **[Test build #142496 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142496/testReport)** for PR 33748 at commit

[GitHub] [spark] SparkQA commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
SparkQA commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899584176 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47012/

[GitHub] [spark] SparkQA commented on pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH

2021-08-16 Thread GitBox
SparkQA commented on pull request #33736: URL: https://github.com/apache/spark/pull/33736#issuecomment-899584008 **[Test build #142508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142508/testReport)** for PR 33736 at commit

[GitHub] [spark] srowen commented on pull request #33710: [SPARK-36481][ML] Expose LogisticRegression.setInitialModel, like KMeans et al do

2021-08-16 Thread GitBox
srowen commented on pull request #33710: URL: https://github.com/apache/spark/pull/33710#issuecomment-899582615 It seems OK to add as a further enhancement. Initial model might only make sense as vectors/matrices in certain cases, but it could be an alternative way to specify it, yes.

[GitHub] [spark] LuciferYang commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
LuciferYang commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689615811 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileMeta.scala ## @@ -0,0 +1,49 @@ +/* + * Licensed to the

[GitHub] [spark] LuciferYang commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
LuciferYang commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689614381 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileMeta.scala ## @@ -0,0 +1,49 @@ +/* + * Licensed to the

[GitHub] [spark] cloud-fan commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
cloud-fan commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899580421 > Why are the changes needed? > Make it consistent with the other data type cast. I don't think this is the right reason. Since we turn on ANSI interval types by

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
dongjoon-hyun commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689611925 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala ## @@ -158,6 +169,10 @@

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
dongjoon-hyun commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689608748 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileMeta.scala ## @@ -0,0 +1,49 @@ +/* + * Licensed to the

[GitHub] [spark] SparkQA commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
SparkQA commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899573025 **[Test build #142511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142511/testReport)** for PR 33751 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899573122 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142511/

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox
dongjoon-hyun commented on a change in pull request #33748: URL: https://github.com/apache/spark/pull/33748#discussion_r689604157 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileMetaCacheManager.scala ## @@ -0,0 +1,87 @@ +/* + * Licensed

[GitHub] [spark] gengliangwang commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
gengliangwang commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899566916 @cloud-fan I have created https://issues.apache.org/jira/browse/SPARK-36523 for the day-time interval. The method `castStringToYMInterval` is used in multiple places.

[GitHub] [spark] gengliangwang commented on a change in pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
gengliangwang commented on a change in pull request #33751: URL: https://github.com/apache/spark/pull/33751#discussion_r689594580 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala ## @@ -204,7 +219,7 @@ object IntervalUtils {

[GitHub] [spark] cloud-fan commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
cloud-fan commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899557480 Can we fix day-time interval in this PR as well?

[GitHub] [spark] zhengruifeng edited a comment on pull request #33710: [SPARK-36481][ML] Expose LogisticRegression.setInitialModel, like KMeans et al do

2021-08-16 Thread GitBox
zhengruifeng edited a comment on pull request #33710: URL: https://github.com/apache/spark/pull/33710#issuecomment-899555429 I personally prefer to use `Param[Vector]/Param[Matrix]` instead; it will be convenient to add it on the Python side, and moreover, it is easy to import a model trained

[GitHub] [spark] zhengruifeng commented on pull request #33710: [SPARK-36481][ML] Expose LogisticRegression.setInitialModel, like KMeans et al do

2021-08-16 Thread GitBox
zhengruifeng commented on pull request #33710: URL: https://github.com/apache/spark/pull/33710#issuecomment-899555429 I personally prefer to use `Param[Vector]/Param[Matrix]` instead; it will be convenient to add it on the Python side, and moreover, it is easy to import a model trained in other

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899553512 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47011/

[GitHub] [spark] AmplabJenkins commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899553512 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47011/ --

[GitHub] [spark] SparkQA commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
SparkQA commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899553468 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47011/ --

[GitHub] [spark] SparkQA commented on pull request #33750: [SPARK-36521][SQL] Disallow comparison between Interval and String

2021-08-16 Thread GitBox
SparkQA commented on pull request #33750: URL: https://github.com/apache/spark/pull/33750#issuecomment-899552909 **[Test build #142512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142512/testReport)** for PR 33750 at commit

[GitHub] [spark] SparkQA commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
SparkQA commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899552755 **[Test build #142511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142511/testReport)** for PR 33751 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #30483: URL: https://github.com/apache/spark/pull/30483#issuecomment-899551975 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142505/

[GitHub] [spark] AmplabJenkins commented on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #30483: URL: https://github.com/apache/spark/pull/30483#issuecomment-899551975 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142505/ -- This

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899551477 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142510/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #30483: URL: https://github.com/apache/spark/pull/30483#issuecomment-899478790 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142500/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-899551474 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142493/

[GitHub] [spark] AmplabJenkins commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899551477 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142510/ -- This

[GitHub] [spark] AmplabJenkins commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-899551474 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142493/ -- This

[GitHub] [spark] SparkQA removed a comment on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet

2021-08-16 Thread GitBox
SparkQA removed a comment on pull request #30483: URL: https://github.com/apache/spark/pull/30483#issuecomment-899364996 **[Test build #142505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142505/testReport)** for PR 30483 at commit

[GitHub] [spark] SparkQA commented on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet

2021-08-16 Thread GitBox
SparkQA commented on pull request #30483: URL: https://github.com/apache/spark/pull/30483#issuecomment-899550395 **[Test build #142505 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142505/testReport)** for PR 30483 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
SparkQA removed a comment on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899524926 **[Test build #142510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142510/testReport)** for PR 33751 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-16 Thread GitBox
SparkQA removed a comment on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-899239127 **[Test build #142493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142493/testReport)** for PR 32816 at commit

[GitHub] [spark] gengliangwang commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
gengliangwang commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899545402 jenkins, retest this please.

[GitHub] [spark] zhengruifeng commented on pull request #33710: [SPARK-36481][ML] Expose LogisticRegression.setInitialModel, like KMeans et al do

2021-08-16 Thread GitBox
zhengruifeng commented on pull request #33710: URL: https://github.com/apache/spark/pull/33710#issuecomment-899543763 @srowen Sorry for coming late. Exposing the initial model is super useful. Maybe we should unify the API, since some `Param[Vector]/Param[Matrix]` is used in mllib, like

[GitHub] [spark] brandondahler commented on a change in pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-08-16 Thread GitBox
brandondahler commented on a change in pull request #33323: URL: https://github.com/apache/spark/pull/33323#discussion_r689567804 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ## @@ -970,7 +995,59 @@ class Dataset[T] private[sql]( } /** - *

[GitHub] [spark] SparkQA commented on pull request #32816: [SPARK-33832][SQL] Support optimize skewed join even if introduce extra shuffle

2021-08-16 Thread GitBox
SparkQA commented on pull request #32816: URL: https://github.com/apache/spark/pull/32816#issuecomment-899537814 **[Test build #142493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142493/testReport)** for PR 32816 at commit

[GitHub] [spark] gengliangwang commented on a change in pull request #33750: [SPARK-36521][SQL] Disallow comparison between Interval and String

2021-08-16 Thread GitBox
gengliangwang commented on a change in pull request #33750: URL: https://github.com/apache/spark/pull/33750#discussion_r689565406 ## File path: sql/core/src/test/resources/sql-tests/results/interval.sql.out ## @@ -2292,49 +2292,55 @@ cannot resolve '(INTERVAL '1' MONTH >

[GitHub] [spark] SparkQA commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
SparkQA commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899529098 **[Test build #142510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142510/testReport)** for PR 33751 at commit

[GitHub] [spark] SparkQA commented on pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
SparkQA commented on pull request #33751: URL: https://github.com/apache/spark/pull/33751#issuecomment-899524926 **[Test build #142510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142510/testReport)** for PR 33751 at commit

[GitHub] [spark] gengliangwang commented on a change in pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
gengliangwang commented on a change in pull request #33751: URL: https://github.com/apache/spark/pull/33751#discussion_r689550016 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala ## @@ -224,9 +224,8 @@ trait

[GitHub] [spark] gengliangwang opened a new pull request #33751: [SPARK-36522][SQL] Casting invalid string to year-month interval should return null

2021-08-16 Thread GitBox
gengliangwang opened a new pull request #33751: URL: https://github.com/apache/spark/pull/33751 ### What changes were proposed in this pull request? Currently when casting invalid string to year-month interval, Spark always throws an exception, with/without ANSI mode

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33750: [SPARK-36521][SQL] Disallow comparison between Interval and String

2021-08-16 Thread GitBox
AmplabJenkins removed a comment on pull request #33750: URL: https://github.com/apache/spark/pull/33750#issuecomment-899505355 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47010/

[GitHub] [spark] AmplabJenkins commented on pull request #33750: [SPARK-36521][SQL] Disallow comparison between Interval and String

2021-08-16 Thread GitBox
AmplabJenkins commented on pull request #33750: URL: https://github.com/apache/spark/pull/33750#issuecomment-899505355 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47010/ --
