[GitHub] [spark] ulysses-you commented on pull request #35729: [SPARK-38410][SQL] Support specify initial partition number for rebalance

2022-03-03 Thread GitBox
ulysses-you commented on pull request #35729: URL: https://github.com/apache/spark/pull/35729#issuecomment-1058909560 cc @HyukjinKwon @cloud-fan @yaooqinn -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] LuciferYang commented on pull request #35580: [WIP][SPARK-38257][BUILD] Upgrade `rocksdbjni` to 6.29.3

2022-03-03 Thread GitBox
LuciferYang commented on pull request #35580: URL: https://github.com/apache/spark/pull/35580#issuecomment-1058909472 @dongjoon-hyun Thank you for syncing.

[GitHub] [spark] ulysses-you commented on a change in pull request #35719: [SPARK-38401][SQL][CORE] Unify get preferred locations for shuffle in AQE

2022-03-03 Thread GitBox
ulysses-you commented on a change in pull request #35719: URL: https://github.com/apache/spark/pull/35719#discussion_r819326181 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala ## @@ -177,19 +177,36 @@ class ShuffledRowRDD( val tra

[GitHub] [spark] martin-g commented on pull request #35560: [SPARK-38351][TESTS] Don't use deprecate symbol API in test classes

2022-03-03 Thread GitBox
martin-g commented on pull request #35560: URL: https://github.com/apache/spark/pull/35560#issuecomment-1058899649 Agreed! I will work on this soon!

[GitHub] [spark] martin-g commented on a change in pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

2022-03-03 Thread GitBox
martin-g commented on a change in pull request #34622: URL: https://github.com/apache/spark/pull/34622#discussion_r819314166 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ## @@ -138,6 +139,30 @@ class SQLAppStatusListener(

[GitHub] [spark] martin-g commented on a change in pull request #35728: [SPARK-38189][K8S][DOC] Add `Priority scheduling` doc for Spark on K8S

2022-03-03 Thread GitBox
martin-g commented on a change in pull request #35728: URL: https://github.com/apache/spark/pull/35728#discussion_r819304189 ## File path: docs/running-on-kubernetes.md ## @@ -1707,6 +1707,30 @@ Spark automatically handles translating the Spark configs spark.{driver/ex Kube

[GitHub] [spark] chenjunbiao001 commented on pull request #35721: Supports customization of the entire optimizer and planner

2022-03-03 Thread GitBox
chenjunbiao001 commented on pull request #35721: URL: https://github.com/apache/spark/pull/35721#issuecomment-1058871514 This is an automatic vacation reply from QQ Mail. Hello, I have received your email. Thank you for your message!

[GitHub] [spark] AmplabJenkins commented on pull request #35721: Supports customization of the entire optimizer and planner

2022-03-03 Thread GitBox
AmplabJenkins commented on pull request #35721: URL: https://github.com/apache/spark/pull/35721#issuecomment-1058871372 Can one of the admins verify this patch?

[GitHub] [spark] HeartSaVioR commented on a change in pull request #35673: [SPARK-38204][SS] Use StatefulOpClusteredDistribution for stateful operators with respecting backward compatibility

2022-03-03 Thread GitBox
HeartSaVioR commented on a change in pull request #35673: URL: https://github.com/apache/spark/pull/35673#discussion_r819293042 ## File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingSessionWindowDistributionSuite.scala ## @@ -0,0 +1,232 @@ +/* + * Licen

[GitHub] [spark] HeartSaVioR opened a new pull request #35731: [SPARK-38412][SS] Fix the swapped sequence of from and to in StateSchemaCompatibilityChecker

2022-03-03 Thread GitBox
HeartSaVioR opened a new pull request #35731: URL: https://github.com/apache/spark/pull/35731 ### What changes were proposed in this pull request? This PR fixes the StateSchemaCompatibilityChecker which mistakenly swapped `from` (should be provided schema) and `to` (should be existin

[GitHub] [spark] GabeChurch removed a comment on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

2022-03-03 Thread GitBox
GabeChurch removed a comment on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-1058840240 @dongjoon-hyun thank you! This is golden. I know you are active on the Apache ORC project, so if you have any additional wisdom to share, please do. I've be

[GitHub] [spark] GabeChurch edited a comment on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

2022-03-03 Thread GitBox
GabeChurch edited a comment on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-1058840240 @dongjoon-hyun thank you! This is golden. I know you are active on the Apache ORC project, so if you have any additional wisdom to share, please do. I've bee

[GitHub] [spark] pralabhkumar commented on pull request #35191: [SPARK-37491][PYTHON]Fix Series.asof for unsorted values

2022-03-03 Thread GitBox
pralabhkumar commented on pull request #35191: URL: https://github.com/apache/spark/pull/35191#issuecomment-1058865169 Done with the changes.

[GitHub] [spark] pralabhkumar commented on a change in pull request #35191: [SPARK-37491][PYTHON]Fix Series.asof for unsorted values

2022-03-03 Thread GitBox
pralabhkumar commented on a change in pull request #35191: URL: https://github.com/apache/spark/pull/35191#discussion_r819290416 ## File path: python/pyspark/pandas/series.py ## @@ -5228,10 +5229,22 @@ def asof(self, where: Union[Any, List]) -> Union[Scalar, "Series"]:
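For context on what `asof` should return: pandas documents `Series.asof(where)` as returning the last row at or before `where` that has no NaN, and it assumes a sorted index. "Last" means last by index position, not the largest value, so an implementation that effectively picks the maximum value can disagree with pandas once the values themselves are unsorted. The pure-Python sketch below is only an illustration of those semantics under stated assumptions: the `asof` helper is hypothetical (not pandas-on-Spark's code), handles a scalar `where` only, and returns `None` where pandas would return NaN.

```python
import bisect
import math
from typing import Optional, Sequence


def asof(index: Sequence[float], values: Sequence[float], where: float) -> Optional[float]:
    """Hypothetical sketch of pandas' Series.asof for a scalar `where`.

    Assumes `index` is sorted ascending, as pandas does. Returns the last
    non-NaN value whose index label is <= where, or None if there is none.
    """
    # Position just past the last label that is <= where (inclusive match).
    end = bisect.bisect_right(index, where)
    # Walk backwards by position, skipping NaNs, so we take the *last*
    # valid row rather than the largest value.
    for pos in range(end - 1, -1, -1):
        value = values[pos]
        if not (isinstance(value, float) and math.isnan(value)):
            return value
    return None
```

With index `[10, 20, 30, 40]` and values `[2.0, 1.0, nan, 4.0]`, `asof(index, values, 25)` returns `1.0` (the row at label 20), whereas the maximum of the values at or before 25 would be `2.0`.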

[GitHub] [spark] GabeChurch edited a comment on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

2022-03-03 Thread GitBox
GabeChurch edited a comment on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-1058840240 @dongjoon-hyun thank you! This is golden. I've been testing with Spark 3.2 major and 3.3 fork on Kubernetes (couple TB writes) for a while now and seeing signific

[GitHub] [spark] GabeChurch edited a comment on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

2022-03-03 Thread GitBox
GabeChurch edited a comment on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-1058840240 @dongjoon-hyun I'm curious, have you done any benchmarks for the magic S3 committer with ORC? I know you are pretty active on the Apache ORC project as well. I've be

[GitHub] [spark] GabeChurch edited a comment on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

2022-03-03 Thread GitBox
GabeChurch edited a comment on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-1058840240 @dongjoon-hyun I'm curious, have you done any benchmarks for the magic S3 committer with ORC? I know you are pretty active on the Apache ORC project. I've been testin

[GitHub] [spark] huaxingao commented on a change in pull request #35691: [SPARK-38357][SQL][3.2] Fix StackOverflowError with OR(data filter, partition filter)

2022-03-03 Thread GitBox
huaxingao commented on a change in pull request #35691: URL: https://github.com/apache/spark/pull/35691#discussion_r819281553 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala ## @@ -83,7 +89,7 @@ private[sql] obje

[GitHub] [spark] mridulm commented on pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

2022-03-03 Thread GitBox
mridulm commented on pull request #35637: URL: https://github.com/apache/spark/pull/35637#issuecomment-1058847727 Since the existing test was not catching this issue, I want to make sure that we are testing for this behavior. Given that we have exhaustively tested the current metrics - i

[GitHub] [spark] Ngone51 commented on a change in pull request #35719: [SPARK-38401][SQL][CORE] Unify get preferred locations for shuffle in AQE

2022-03-03 Thread GitBox
Ngone51 commented on a change in pull request #35719: URL: https://github.com/apache/spark/pull/35719#discussion_r819279830 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala ## @@ -177,19 +177,36 @@ class ShuffledRowRDD( val tracker

[GitHub] [spark] beliefer commented on pull request #35727: [SPARK-38361][SQL] Add factory method `getConnection` into `JDBCDialect`.

2022-03-03 Thread GitBox
beliefer commented on pull request #35727: URL: https://github.com/apache/spark/pull/35727#issuecomment-1058846261 ping @huaxingao cc @cloud-fan

[GitHub] [spark] mridulm commented on a change in pull request #35637: [SPARK-38309][Core] Fix incorrect SHS stage percentile metrics for shuffle read bytes and shuffle total blocks

2022-03-03 Thread GitBox
mridulm commented on a change in pull request #35637: URL: https://github.com/apache/spark/pull/35637#discussion_r819277689 ## File path: core/src/test/scala/org/apache/spark/status/AppStatusStoreSuite.scala ## @@ -227,32 +268,41 @@ class AppStatusStoreSuite extends SparkFunSui

[GitHub] [spark] stczwd edited a comment on pull request #35669: [SPARK-38041][SQL] DataFilter pushed down with PartitionFilter

2022-03-03 Thread GitBox
stczwd edited a comment on pull request #35669: URL: https://github.com/apache/spark/pull/35669#issuecomment-1058842991 > Seems to me that data filters and partition filters are separated differently in V1 and V2 file sources and your optimization only work for V2 file source? Yes,

[GitHub] [spark] stczwd commented on pull request #35669: [SPARK-38041][SQL] DataFilter pushed down with PartitionFilter

2022-03-03 Thread GitBox
stczwd commented on pull request #35669: URL: https://github.com/apache/spark/pull/35669#issuecomment-1058842991 > Seems to me that data filters and partition filters are separated differently in V1 and V2 file sources and your optimization only work for V2 file source? Yes, this i

[GitHub] [spark] GabeChurch edited a comment on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

2022-03-03 Thread GitBox
GabeChurch edited a comment on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-1058840240 @dongjoon-hyun I'm curious, have you done any benchmarks for the magic S3 committer with ORC? I've been testing with Spark 3.2 major and 3.3 fork on Kubernetes (

[GitHub] [spark] GabeChurch commented on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

2022-03-03 Thread GitBox
GabeChurch commented on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-1058840240 @dongjoon-hyun I'm curious, have you done any benchmarks for the magic S3 committer with ORC? I've been testing with Spark 3.2 major and 3.3 fork on Kubernetes (couple

[GitHub] [spark] xinrong-databricks commented on a change in pull request #35671: [SPARK-38345][SQL] Introduce SQL function ARRAY_SIZE

2022-03-03 Thread GitBox
xinrong-databricks commented on a change in pull request #35671: URL: https://github.com/apache/spark/pull/35671#discussion_r819266462 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -133,6 +133,33 @@ object

[GitHub] [spark] huaxingao commented on pull request #35669: [SPARK-38041][SQL] DataFilter pushed down with PartitionFilter

2022-03-03 Thread GitBox
huaxingao commented on pull request #35669: URL: https://github.com/apache/spark/pull/35669#issuecomment-1058827776 @stczwd The PR makes sense to me. I have one question: does your optimization work for V1 file source? Seems to me that data filters and partition filters are separated diffe

[GitHub] [spark] pan3793 commented on pull request #35730: [SPARK-38411] Use UTF-8 to read event log

2022-03-03 Thread GitBox
pan3793 commented on pull request #35730: URL: https://github.com/apache/spark/pull/35730#issuecomment-1058822211 cc @HeartSaVioR

[GitHub] [spark] pan3793 opened a new pull request #35730: [SPARK-38411] Use UTF-8 to read event log

2022-03-03 Thread GitBox
pan3793 opened a new pull request #35730: URL: https://github.com/apache/spark/pull/35730 ### What changes were proposed in this pull request? Use UTF-8 instead of system default encoding to read event log ### Why are the changes needed? After SPARK-29160, we should alwa
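The gist of the change is to stop relying on the system default encoding when reading event logs. As a language-neutral illustration only (Python rather than Spark's Scala; the `write_event_log`/`read_event_log` helpers and the sample event are hypothetical), the sketch below pins UTF-8 on both the write and the read side of a JSON-lines file, so a non-ASCII value round-trips regardless of the reader's locale.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def write_event_log(path: str, events: List[Dict[str, Any]]) -> None:
    # One JSON object per line, always encoded as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event, ensure_ascii=False) + "\n")


def read_event_log(path: str) -> List[Dict[str, Any]]:
    # Without encoding=..., open() falls back to the locale's preferred
    # encoding, which can differ between the machine that wrote the log
    # and the one reading it. Pinning UTF-8 removes that dependency.
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Hypothetical event shaped loosely like a Spark listener event.
    events = [{"Event": "SparkListenerApplicationStart", "App Name": "café-日志"}]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "eventlog")
        write_event_log(path, events)
        print(read_event_log(path))
```

The design choice is symmetry: whichever charset the writer uses, the reader must use the same one explicitly, rather than whatever the host happens to default to.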

[GitHub] [spark] ulysses-you opened a new pull request #35729: [SPARK-38410][SQL] Support specify initial partition number for rebalance

2022-03-03 Thread GitBox
ulysses-you opened a new pull request #35729: URL: https://github.com/apache/spark/pull/35729 ### What changes were proposed in this pull request? Pass `initialNumPartitions` into `RebalancePartitions`. ### Why are the changes needed? Rebalance partitions resolve

[GitHub] [spark] Yikun closed pull request #35639: [WIP][SPARK-38189][K8S] Support priority scheduling with Volcano implementations

2022-03-03 Thread GitBox
Yikun closed pull request #35639: URL: https://github.com/apache/spark/pull/35639

[GitHub] [spark] Yikun commented on pull request #35639: [WIP][SPARK-38189][K8S] Support priority scheduling with Volcano implementations

2022-03-03 Thread GitBox
Yikun commented on pull request #35639: URL: https://github.com/apache/spark/pull/35639#issuecomment-1058801691 Replaced by https://github.com/apache/spark/pull/35728

[GitHub] [spark] Yikun opened a new pull request #35728: [SPARK-38189][K8S][DOC] Add `priority scheduling` for Spark on K8S

2022-03-03 Thread GitBox
Yikun opened a new pull request #35728: URL: https://github.com/apache/spark/pull/35728 ### What changes were proposed in this pull request? Document how to set the priority class with the pod template. ### Why are the changes needed? Currently, we didn't have a certain doc to h

[GitHub] [spark] sleep1661 commented on pull request #32923: [SPARK-35783][SQL] Set the list of read columns in the task configuration to reduce reading of ORC data.

2022-03-03 Thread GitBox
sleep1661 commented on pull request #32923: URL: https://github.com/apache/spark/pull/32923#issuecomment-1058800304 > @cloud-fan We are migrating from 2.4.7 to 3.0.2, and observed a significant regression in some cases due to this issue. > > ![image](https://user-images.githubuserco

[GitHub] [spark] LuciferYang commented on pull request #35713: [SPARK-38393][SQL] Clean up deprecated usage of `GenSeq/GenMap`

2022-03-03 Thread GitBox
LuciferYang commented on pull request #35713: URL: https://github.com/apache/spark/pull/35713#issuecomment-1058793733 cc @srowen

[GitHub] [spark] wangshengjie123 commented on pull request #21893: [SPARK-24965][SQL] Support selecting from partitioned tables with partitions having different data formats

2022-03-03 Thread GitBox
wangshengjie123 commented on pull request #21893: URL: https://github.com/apache/spark/pull/21893#issuecomment-1058793345 > We have some use cases for it and internally we have the fix with a smaller scope. @shangxinli, sorry to bother, would you submit a PR to solve this problem?

[GitHub] [spark] wangshengjie123 removed a comment on pull request #21893: [SPARK-24965][SQL] Support selecting from partitioned tables with partitions having different data formats

2022-03-03 Thread GitBox
wangshengjie123 removed a comment on pull request #21893: URL: https://github.com/apache/spark/pull/21893#issuecomment-1058790556 > We have some use cases for it and internally we have the fix with a smaller scope. @shangxinli

[GitHub] [spark] wangshengjie123 commented on pull request #21893: [SPARK-24965][SQL] Support selecting from partitioned tables with partitions having different data formats

2022-03-03 Thread GitBox
wangshengjie123 commented on pull request #21893: URL: https://github.com/apache/spark/pull/21893#issuecomment-1058790556 > We have some use cases for it and internally we have the fix with a smaller scope. @shangxinli

[GitHub] [spark] beliefer commented on pull request #35696: [SPARK-38361][SQL] Factory method getConnection should take Partition as optional parameter.

2022-03-03 Thread GitBox
beliefer commented on pull request #35696: URL: https://github.com/apache/spark/pull/35696#issuecomment-1058773907 https://github.com/apache/spark/pull/35727 replaces this PR.

[GitHub] [spark] beliefer closed pull request #35696: [SPARK-38361][SQL] Factory method getConnection should take Partition as optional parameter.

2022-03-03 Thread GitBox
beliefer closed pull request #35696: URL: https://github.com/apache/spark/pull/35696

[GitHub] [spark] beliefer opened a new pull request #35727: [SPARK-38361][SQL] Add factory method getConnection into JDBCDialect.

2022-03-03 Thread GitBox
beliefer opened a new pull request #35727: URL: https://github.com/apache/spark/pull/35727 ### What changes were proposed in this pull request? At present, the parameter of the factory method for obtaining JDBC connection is empty because the JDBC URL of some databases is fixed and uniqu

[GitHub] [spark] ulysses-you commented on pull request #35719: [SPARK-38401][SQL][CORE] Unify get preferred locations for shuffle in AQE

2022-03-03 Thread GitBox
ulysses-you commented on pull request #35719: URL: https://github.com/apache/spark/pull/35719#issuecomment-1058755893 Thank you @cloud-fan @mridulm for the review; addressed the comments.

[GitHub] [spark] xinrong-databricks commented on a change in pull request #35706: [SPARK-38387][PYTHON] Support `na_action` and Series input correspondence in `Series.map`

2022-03-03 Thread GitBox
xinrong-databricks commented on a change in pull request #35706: URL: https://github.com/apache/spark/pull/35706#discussion_r819205905 ## File path: python/pyspark/pandas/series.py ## @@ -992,8 +993,10 @@ def map(self, arg: Union[Dict, Callable]) -> "Series": Paramet

[GitHub] [spark] xinrong-databricks commented on a change in pull request #35706: [SPARK-38387][PYTHON] Support `na_action` and Series input correspondence in `Series.map`

2022-03-03 Thread GitBox
xinrong-databricks commented on a change in pull request #35706: URL: https://github.com/apache/spark/pull/35706#discussion_r819205751 ## File path: python/pyspark/pandas/series.py ## @@ -1045,8 +1048,18 @@ def map(self, arg: Union[Dict, Callable]) -> "Series": 2

[GitHub] [spark] xinrong-databricks commented on a change in pull request #35706: [SPARK-38387][PYTHON] Support `na_action` and Series input correspondence in `Series.map`

2022-03-03 Thread GitBox
xinrong-databricks commented on a change in pull request #35706: URL: https://github.com/apache/spark/pull/35706#discussion_r819201744 ## File path: python/pyspark/pandas/tests/test_series.py ## @@ -1161,13 +1161,29 @@ def test_append(self): def test_map(self): ps
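The two features named in the PR title mirror pandas' `Series.map`: with `na_action='ignore'`, missing values are propagated without being passed to the mapping, and a Series argument maps each value through its index, much like a dict. The hypothetical `map_values` helper below sketches those semantics on a plain Python list, treating `None`/NaN as missing and using a dict as a stand-in for a Series; it is an assumption-laden illustration, not pandas-on-Spark's implementation.

```python
import collections.abc
import math
from typing import Any, Callable, List, Mapping, Optional, Union


def _is_missing(value: Any) -> bool:
    # Treat None and float NaN as missing, roughly like pandas' isna for scalars.
    return value is None or (isinstance(value, float) and math.isnan(value))


def map_values(
    values: List[Any],
    arg: Union[Mapping[Any, Any], Callable[[Any], Any]],
    na_action: Optional[str] = None,
) -> List[Any]:
    """Hypothetical sketch of Series.map semantics on a plain list."""
    if na_action not in (None, "ignore"):
        raise ValueError("na_action must be either 'ignore' or None")
    result = []
    for value in values:
        if na_action == "ignore" and _is_missing(value):
            # Propagate the missing value without calling the mapper.
            result.append(value)
        elif isinstance(arg, collections.abc.Mapping):
            # Dict (or Series-like) correspondence: unmatched keys become
            # None, which stands in for pandas' NaN here.
            result.append(arg.get(value))
        else:
            result.append(arg(value))
    return result
```

For example, `map_values(['cat', 'dog', None, 'rabbit'], lambda s: 'I am a ' + s, na_action='ignore')` returns `['I am a cat', 'I am a dog', None, 'I am a rabbit']`; without `na_action='ignore'`, the lambda would be called on `None` and raise `TypeError`.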

[GitHub] [spark] sunchao commented on pull request #34855: [WIP][SPARK-37600][BUILD] Upgrade to Hadoop 3.3.2

2022-03-03 Thread GitBox
sunchao commented on pull request #34855: URL: https://github.com/apache/spark/pull/34855#issuecomment-1058739724 hmm somehow `YarnClusterSuite` started failing after 3.3.2. I'll need to check what caused the issue.

[GitHub] [spark] AmplabJenkins commented on pull request #35726: [SPARK-37895][SQL] Filter push down column with quoted columns

2022-03-03 Thread GitBox
AmplabJenkins commented on pull request #35726: URL: https://github.com/apache/spark/pull/35726#issuecomment-1058726624 Can one of the admins verify this patch?

[GitHub] [spark] huaxingao commented on a change in pull request #35726: [SPARK-37895][SQL] Filter push down column with quoted columns

2022-03-03 Thread GitBox
huaxingao commented on a change in pull request #35726: URL: https://github.com/apache/spark/pull/35726#discussion_r819190168 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala ## @@ -425,4 +425,20 @@ class JDBC

[GitHub] [spark] dtenedor commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

2022-03-03 Thread GitBox
dtenedor commented on pull request #35690: URL: https://github.com/apache/spark/pull/35690#issuecomment-1058723452 This is ready for another review round @gengliangwang @viirya @wangyum @HyukjinKwon @dongjoon-hyun :)

[GitHub] [spark] github-actions[bot] commented on pull request #34396: [SPARK-37124][SQL] Support RowToColumnarExec with Arrow format

2022-03-03 Thread GitBox
github-actions[bot] commented on pull request #34396: URL: https://github.com/apache/spark/pull/34396#issuecomment-1058691919 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue m

[GitHub] [spark] stczwd commented on pull request #35669: [SPARK-38041][SQL] DataFilter pushed down with PartitionFilter

2022-03-03 Thread GitBox
stczwd commented on pull request #35669: URL: https://github.com/apache/spark/pull/35669#issuecomment-1058666337 > I think the optimization should be able to add for ORC as well. Great. I will create a new JIRA for the ORC optimization; this PR will only support the Parquet optimization.

[GitHub] [spark] c21 commented on pull request #35669: [SPARK-38041][SQL] DataFilter pushed down with PartitionFilter

2022-03-03 Thread GitBox
c21 commented on pull request #35669: URL: https://github.com/apache/spark/pull/35669#issuecomment-1058577352 I think the optimization should be able to add for ORC as well.

[GitHub] [spark] yliou commented on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

2022-03-03 Thread GitBox
yliou commented on pull request #34622: URL: https://github.com/apache/spark/pull/34622#issuecomment-1058561177 @tgravescs do you have time to take a quick look?

[GitHub] [spark] anchovYu commented on pull request #35707: [SPARK-38385][SQL] Improve error messages of 'mismatched input' cases from ANTLR

2022-03-03 Thread GitBox
anchovYu commented on pull request #35707: URL: https://github.com/apache/spark/pull/35707#issuecomment-1058558335 Thanks @cloud-fan for the review. Sorry for all the linting problems. Before I committed, I ran dev/scalastyle and it showed that all style checks had passed, then I thought it is

[GitHub] [spark] dongjoon-hyun commented on pull request #35580: [WIP][SPARK-38257][BUILD] Upgrade `rocksdbjni` to 6.29.3

2022-03-03 Thread GitBox
dongjoon-hyun commented on pull request #35580: URL: https://github.com/apache/spark/pull/35580#issuecomment-1058540179 We got the following official confirmation: `6.29.3` fails. We need a new version of RocksDB, and RocksDB JNI will arrive one or two months after that. - https

[GitHub] [spark] anchovYu commented on a change in pull request #35707: [SPARK-38385][SQL] Improve error messages of 'mismatched input' cases from ANTLR

2022-03-03 Thread GitBox
anchovYu commented on a change in pull request #35707: URL: https://github.com/apache/spark/pull/35707#discussion_r819095750 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala ## @@ -1774,7 +1777,7 @@ class DDLParserSuite extend

[GitHub] [spark] HeartSaVioR commented on pull request #35560: [SPARK-38351][TESTS] Don't use deprecate symbol API in test classes

2022-03-03 Thread GitBox
HeartSaVioR commented on pull request #35560: URL: https://github.com/apache/spark/pull/35560#issuecomment-1058523917 Follow-up fix sounds good to me. Easier to work based on this since we now just need to try finding `Symbol(` and replace there. Finding shorthand of Symbol `'` would be mu

[GitHub] [spark] srowen commented on pull request #35560: [SPARK-38351][TESTS] Don't use deprecate symbol API in test classes

2022-03-03 Thread GitBox
srowen commented on pull request #35560: URL: https://github.com/apache/spark/pull/35560#issuecomment-1058516997 Hm, I think you have a point. We definitely need Symbol in some parts of the Scala code, but I think they're rare and limited to parts that reason about closures, etc. For colum

[GitHub] [spark] HeartSaVioR commented on pull request #35560: [SPARK-38351][TESTS] Don't use deprecate symbol API in test classes

2022-03-03 Thread GitBox
HeartSaVioR commented on pull request #35560: URL: https://github.com/apache/spark/pull/35560#issuecomment-1058514200 Right, but what else do we use Symbol for besides representing a Column? I might be missing something, but from what I've seen from test codes, majority of usage has been the same fo

[GitHub] [spark] srowen commented on pull request #35560: [SPARK-38351][TESTS] Don't use deprecate symbol API in test classes

2022-03-03 Thread GitBox
srowen commented on pull request #35560: URL: https://github.com/apache/spark/pull/35560#issuecomment-1058491957 Is that the same thing? $".." is shorthand for a Column, not Scala Symbol

[GitHub] [spark] HeartSaVioR commented on pull request #35560: [SPARK-38351][TESTS] Don't use deprecate symbol API in test classes

2022-03-03 Thread GitBox
HeartSaVioR commented on pull request #35560: URL: https://github.com/apache/spark/pull/35560#issuecomment-1058480954 Just to see our preference, given `$"..."` is not mentioned at all for deprecation and it is simpler than `Symbol("...")`, why not consistently use `$"..."`? My intuition

[GitHub] [spark] martin-g commented on a change in pull request #35725: [SPARK-38394][BUILD] Upgrade `scala-maven-plugin` to 4.4.0 for Hadoop 3 profile

2022-03-03 Thread GitBox
martin-g commented on a change in pull request #35725: URL: https://github.com/apache/spark/pull/35725#discussion_r819039329 ## File path: pom.xml ## @@ -163,6 +163,10 @@ 2.12.15 2.12 2.0.2 + + +4.4.0 Review comment: Why not 4.5.6 ? https://se

[GitHub] [spark] srowen commented on pull request #35725: [SPARK-38394][BUILD] Upgrade `scala-maven-plugin` to 4.4.0 for Hadoop 3 profile

2022-03-03 Thread GitBox
srowen commented on pull request #35725: URL: https://github.com/apache/spark/pull/35725#issuecomment-1058442968 Weird, but OK by me

[GitHub] [spark] sunchao commented on a change in pull request #35725: [SPARK-38394][BUILD] Upgrade `scala-maven-plugin` to 4.4.0 for Hadoop 3 profile

2022-03-03 Thread GitBox
sunchao commented on a change in pull request #35725: URL: https://github.com/apache/spark/pull/35725#discussion_r819028128 ## File path: pom.xml ## @@ -3430,6 +3433,7 @@ hadoop-client hadoop-yarn-api hadoop-client +4.3.0 Review comment:

[GitHub] [spark] dongjoon-hyun commented on pull request #35640: [WIP][SPARK-38187][K8S] Support resource reservation with volcano implementations

2022-03-03 Thread GitBox
dongjoon-hyun commented on pull request #35640: URL: https://github.com/apache/spark/pull/35640#issuecomment-1058422005 Please check the dev mailing list. Ramping down for the Apache Spark 3.3 release has started, @Yikun. - https://lists.apache.org/thread/ffdk52hkvgsc5ncjh1z0nv120jowtrld

[GitHub] [spark] dongjoon-hyun commented on pull request #35639: [WIP][SPARK-38189][K8S] Support priority scheduling with Volcano implementations

2022-03-03 Thread GitBox
dongjoon-hyun commented on pull request #35639: URL: https://github.com/apache/spark/pull/35639#issuecomment-1058421664 Please check the dev mailing list . Ramping down for Apache Spark 3.3 release started, @Yikun . - https://lists.apache.org/thread/ffdk52hkvgsc5ncjh1z0nv120jowtrld

[GitHub] [spark] sunchao commented on pull request #35657: [SPARK-37377][SQL] Initial implementation of Storage-Partitioned Join

2022-03-03 Thread GitBox
sunchao commented on pull request #35657: URL: https://github.com/apache/spark/pull/35657#issuecomment-1058397619 Gently ping @cloud-fan @viirya @dongjoon-hyun @c21 @rdblue @aokolnychyi

[GitHub] [spark] dongjoon-hyun commented on pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

2022-03-03 Thread GitBox
dongjoon-hyun commented on pull request #35483: URL: https://github.com/apache/spark/pull/35483#issuecomment-1058396543 Thank you for the decision, @sunchao .

[GitHub] [spark] dtenedor commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

2022-03-03 Thread GitBox
dtenedor commented on pull request #35690: URL: https://github.com/apache/spark/pull/35690#issuecomment-1058390966 > > Creating NULL default value for NOT NULL column > > Type mismatch between default value literal and column type. > > Upcasting or not in case of type mismatch >

[GitHub] [spark] sunchao closed pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

2022-03-03 Thread GitBox
sunchao closed pull request #35483: URL: https://github.com/apache/spark/pull/35483

[GitHub] [spark] sunchao commented on pull request #35483: [SPARK-38179][SQL] Improve `WritableColumnVector` to better support null struct

2022-03-03 Thread GitBox
sunchao commented on pull request #35483: URL: https://github.com/apache/spark/pull/35483#issuecomment-1058390811 Took a while to get back to this PR: I investigated more on why I needed this change originally from Parquet side, and turned out it's for handling cases like array of structs,

[GitHub] [spark] planga82 commented on pull request #35726: [SPARK-37895][SQL] Filter push down column with quoted columns

2022-03-03 Thread GitBox
planga82 commented on pull request #35726: URL: https://github.com/apache/spark/pull/35726#issuecomment-1058389209 CC @cloud-fan

[GitHub] [spark] dtenedor commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

2022-03-03 Thread GitBox
dtenedor commented on pull request #35690: URL: https://github.com/apache/spark/pull/35690#issuecomment-1058387995 > This PR doesn't seem to have the full body yet, what is your release target for this, @dtenedor and @gengliangwang ? I'm curious about the general error handling. > >

[GitHub] [spark] planga82 opened a new pull request #35726: [SPARK-37895][SQL] Filter push down column with quoted columns

2022-03-03 Thread GitBox
planga82 opened a new pull request #35726: URL: https://github.com/apache/spark/pull/35726 ### What changes were proposed in this pull request? The problem happens when you have a column that is quoted because it has special characters. ``` select view1.`Имя1` , vie

[GitHub] [spark] dtenedor commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

2022-03-03 Thread GitBox
dtenedor commented on pull request #35690: URL: https://github.com/apache/spark/pull/35690#issuecomment-1058385738 > Thanks for the work. > > For the parser change itself, looks okay. As this is a breaking change, I'd like to see some clarification on why this is necessary to have. W

[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

2022-03-03 Thread GitBox
dtenedor commented on a change in pull request #35690: URL: https://github.com/apache/spark/pull/35690#discussion_r818973943 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala ## @@ -2235,4 +2236,57 @@ class DDLParserSuite exten

[GitHub] [spark] mridulm commented on a change in pull request #35719: [SPARK-38401][SQL][CORE] Unify get preferred locations for shuffle in AQE

2022-03-03 Thread GitBox
mridulm commented on a change in pull request #35719: URL: https://github.com/apache/spark/pull/35719#discussion_r818967642 ## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ## @@ -1022,101 +1022,105 @@ private[spark] class MapOutputTrackerMaster(

[GitHub] [spark] mridulm commented on a change in pull request #35719: [SPARK-38401][SQL][CORE] Unify get preferred locations for shuffle in AQE

2022-03-03 Thread GitBox
mridulm commented on a change in pull request #35719: URL: https://github.com/apache/spark/pull/35719#discussion_r818966635 ## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ## @@ -1022,101 +1022,105 @@ private[spark] class MapOutputTrackerMaster(

[GitHub] [spark] ueshin commented on a change in pull request #35706: [SPARK-38387][PYTHON] Support `na_action` and Series input correspondence in `Series.map`

2022-03-03 Thread GitBox
ueshin commented on a change in pull request #35706: URL: https://github.com/apache/spark/pull/35706#discussion_r818958752 ## File path: python/pyspark/pandas/series.py ## @@ -1045,8 +1048,18 @@ def map(self, arg: Union[Dict, Callable]) -> "Series": 2 I am a None

[GitHub] [spark] dongjoon-hyun closed pull request #35720: [SPARK-33206][CORE][3.1] Fix shuffle index cache weight calculation for small index files

2022-03-03 Thread GitBox
dongjoon-hyun closed pull request #35720: URL: https://github.com/apache/spark/pull/35720

[GitHub] [spark] attilapiros commented on pull request #35723: [SPARK-33206][CORE][3.0] Fix shuffle index cache weight calculation for small index files

2022-03-03 Thread GitBox
attilapiros commented on pull request #35723: URL: https://github.com/apache/spark/pull/35723#issuecomment-1058355951 The failure in `BasicSchedulerIntegrationSuite` must be a known flaky test: https://issues.apache.org/jira/browse/SPARK-36375.

[GitHub] [spark] steveloughran opened a new pull request #35725: SPARK-38394. Build of spark sql against hadoop-3.4.0-snapshot failing.

2022-03-03 Thread GitBox
steveloughran opened a new pull request #35725: URL: https://github.com/apache/spark/pull/35725 ### What changes were proposed in this pull request? This sets scala-maven-plugin.version to 4.4.0 except when the hadoop-2.7 profile is used, because SPARK-36547 shows that only

[GitHub] [spark] mridulm commented on pull request #35714: [SPARK-33206][CORE][3.2] Fix shuffle index cache weight calculation for small index files

2022-03-03 Thread GitBox
mridulm commented on pull request #35714: URL: https://github.com/apache/spark/pull/35714#issuecomment-1058353273 Late LGTM, thanks @attilapiros !

[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

2022-03-03 Thread GitBox
dtenedor commented on a change in pull request #35690: URL: https://github.com/apache/spark/pull/35690#discussion_r818922112 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2756,6 +2763,41 @@ class AstBuilder extends SqlBa

[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

2022-03-03 Thread GitBox
dtenedor commented on a change in pull request #35690: URL: https://github.com/apache/spark/pull/35690#discussion_r818919712 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -3651,12 +3695,20 @@ class AstBuilder extends SqlB

[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

2022-03-03 Thread GitBox
dtenedor commented on a change in pull request #35690: URL: https://github.com/apache/spark/pull/35690#discussion_r818919421 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -3813,6 +3871,9 @@ class AstBuilder extends SqlBas

[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

2022-03-03 Thread GitBox
dtenedor commented on a change in pull request #35690: URL: https://github.com/apache/spark/pull/35690#discussion_r818916692 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2756,6 +2763,41 @@ class AstBuilder extends SqlBa

[GitHub] [spark] heyihong commented on pull request #35687: [SPARK-38353][PYTHON] Instrument __enter__ and __exit__ magic methods for Pandas API on Spark

2022-03-03 Thread GitBox
heyihong commented on pull request #35687: URL: https://github.com/apache/spark/pull/35687#issuecomment-1058300837 > Emm, PR's description is really unfriendly for me. lol, after take a deep look, it enables usage logger in Pandas on Spark for ContextManager usage, record in `__enter__` an

[GitHub] [spark] dongjoon-hyun commented on pull request #35720: [SPARK-33206][CORE][3.1] Fix shuffle index cache weight calculation for small index files

2022-03-03 Thread GitBox
dongjoon-hyun commented on pull request #35720: URL: https://github.com/apache/spark/pull/35720#issuecomment-1058297847 Thank you, @attilapiros and @HyukjinKwon .

[GitHub] [spark] stczwd commented on a change in pull request #35669: [SPARK-38041][SQL]DataFilter pushed down with PartitionFilter

2022-03-03 Thread GitBox
stczwd commented on a change in pull request #35669: URL: https://github.com/apache/spark/pull/35669#discussion_r818881906 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ## @@ -557,6 +562,7 @@ class ParquetFilters(

[GitHub] [spark] huaxingao commented on a change in pull request #35691: [SPARK-38357][SQL][3.2] Fix StackOverflowError with OR(data filter, partition filter)

2022-03-03 Thread GitBox
huaxingao commented on a change in pull request #35691: URL: https://github.com/apache/spark/pull/35691#discussion_r818864088 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala ## @@ -83,7 +89,7 @@ private[sql] obje

[GitHub] [spark] cloud-fan commented on a change in pull request #35719: [SPARK-38401][SQL][CORE] Unify get preferred locations for shuffle in AQE

2022-03-03 Thread GitBox
cloud-fan commented on a change in pull request #35719: URL: https://github.com/apache/spark/pull/35719#discussion_r818798463 ## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ## @@ -1022,101 +1022,105 @@ private[spark] class MapOutputTrackerMaster(

[GitHub] [spark] gengliangwang commented on pull request #35724: [SPARK-38407][SQL] ANSI Cast: loosen the limitation of casting non-null complex types

2022-03-03 Thread GitBox
gengliangwang commented on pull request #35724: URL: https://github.com/apache/spark/pull/35724#issuecomment-1058186343 cc @entong as well

[GitHub] [spark] gengliangwang opened a new pull request #35724: [SPARK-38407][SQL] ANSI Cast: loosen the limitation of casting non-null complex types

2022-03-03 Thread GitBox
gengliangwang opened a new pull request #35724: URL: https://github.com/apache/spark/pull/35724 ### What changes were proposed in this pull request? When ANSI mode is off, `ArrayType(DoubleType, containsNull = false)` can't cast as `ArrayType(IntegerType, containsNull = false
