[GitHub] [spark] sunchao commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2021-02-26 Thread GitBox
sunchao commented on pull request #29542: URL: https://github.com/apache/spark/pull/29542#issuecomment-786837732 Opened #31667 with the fix. Compared to the old PR, there is only a one-line change:

```scala
reader.setRequestedSchema(requestedSchema);
```

Please take a look. Thanks!

[GitHub] [spark] sunchao commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2021-02-25 Thread GitBox
sunchao commented on pull request #29542: URL: https://github.com/apache/spark/pull/29542#issuecomment-786443278 I tried @LuciferYang's suggestion with the TPC-DS benchmark, and it fixed the perf regression. For instance, for q9 above, here's what I got: without the PR | with the PR |

[GitHub] [spark] sunchao commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2021-02-23 Thread GitBox
sunchao commented on pull request #29542: URL: https://github.com/apache/spark/pull/29542#issuecomment-784706735 Interesting. I'll take a look at the issue. Thanks for reporting. This is an automated message from the Apache

[GitHub] [spark] sunchao commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2021-02-18 Thread GitBox
sunchao commented on pull request #29542: URL: https://github.com/apache/spark/pull/29542#issuecomment-781504812 Thanks all for reviewing and merging!

[GitHub] [spark] sunchao commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2021-01-29 Thread GitBox
sunchao commented on pull request #29542: URL: https://github.com/apache/spark/pull/29542#issuecomment-769938181 Re-open this since Spark has upgraded to Parquet 1.11 :-)

[GitHub] [spark] sunchao commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2020-08-31 Thread GitBox
sunchao commented on pull request #29542: URL: https://github.com/apache/spark/pull/29542#issuecomment-683655834 Thanks again @HyukjinKwon. Closing this for now.

[GitHub] [spark] sunchao commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2020-08-31 Thread GitBox
sunchao commented on pull request #29542: URL: https://github.com/apache/spark/pull/29542#issuecomment-683635947 Thanks @HyukjinKwon - this is good to know. Since it seems Spark could take a while to get to 1.11, should I close this for now then?

[GitHub] [spark] sunchao commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2020-08-30 Thread GitBox
sunchao commented on pull request #29542: URL: https://github.com/apache/spark/pull/29542#issuecomment-683451979 > Parquet reader is performance-wise important component in Spark SQL. We better to make sure no performance regression due to this change. Should we run a benchmark to check it

[GitHub] [spark] sunchao commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2020-08-28 Thread GitBox
sunchao commented on pull request #29542: URL: https://github.com/apache/spark/pull/29542#issuecomment-683210745 > Parquet reader is performance-wise important component in Spark SQL. We better to make sure no performance regression due to this change. Should we run a benchmark to check it

[GitHub] [spark] sunchao commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2020-08-28 Thread GitBox
sunchao commented on pull request #29542: URL: https://github.com/apache/spark/pull/29542#issuecomment-682996358 @HyukjinKwon this PR is ready - can you please take another look? Thanks!