sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-786837732
Opened #31667 with the fix. Compared to the old PR, there's only a one-line
change:
```java
reader.setRequestedSchema(requestedSchema);
```
Please take a look. Thanks!
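The toy Java model below roughly illustrates why a single call like this can matter for performance. It is a sketch under an assumption: the reader fetches every column chunk in a row group until a requested schema narrows the projection. All names here (`ProjectionDemo`, `ToyRowGroupReader`, `readNextRowGroupBytes`) are hypothetical and are not the parquet-mr API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ProjectionDemo {
    // Hypothetical toy reader, NOT parquet-mr's ParquetFileReader.
    static class ToyRowGroupReader {
        // Column name -> bytes of that column's chunk within one row group.
        private final Map<String, byte[]> columnChunks;
        // null means no projection has been set, so every column is read.
        private Set<String> requested;

        ToyRowGroupReader(Map<String, byte[]> columnChunks) {
            this.columnChunks = new LinkedHashMap<>(columnChunks);
        }

        // Narrows reading to the given columns (stand-in for a requested schema).
        void setRequestedSchema(Set<String> columns) {
            this.requested = columns;
        }

        // Returns how many bytes would be read for the row group under the current projection.
        long readNextRowGroupBytes() {
            long bytes = 0;
            for (Map.Entry<String, byte[]> e : columnChunks.entrySet()) {
                if (requested == null || requested.contains(e.getKey())) {
                    bytes += e.getValue().length;
                }
            }
            return bytes;
        }
    }

    // Builds a reader over one row group with a small, a medium and a large column.
    static ToyRowGroupReader sampleReader() {
        Map<String, byte[]> chunks = new LinkedHashMap<>();
        chunks.put("id", new byte[8]);
        chunks.put("price", new byte[16]);
        chunks.put("description", new byte[1024]);
        return new ToyRowGroupReader(chunks);
    }

    public static void main(String[] args) {
        ToyRowGroupReader reader = sampleReader();
        // Without a projection, every column chunk is read: 8 + 16 + 1024 bytes.
        System.out.println(reader.readNextRowGroupBytes()); // 1048
        // With a projection, only the requested columns are read: 8 + 16 bytes.
        reader.setRequestedSchema(Set.of("id", "price"));
        System.out.println(reader.readNextRowGroupBytes()); // 24
    }
}
```

In this model, a query that needs only `id` and `price` pays for the large `description` chunk unless the projection is set. That is the kind of extra I/O a missing projection call could cause.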
sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-786443278
I tried @LuciferYang's suggestion with the TPC-DS benchmark and it fixed the
perf regression. For instance, in the q9 above, here's what I got:
without the PR | with the PR |
sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-784706735
Interesting. I'll take a look at the issue. Thanks for reporting.
sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-781504812
Thanks all for reviewing and merging!
sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-769938181
Re-open this since Spark has upgraded to Parquet 1.11 :-)
sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-683655834
Thanks again @HyukjinKwon . Closing this for now.
sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-683635947
Thanks @HyukjinKwon - this is good to know. Since it seems Spark could take
a while to get to 1.11, should I close this for now then?
sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-683451979
> The Parquet reader is a performance-critical component in Spark SQL. We'd
better make sure there's no performance regression due to this change. Should we
run a benchmark to check it
sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-683210745
> The Parquet reader is a performance-critical component in Spark SQL. We'd
better make sure there's no performance regression due to this change. Should we
run a benchmark to check it
sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-682996358
@HyukjinKwon this PR is ready - can you please take another look? Thanks!