HyukjinKwon commented on pull request #31858:
URL: https://github.com/apache/spark/pull/31858#issuecomment-803736372
I ran the benchmark before and after this commit, and I do see the perf
improvement here.
--
This is an automated message from the Apache Git Service.
To respond to the
HyukjinKwon commented on pull request #31858:
URL: https://github.com/apache/spark/pull/31858#issuecomment-803736270
```diff
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Parsing quoted values:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)
```
HyukjinKwon commented on pull request #31858:
URL: https://github.com/apache/spark/pull/31858#issuecomment-800988031
Thanks, Max. Merged to master, branch-3.1 and branch-3.0.
HyukjinKwon commented on pull request #31858:
URL: https://github.com/apache/spark/pull/31858#issuecomment-800873420
> It would be nice to re-run CSV benchmarks.
The fix will have to be ported back through branch-3.1, so I would re-run
the benchmarks separately.
HyukjinKwon commented on pull request #31858:
URL: https://github.com/apache/spark/pull/31858#issuecomment-800872570
Yeah, it looks like this issue doesn't exist in Spark 2.4 according to our
internal report. It does fix the specific case by increasing the limit, but
it's just a band-aid fix.
HyukjinKwon commented on pull request #31858:
URL: https://github.com/apache/spark/pull/31858#issuecomment-800865318
Thanks man. Yeah, this band-aids the issue (more as a side effect). I
believe it's better to use the default buffer size for stability, potentially
better performance, etc.
HyukjinKwon commented on pull request #31858:
URL: https://github.com/apache/spark/pull/31858#issuecomment-800794256
cc @MaxGekk can you take a look please?