[GitHub] [spark] HyukjinKwon commented on pull request #31858: [SPARK-34768][SQL] Respect the default input buffer size in Univocity

2021-03-21 Thread GitBox
HyukjinKwon commented on pull request #31858: URL: https://github.com/apache/spark/pull/31858#issuecomment-803736372 I ran the benchmark before and after this commit, and I do see the performance improvement here. -- This is an automated message from the Apache Git Service. To respond to the

2021-03-21 Thread GitBox
HyukjinKwon commented on pull request #31858: URL: https://github.com/apache/spark/pull/31858#issuecomment-803736270
```diff
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Parsing quoted values:    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)
```

2021-03-17 Thread GitBox
HyukjinKwon commented on pull request #31858: URL: https://github.com/apache/spark/pull/31858#issuecomment-800988031 Thanks, Max. Merged to master, branch-3.1 and branch-3.0.

2021-03-17 Thread GitBox
HyukjinKwon commented on pull request #31858: URL: https://github.com/apache/spark/pull/31858#issuecomment-800873420
> It would be nice to re-run CSV benchmarks.
The fix will have to be ported back through branch-3.1. I would do it separately.

2021-03-17 Thread GitBox
HyukjinKwon commented on pull request #31858: URL: https://github.com/apache/spark/pull/31858#issuecomment-800872570 Yeah, looks like this doesn't exist in Spark 2.4 according to our internal report. It does fix the specific case by increasing the limit. It's just a bandaid fix.

2021-03-17 Thread GitBox
HyukjinKwon commented on pull request #31858: URL: https://github.com/apache/spark/pull/31858#issuecomment-800865318 Thanks, man. Yeah, this band-aids the issue (more as a side effect). I believe it's better to use the default buffer size for stability, potentially better performance, etc.

2021-03-16 Thread GitBox
HyukjinKwon commented on pull request #31858: URL: https://github.com/apache/spark/pull/31858#issuecomment-800794256 cc @MaxGekk can you take a look please?