Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21892
I opened a separate PR for switching to **2.7.3**. Please take a look at
#21969
---
-
To unsubscribe, e-mail:
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/21892
Also, can you update the description?
---
Github user jbax commented on the issue:
https://github.com/apache/spark/pull/21892
univocity-parsers-2.7.3 released. Thanks!
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21892
Great! Shall we wait for the 2.7.3 build? @jbax When will it be released?
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21892
@jbax It became much faster:
```
Parsing quoted values:    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
```
Github user jbax commented on the issue:
https://github.com/apache/spark/pull/21892
Thanks @MaxGekk, I've fixed the error and also made the parser generally run
faster than before when processing fields that were not selected.
Can you please retest with the latest SNAPSHOT
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21892
@jbax I got the following exception on **2.7.3-SNAPSHOT** (commit
e51b0958a):
```
Internal state when error was thrown: line=20, column=20481, record=20,
charIndex=82594, headers=[col0,...,
```
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21892
@jbax Thanks for the info!
ping @MaxGekk @HyukjinKwon
---
Github user jbax commented on the issue:
https://github.com/apache/spark/pull/21892
Has anyone had a chance to test with the 2.7.3-SNAPSHOT build I released to
see if the performance issue has been addressed? If it has, let me know
and I'll release the final 2.7.3 build.
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21892
Sounds good to me.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21892
> we can get 3.5 times perf improvement for `count()`.
To clarify, we shouldn't merge this as is. It needs a close look
after #21909, which avoids the parsing code path by
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21892
@HyukjinKwon I would suggest skipping this upgrade, and then we can get 3.5
times perf improvement for `count()`.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21892
Yea, we should run. A difference of 0.2% - 8% can be caused by a different
environment, etc., judging from my past benchmark runs. I am not saying we
should merge this now, but it seems fine because the big perf diff will
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21892
@HyukjinKwon We need to rerun the perf tests after
https://github.com/apache/spark/pull/21909 is merged.
We are also unable to accept a perf regression larger than `5%`. Based on
the
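The `5%` limit mentioned in this comment can be expressed as a simple relative comparison. The following is only a minimal sketch: the function names `relative_regression` and `exceeds_threshold` are hypothetical, and the timings in the note below are made up for illustration, not measurements from this PR.

```python
def relative_regression(baseline_ms: float, candidate_ms: float) -> float:
    """Return the slowdown of candidate vs. baseline as a fraction (0.05 == 5%)."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (candidate_ms - baseline_ms) / baseline_ms


def exceeds_threshold(baseline_ms: float, candidate_ms: float,
                      threshold: float = 0.05) -> bool:
    """True when the candidate is slower than the baseline by more than threshold."""
    return relative_regression(baseline_ms, candidate_ms) > threshold
```

With made-up timings of 1000 ms before and 1080 ms after, the slowdown is 8% and would exceed the limit, while 1002 ms (0.2%) would not.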
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21892
Ah, that looks like it is going to be addressed in
https://github.com/apache/spark/pull/21909 if you are referring to the number for
`count()`, which is not going to execute the parsing path that this upgrade
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/21892
@MaxGekk @HyukjinKwon We are unable to merge this PR since the performance
regression is very obvious.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21892
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93665/
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21892
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21892
**[Test build #93665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93665/testReport)**
for PR 21892 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21892
**[Test build #93665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93665/testReport)**
for PR 21892 at commit
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21892
@HyukjinKwon @maropu Please take a look at the PR. Is it valid to just
count the number of lines returned by Hadoop LineReader and not call the
parser at all? Maybe there are some corner cases when
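One corner case that might matter for this question is a quoted field containing a line break. That is an assumption; the thread does not say which cases were meant. The minimal sketch below uses Python's standard `csv` module rather than univocity-parsers or Spark, and `SAMPLE`, `count_lines`, and `count_records` are hypothetical names used only for illustration.

```python
import csv
import io

# Hypothetical sample: the second record has a quoted field containing a newline.
SAMPLE = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'


def count_lines(text: str) -> int:
    """Count physical lines, as a plain line reader would, without parsing."""
    return len(text.splitlines())


def count_records(text: str) -> int:
    """Count logical CSV records by actually parsing the input."""
    return sum(1 for _ in csv.reader(io.StringIO(text, newline="")))
```

On this sample, counting lines gives 4 while parsing gives 3 records (header included), so the two approaches agree only when no record spans multiple lines.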
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21892
Can one of the admins verify this patch?
---