[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-08-02 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21892 I opened a separate PR for switching to **2.7.3**. Please take a look at #21969.

2018-08-02 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21892 Also, can you update the description?

2018-08-02 Thread jbax
Github user jbax commented on the issue: https://github.com/apache/spark/pull/21892 univocity-parsers 2.7.3 has been released. Thanks!

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21892 Great! Shall we wait for the 2.7.3 build? @jbax When will it be released?

2018-08-01 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21892 @jbax It became much faster: ``` Parsing quoted values: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
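
The benchmark table above is cut off after its header row, so no measured numbers survive. Assuming the columns follow the usual reading of this kind of output (Rate(M/s) is millions of rows per second at the best time, Per Row(ns) is the best time divided by the row count, and Relative is the first case's best time divided by this case's best time), the columns are related by simple arithmetic. The helper name, row count, and timings below are hypothetical:

```python
def benchmark_columns(rows: int, best_ms: float, baseline_best_ms: float):
    """Derive Rate(M/s), Per Row(ns) and Relative from a best time (assumed formulas)."""
    # Rows per second, expressed in millions.
    rate_m_per_s = rows / (best_ms / 1000.0) / 1e6
    # Nanoseconds spent per row at the best time (equivalently 1000 / rate).
    per_row_ns = best_ms * 1e6 / rows
    # How many times faster this case is than the baseline case.
    relative = baseline_best_ms / best_ms
    return rate_m_per_s, per_row_ns, relative

# Hypothetical: 1,000,000 rows, best time 500 ms, baseline best time 1000 ms.
print(benchmark_columns(1_000_000, 500.0, 1000.0))  # (2.0, 500.0, 2.0)
```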

2018-08-01 Thread jbax
Github user jbax commented on the issue: https://github.com/apache/spark/pull/21892 Thanks @MaxGekk. I've fixed the error and also made the parser generally faster than before when processing fields that were not selected. Can you please retest with the latest SNAPSHOT
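
The comment above is about reducing the cost of fields that were not selected. As a toy illustration of that general idea (a hypothetical helper, not uniVocity's actual implementation or API), a parser still has to scan every character to track field boundaries and quotes, but it can skip building strings for columns nobody asked for. This sketch assumes one record per line with no embedded line breaks:

```python
from typing import Dict, Iterable, List


def parse_selected(line: str, selected: Iterable[int], delim: str = ",") -> Dict[int, str]:
    """Split one CSV record, materializing only the selected column indexes.

    Handles double-quoted fields and "" escapes; assumes the record has no
    embedded newlines (toy sketch only).
    """
    line = line.rstrip("\r\n")
    wanted = set(selected)
    out: Dict[int, str] = {}
    col = 0
    buf: List[str] = []  # characters of the current field, only if it is wanted
    in_quotes = False
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and line[i + 1] == '"':
                    # Escaped quote ("") inside a quoted field.
                    if col in wanted:
                        buf.append('"')
                    i += 1
                else:
                    in_quotes = False
            elif col in wanted:
                buf.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == delim:
            # End of a field: build a string only if the column was selected.
            if col in wanted:
                out[col] = "".join(buf)
            buf = []
            col += 1
        elif col in wanted:
            buf.append(ch)
        i += 1
    if col in wanted:
        out[col] = "".join(buf)
    return out


print(parse_selected('a,"b,c",d,"e""f"', [1, 3]))  # {1: 'b,c', 3: 'e"f'}
```

Unselected fields still cost a character scan, but no per-field string allocation, which is where savings of this kind would come from.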

2018-07-31 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21892 @jbax I got the following exception on **2.7.3-SNAPSHOT** (commit e51b0958a): ``` Internal state when error was thrown: line=20, column=20481, record=20, charIndex=82594, headers=[col0,...,

2018-07-31 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21892 @jbax Thanks for the info! Ping @MaxGekk @HyukjinKwon

2018-07-31 Thread jbax
Github user jbax commented on the issue: https://github.com/apache/spark/pull/21892 Has anyone had a chance to test the 2.7.3-SNAPSHOT build I released to see whether the performance issue has been addressed? If it has, let me know and I'll release the final 2.7.3 build.

2018-07-30 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21892 Sounds good to me.

2018-07-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21892 > we can get 3.5 times perf improvement for `count()`. To clarify, we shouldn't merge this as is. It needs a closer look after #21909, which avoids the parsing code path by

2018-07-30 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21892 @HyukjinKwon I would suggest skipping this upgrade, and then we can get a 3.5 times perf improvement for `count()`.

2018-07-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21892 Yeah, we should run them. Based on my past benchmark runs, a 0.2% to 8% difference can be caused by a different environment, etc. I am not saying we should merge this now, but it seems fine because the big perf diff will

2018-07-30 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21892 @HyukjinKwon We need to rerun the perf tests after https://github.com/apache/spark/pull/21909 is merged. We also cannot accept a perf regression larger than `5%`. Based on the
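
A 5% regression budget like the one above comes down to a simple check on two timings. A minimal sketch, with hypothetical helper names and hypothetical numbers; as noted in the thread, differences of 0.2% to 8% can come from the environment alone, so a single pair of runs may not settle whether such a budget is exceeded:

```python
def regression_pct(old_ms: float, new_ms: float) -> float:
    """Percent slowdown of new_ms relative to old_ms (negative means faster)."""
    return (new_ms - old_ms) * 100.0 / old_ms


def within_budget(old_ms: float, new_ms: float, max_regression_pct: float = 5.0) -> bool:
    """True if the slowdown does not exceed the allowed percentage."""
    return regression_pct(old_ms, new_ms) <= max_regression_pct


# Hypothetical timings in milliseconds.
print(regression_pct(100.0, 108.0))  # 8.0
print(within_budget(100.0, 108.0))   # False
print(within_budget(100.0, 103.0))   # True
```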

2018-07-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21892 Ah, if you are referring to the number for `count()`, that looks like it will be addressed in https://github.com/apache/spark/pull/21909, which won't execute the parsing path that this upgrade

2018-07-30 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21892 @MaxGekk @HyukjinKwon We are unable to merge this PR since the performance regression is very obvious.

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21892 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93665/

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21892 Merged build finished. Test PASSed.

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21892 **[Test build #93665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93665/testReport)** for PR 21892 at commit

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21892 **[Test build #93665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93665/testReport)** for PR 21892 at commit

2018-07-27 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21892 @HyukjinKwon @maropu Please take a look at the PR. Is it valid to just count the number of lines returned by Hadoop's LineReader and not call the parser at all? Maybe there are some corner cases when
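
The comment is cut off before it names the corner cases. One case that plausibly matters (an assumption here, not necessarily what the comment goes on to list) is a quoted field that contains a line break: counting physical lines then overcounts records. A minimal sketch using Python's standard `csv` module, with hypothetical helper names and made-up sample data:

```python
import csv
import io


def count_lines(data: str) -> int:
    """Count physical lines, roughly what a plain line reader would return."""
    return len(data.splitlines())


def count_records(data: str) -> int:
    """Count CSV records; the csv module joins quoted fields that span lines."""
    return sum(1 for _ in csv.reader(io.StringIO(data, newline="")))


# Made-up sample: the second data row has a line break inside quotes.
SAMPLE = 'id,comment\n1,"hello"\n2,"multi\nline"\n3,plain\n'

print(count_lines(SAMPLE))    # 5 (header included)
print(count_records(SAMPLE))  # 4 (header included)
```

With the header excluded, that is 4 lines versus 3 records, so a line count is only a safe shortcut for `count()` when records cannot span multiple lines.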

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21892 Can one of the admins verify this patch?
