[GitHub] spark issue #21952: [SPARK-24993] [SQL] Make Avro Fast Again

2018-08-07 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21952 Do we have the same regression for Parquet? I'm wondering whether the regression comes from the `FileFormat` framework.

2018-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21952 Merged build finished. Test PASSed.

2018-08-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21952 **[Test build #94117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94117/testReport)** for PR 21952 at commit

2018-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94117/

2018-08-03 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21952 Merged into master. Thanks all for reviewing.

2018-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21952 Test PASSed. Refer to this link for build results (access rights to CI server needed):

2018-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21952 Merged build finished. Test PASSed.

2018-08-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21952 **[Test build #94117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94117/testReport)** for PR 21952 at commit

2018-08-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21952 The regression happens when writing. It looks like we don't use `df.count` when benchmarking write time?

2018-08-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21952 I noticed that the benchmark uses `df.count`. Is it possible that column pruning has some issues in master?

2018-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94104/

2018-08-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21952 LGTM

2018-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21952 Merged build finished. Test PASSed.

2018-08-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21952 **[Test build #94104 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94104/testReport)** for PR 21952 at commit

2018-08-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21952 Ah, I can finally reproduce this. It requires allocating the array feature with length 16000; when I reduced it to 1600, the regression was largely relieved. `com.databricks.spark.avro` is faster

2018-08-02 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21952 We can keep investigating the perf regression; this patch itself LGTM.

2018-08-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21952 **[Test build #94104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94104/testReport)** for PR 21952 at commit

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21952 Merged build finished. Test PASSed.

2018-08-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21952 Test PASSed. Refer to this link for build results (access rights to CI server needed):