[GitHub] spark issue #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeFields is qu...

2017-05-19 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18005 LGTM - merging to master/2.2. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18005 Merged build finished. Test PASSed.

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18005 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77018/

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18005 **[Test build #77018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77018/testReport)** for PR 18005 at commit

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18005 **[Test build #77018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77018/testReport)** for PR 18005 at commit

2017-05-17 Thread tpoterba
Github user tpoterba commented on the issue: https://github.com/apache/spark/pull/18005 Others on my team suggest that the >64k bytecode issue has been fixed already (and ported to a 2.1 maintenance release as well).

2017-05-17 Thread tpoterba
Github user tpoterba commented on the issue: https://github.com/apache/spark/pull/18005 I used this script to generate random CSV files: ```python import uuid import sys try: print('args = ' + str(sys.argv)) filename = sys.argv[1] cols =
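The script above is cut off in the archive, so the following is not the original. It is a minimal sketch of a generator in the same spirit, assuming the inputs are a filename, a column count, and a row count, and that each cell is a random `uuid` string. The function name `write_random_csv` and the three-argument command line are hypothetical.

```python
import csv
import sys
import uuid


def write_random_csv(filename, cols, rows):
    """Hypothetical sketch: write `rows` lines of `cols` random string cells."""
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        for _ in range(rows):
            # uuid4().hex is a 32-character random hex string per cell
            writer.writerow([uuid.uuid4().hex for _ in range(cols)])


# Hypothetical CLI: python gen_csv.py <filename> <cols> <rows>
if __name__ == "__main__" and len(sys.argv) == 4:
    write_random_csv(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
```

Wide files (many columns, few rows) are the interesting case here, since the cost under discussion grows with the number of fields per row.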

2017-05-17 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18005 What do you mean by catalyst blew up?

2017-05-17 Thread tpoterba
Github user tpoterba commented on the issue: https://github.com/apache/spark/pull/18005 Addressed comments. I tried to get some benchmark stats for this code:

```python
spark.read.csv(text_file).write.mode('overwrite').parquet(parquet_path)
```

I wanted to
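One way to turn such a benchmark into a scaling check, assuming you can produce inputs with different column counts: time the same job at n, 2n, 4n columns and compare consecutive timings. A ratio near 4 per doubling hints at quadratic growth, near 2 at linear. The helpers `time_call` and `scaling_ratios` are hypothetical; the job is passed in as a callable so the Spark line above, or any other workload, can be plugged in.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def scaling_ratios(make_job, sizes, repeats=3):
    """Time make_job(n)() for each n in sizes; return ratios of consecutive times."""
    times = [time_call(make_job(n), repeats) for n in sizes]
    return [later / earlier for earlier, later in zip(times, times[1:])]
```

With Spark, something like `scaling_ratios(lambda n: (lambda: spark.read.csv(csv_path(n)).write.mode('overwrite').parquet(parquet_path)), [100, 200, 400])` would apply it, where `csv_path` is a hypothetical helper returning a file with `n` columns. Taking the best of several runs reduces, but does not remove, JVM warm-up noise.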

2017-05-17 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18005 LGTM pending jenkins

2017-05-16 Thread tpoterba
Github user tpoterba commented on the issue: https://github.com/apache/spark/pull/18005 Yeah, I can change that - I do hate the standard IndexedSeq implementation (Vector) though, and want to make sure that the collection is actually a WrappedArray. I've actually done more

2017-05-16 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18005 It might be nice to explicitly use the type `IndexedSeq[ValueWriter]` for `rootFieldWriters` (up on line 61 of this file) since that would capture the intent behind using an Array and would maybe
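Why the collection type matters can be shown with a small model (not Spark's code) that assumes the per-row loop looks up each field writer by position. `Cons`, `nth`, and the `write_fields_*` functions are hypothetical stand-ins: positional access on a linked list, like Scala's `List.apply(i)`, walks `i` links, so writing `n` fields visits `n(n+1)/2` nodes, while an array (or a `WrappedArray` around one) answers each lookup in constant time.

```python
class Cons:
    """Minimal immutable linked list node, standing in for a Scala List."""

    __slots__ = ("head", "tail")

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail


def from_iterable(items):
    """Build a linked list with the same element order as `items`."""
    node = None
    for item in reversed(list(items)):
        node = Cons(item, node)
    return node


def nth(node, i, counter):
    """Positional access on a linked list: visits i + 1 nodes."""
    counter[0] += 1  # visit the first node
    for _ in range(i):
        node = node.tail
        counter[0] += 1  # one more node visited per hop
    return node.head


def write_fields_list(writers, n, counter):
    # Index-based loop over a linked list: 1 + 2 + ... + n node visits in total
    return [nth(writers, i, counter) for i in range(n)]


def write_fields_array(writers, n, counter):
    # Index-based loop over a tuple (array-like): one O(1) lookup per field
    out = []
    for i in range(n):
        counter[0] += 1
        out.append(writers[i])
    return out
```

For 1000 fields the list version visits 500500 nodes and the array version does 1000 lookups; doubling the field count roughly quadruples the first number but only doubles the second.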

2017-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18005 Merged build finished. Test FAILed.

2017-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18005 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76983/

2017-05-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18005 **[Test build #76983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76983/testReport)** for PR 18005 at commit

2017-05-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18005 **[Test build #76983 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76983/testReport)** for PR 18005 at commit

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18005 Can you also make sure that we do not use a `Seq` for struct writing?
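The same concern applies recursively to nested structs. The sketch below is a hypothetical model, not Spark's `ParquetWriteSupport`: it builds one writer per schema node and keeps each struct's child writers in a tuple, so the per-row lookup `child_writers[i]` stays constant time at every nesting level. The `"start"`/`"end"` markers only loosely mirror the group boundaries a Parquet record consumer receives; the schema encoding (`"leaf"` or a list of children) is an assumption for illustration.

```python
def make_writer(schema):
    """Build a writer for `schema`: either "leaf" or a list of child schemas (a struct)."""
    if schema == "leaf":
        return lambda value, out: out.append(value)

    # Child writers are built once per schema and stored in a tuple,
    # so indexing them inside the per-row loop is O(1), like an Array.
    child_writers = tuple(make_writer(child) for child in schema)

    def write_struct(row, out):
        out.append("start")  # open this struct's group
        for i in range(len(child_writers)):
            child_writers[i](row[i], out)
        out.append("end")  # close this struct's group

    return write_struct
```

For example, `make_writer(["leaf", ["leaf", "leaf"], "leaf"])` applied to the row `(1, (2, 3), 4)` emits `["start", 1, "start", 2, 3, "end", 4, "end"]`.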

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18005 ok to test

2017-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18005 Can one of the admins verify this patch?