Github user laserson commented on the pull request:
https://github.com/apache/incubator-spark/pull/576#issuecomment-35040314
Yes, that's a much better suggestion. Thanks!
Github user laserson closed the pull request at:
https://github.com/apache/incubator-spark/pull/576
Github user laserson commented on the pull request:
https://github.com/apache/incubator-spark/pull/576#issuecomment-35035595
Yes, I have since thought about it more and agree that this would actually
be a bad idea. No need to add additional dependencies on other specific file formats.
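A sketch of how keeping the format-specific pieces on the application side might look: the user's build, not Spark's, declares parquet-avro, and Spark's existing generic Hadoop-file API does the rest. The artifact coordinates and versions below are assumptions from the parquet-mr era of this thread, not taken from the pull request.

```scala
// build.sbt of the user application (not of Spark itself)
scalaVersion := "2.10.3"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "0.9.0-incubating",
  "com.twitter"      %  "parquet-avro" % "1.3.2"  // brings AvroParquetInputFormat; coordinates/version assumed
)
```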
Github user laserson commented on the pull request:
https://github.com/apache/incubator-spark/pull/576#issuecomment-34718389
No, this actually constructs Avro `GenericRecord` objects in memory. The
problem is that if you want access to the Parquet data through PySpark, there
is no way to do so otherwise.
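For context, a minimal sketch of what that JVM-side conversion could look like, using Spark's existing `newAPIHadoopFile` together with parquet-avro's `AvroParquetInputFormat`. The class and package names follow the parquet-mr 1.x conventions of the time and are assumptions rather than code from this pull request.

```scala
import org.apache.avro.generic.IndexedRecord
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import parquet.avro.AvroParquetInputFormat

// Sketch: read Parquet into Avro records on the JVM, then render each record
// as a JSON string so that PySpark can consume the RDD as plain text.
def parquetToJsonStrings(sc: SparkContext, path: String): RDD[String] = {
  val records = sc.newAPIHadoopFile[Void, IndexedRecord, AvroParquetInputFormat](
    path,
    classOf[AvroParquetInputFormat],   // Parquet -> Avro records
    classOf[Void],                     // Parquet input formats emit no key
    classOf[IndexedRecord],
    sc.hadoopConfiguration)
  // GenericData.Record.toString yields a JSON rendering of the record;
  // Avro's JsonEncoder would be the stricter alternative.
  records.map { case (_, record) => record.toString }
}
```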
GitHub user laserson opened a pull request:
https://github.com/apache/incubator-spark/pull/576
Added parquetFileAsJSON to read Parquet data into JSON strings
This function makes it incredibly easy to read Parquet data, especially with
PySpark. Is there any interest in this? I'm happy to contribute these, but
want to hear what the preferred method is first.
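As a rough illustration of the proposed API: the method name comes from the title above, but the exact signature and the sample path are guesses, not the actual patch. The idea is a method on `SparkContext` that hands back an RDD of JSON strings.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical call site: each element of the resulting RDD is one Parquet
// record rendered as a JSON string, which PySpark could then consume as
// ordinary text and parse with json.loads on the Python side.
val sc = new SparkContext("local", "parquet-as-json-example")
val jsonRecords: RDD[String] = sc.parquetFileAsJSON("hdfs:///data/events.parquet")  // illustrative path
jsonRecords.take(5).foreach(println)
```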