[GitHub] incubator-spark pull request: Added parquetFileAsJSON to read Parq...

2014-02-13 Thread laserson
GitHub user laserson commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-35040314 Yes, that's a much better suggestion. Thanks!

[GitHub] incubator-spark pull request: Added parquetFileAsJSON to read Parq...

2014-02-13 Thread laserson
GitHub user laserson closed the pull request at: https://github.com/apache/incubator-spark/pull/576

[GitHub] incubator-spark pull request: Added parquetFileAsJSON to read Parq...

2014-02-13 Thread laserson
GitHub user laserson commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-35035595 Yes, I have since thought about it more and agree that this would actually be a bad idea. No need to add additional dependencies on other specific file

[GitHub] incubator-spark pull request: Added parquetFileAsJSON to read Parq...

2014-02-10 Thread laserson
GitHub user laserson commented on the pull request: https://github.com/apache/incubator-spark/pull/576#issuecomment-34718389 No, this actually constructs Avro `GenericRecord` objects in memory. The problem is that if you want access to the Parquet data through PySpark, there is no

[GitHub] incubator-spark pull request: Added parquetFileAsJSON to read Parq...

2014-02-10 Thread laserson
GitHub user laserson opened a pull request: https://github.com/apache/incubator-spark/pull/576 Added parquetFileAsJSON to read Parquet data into JSON strings This function makes it incredibly easy to read Parquet data especially with PySpark. Is there any interest in this? It

Installing PySpark on a local machine

2013-12-22 Thread Uri Laserson
I'm happy to contribute these, but want to hear what the preferred method is first. Uri -- Uri Laserson, PhD Data Scientist, Cloudera Twitter/GitHub: @laserson +1 617 910 0447 laser...@cloudera.com