[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...

2014-08-05 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-51202880 @ericgarcia @srowen @MLnick Unfortunately, when I follow those directions, I still get errors. It looks like I'll have to wait to get this functionality until...

2014-07-29 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50515663 @ericgarcia Could you please create a public branch with this code in a working state and push it to your clone of spark so I can use that? I'm bad at merging conf...

2014-07-29 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50514645 @ericgarcia @srowen Sorry, but again I can't make things go. I try to pull that request to branch-1.0.0 via: 'git fetch origin pull/455/head:master'

2014-07-28 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50425757 Cleaned, made no difference. See https://issues.apache.org/jira/browse/SPARK-1138 where others had this issue.

2014-07-28 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50425084 Hmmm, I didn't clean before rebuilding with CDH 4.4. Trying that now.

2014-07-28 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50424644 Ok, so now I rebuilt with my specific CDH version, and I get this when I run ./sbin/start-master.sh: Spark Command: /usr/java/jdk1.8.0//bin/java -cp ::/home

2014-07-28 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50422374 @srowen Actually yes, I'm that stupid :) Figured it out on me own though, have it building across the cluster now.

2014-07-28 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50422135 @JoshRosen Huh, I'm going by http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html and I get: sbt.ResolveExce

2014-07-28 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50422022 @JoshRosen Actually, I just did 'sbt/sbt assembly publish-local'. Trying again with 'SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.X.X sbt/sbt assembly publish-local'.

2014-07-28 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50420550 @ericgarcia I am running CDH 4.4 with mapreduce 1, and when I run this on a cluster running Spark master in standalone mode: avroRdd = sc.newAPIHadoopFile
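For reference, a minimal PySpark sketch of the kind of newAPIHadoopFile call being attempted here. It assumes the Avro key converter that later shipped with the Spark 1.1 examples (org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter) is on the driver and executor classpath; the converter class in this PR may be named differently, and load_avro is a hypothetical helper, not part of PySpark.

```python
# Hypothetical helper for loading an Avro container file from PySpark.
# ASSUMPTION: the Avro converter from the Spark 1.1 examples is on the
# classpath; the converter class in PR #455 may have a different name.

AVRO_INPUT_FORMAT = "org.apache.avro.mapreduce.AvroKeyInputFormat"
AVRO_KEY_CLASS = "org.apache.avro.mapred.AvroKey"
NULL_WRITABLE_CLASS = "org.apache.hadoop.io.NullWritable"
AVRO_KEY_CONVERTER = (
    "org.apache.spark.examples.pythonconverters.AvroWrapperToJavaConverter")


def load_avro(sc, path, reader_schema=None):
    """Return an RDD of (record, None) pairs read from an Avro file.

    `sc` is a pyspark.SparkContext. If `reader_schema` (a JSON string) is
    given, it is handed to AvroKeyInputFormat through the Hadoop conf.
    """
    conf = None
    if reader_schema is not None:
        # Same key that AvroJob.setInputKeySchema writes on the JVM side.
        conf = {"avro.schema.input.key": reader_schema}
    return sc.newAPIHadoopFile(
        path,
        AVRO_INPUT_FORMAT,
        AVRO_KEY_CLASS,
        NULL_WRITABLE_CLASS,
        keyConverter=AVRO_KEY_CONVERTER,
        conf=conf)
```

Assuming the converter turns each GenericRecord into a Java Map (which PySpark unpickles as a dict) and NullWritable comes back as None, `load_avro(sc, path).map(lambda kv: kv[0])` would yield just the records.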

2014-07-28 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50419281 Not sure if it's related, but when I try to start workers having built from trunk, I get this: [hivedata@hivecluster2 spark]$ ./bin/spark-class

2014-07-24 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50101315 I got this to run and I'm able to get work done! Does this code have to be run on the latest Spark code? Would it run on 1.0?

2014-07-24 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-50084602 When I load my records, I get: Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile

2014-07-21 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-49700443 @ericgarcia Thanks! Very exciting. An example file is here: https://drive.google.com/file/d/0B3wy0wXNwbpRekJVaW13cGRKb1U/edit?usp=sharing

2014-07-21 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-49698499 This file, without any UNIONS, works: https://github.com/miguno/avro-cli-examples/blob/master/twitter.snappy.avro My data is more complex :(

2014-07-21 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-49698081 It also looks like we need a custom function to handle the UNION type. I've extended what you wrote for DOUBLE/FLOAT: def unpack(value: Any, schema: S
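The UNION handling described here can be sketched in Python over Avro schemas in their JSON form (as returned by json.loads). This illustrates only the branching such a converter needs; it is not the JVM-side Scala code from the PR. The names unpack and unpack_union are hypothetical, and, as an assumption, a union is restricted to one concrete type optionally paired with null (the usual nullable-field case). Named type references are not resolved.

```python
def unpack(value, schema):
    """Convert a decoded Avro value into plain Python objects using its schema."""
    if isinstance(schema, list):  # a JSON array denotes a UNION
        return unpack_union(value, schema)
    # A bare string is a primitive type name; a dict carries its type in "type".
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        if value is not None:
            raise ValueError("expected null, got %r" % (value,))
        return None
    if kind in ("float", "double"):  # the DOUBLE/FLOAT case
        return float(value)
    if kind in ("int", "long"):
        return int(value)
    if kind in ("boolean", "string", "bytes", "enum", "fixed"):
        return value
    if kind == "record":
        return {f["name"]: unpack(value[f["name"]], f["type"])
                for f in schema["fields"]}
    if kind == "array":
        return [unpack(v, schema["items"]) for v in value]
    if kind == "map":
        return {k: unpack(v, schema["values"]) for k, v in value.items()}
    # Named type references (e.g. "com.example.Tweet") are not resolved here.
    raise ValueError("unsupported schema type: %r" % (kind,))


def unpack_union(value, branches):
    """Resolve a UNION made of one concrete type, optionally paired with null."""
    if len(branches) == 1:
        return unpack(value, branches[0])
    if len(branches) == 2 and "null" in branches:
        if value is None:
            return None
        concrete = branches[1] if branches[0] == "null" else branches[0]
        return unpack(value, concrete)
    raise ValueError("unions may only be a concrete type plus null: %r"
                     % (branches,))
```

Restricting unions this way keeps each field's converted type unambiguous; supporting unions of several concrete types would need a rule for picking the branch from the runtime value.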

2014-07-21 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-49697592 My data has doubles in it, could that be the issue? Using Python version 2.7.6rc1 (v2.7.6rc1:4913d0e9be30+, Oct 27 2013 20:52:11) SparkContext available as sc

2014-07-21 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-49684527 @ericgarcia This is awesome. How can I test this code out? I can handle patching trunk, but how do I call the converter?

2014-06-22 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-46789138 Thanks, master doesn't build for me. Is there a particular commit you recommend using? [error] [error] last tree to typer: Literal(Con

2014-06-21 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-46767642 Thanks a ton! One thing - how can I pull spark core 1.1 from maven? [ERROR] Failed to execute goal on project avro: Could not resolve dependencies for project

2014-06-20 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-46731898 That sounds awesome, but can you put this into context a little bit, in terms of where I would put that code and how I would run it?

2014-06-19 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-46638552 It would be really helpful if this enabled loading of Avro data via GenericRecord.