Re: Spark Python with SequenceFile containing numpy deserialized data in str form

2015-08-30 Thread Peter Aberline
Hi, I saw the posting about storing NumPy values in sequence files: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3cCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3e I’ve had a go at implementing this, and opened a pull request at

Re: Multiple DataFrames per Parquet file?

2015-05-10 Thread Peter Aberline
and then save it through your own checkpoint mechanism. If not, please share your use case. On 11 May 2015 00:38, Peter Aberline peter.aberl...@gmail.com wrote: Hi, I have many thousands of small DataFrames that I would like to save to a single Parquet file to avoid the HDFS 'small files' problem

Spark-submit ClassNotFoundException with JAR!

2014-09-08 Thread Peter Aberline
Hi, I'm having problems with a ClassNotFoundException using this simple example:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import java.net.URLClassLoader
import scala.util.Marshal

class ClassToRoundTrip(val id: Int) extends