Hi,
I saw the posting about storing NumPy values in sequence files:
http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3cCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3e
I’ve had a go at implementing this and opened a PR at
and then save it through your own checkpoint mechanism.
If not, please share your use case.
On 11 May 2015 00:38, Peter Aberline peter.aberl...@gmail.com wrote:
Hi
I have many thousands of small DataFrames that I would like to save to
one Parquet file, to avoid the HDFS 'small files' problem.
Hi,
I'm having problems with a ClassNotFoundException using this simple example:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import java.net.URLClassLoader
import scala.util.Marshal
class ClassToRoundTrip(val id: Int) extends Serializable  // superclass was cut off; Serializable is assumed, since scala.util.Marshal requires it