Hi Gary,

As Michael mentioned, you need to take care of defining Scala case classes or Java 
beans, because SparkSQL needs them to detect the schema.

Currently we are trying to insert our data into HBase with Scala 2.10.4 and Spark 
1.0.

All of our data is tabular. We created one case class per table to represent its 
rows, which means the number of case class parameters has to match the number of 
columns. But Scala 2.10.4 has a limitation: a case class can have at most 22 
parameters. That is where the problem occurs. If the table is small, with 22 
columns or fewer, everything works fine. But for a larger table with more than 22 
columns, an error is reported.
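To make this concrete, here is a rough sketch of the pattern we are using against the Spark 1.0 Scala API (the table, column, and variable names, including rows and sc, are made up for illustration):

import org.apache.spark.sql.SQLContext

// One case class models the rows of one table; one parameter per column.
// With Scala 2.10 this stops compiling once the table has more than 22 columns.
case class Customer(id: Int, name: String, city: String)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD  // implicitly turns an RDD of case classes into a SchemaRDD

// rows: RDD[String], e.g. already decoded from the sequence file values
val customers = rows.map(_.split(",")).map(p => Customer(p(0).toInt, p(1), p(2)))
customers.registerAsTable("customers")
val boston = sqlContext.sql("SELECT name FROM customers WHERE city = 'Boston'")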

We know Scala 2.11 has removed the 22-parameter limit, but Spark 1.0 is not 
compatible with Scala 2.11. So now we are considering using Java beans instead of 
Scala case classes.
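For reference, this is roughly what we think the bean-based route could look like from Scala, going through the Spark 1.0 Java SQL API. We have not tried it yet, so treat the class and table names as a sketch rather than working code:

import org.apache.spark.api.java.{JavaRDD, JavaSparkContext}
import org.apache.spark.sql.api.java.JavaSQLContext
import scala.beans.BeanProperty

// A plain class with getters/setters instead of a case class; the 22-parameter
// limit does not apply because there is no constructor taking one argument per column.
class WideRow extends Serializable {
  @BeanProperty var col1: String = _
  @BeanProperty var col2: String = _
  // ... one property per column, as many as the table needs
}

val javaSql = new JavaSQLContext(new JavaSparkContext(sc))
val beans: JavaRDD[WideRow] = JavaRDD.fromRDD(wideRows)  // wideRows: RDD[WideRow], built elsewhere
val schemaRdd = javaSql.applySchema(beans, classOf[WideRow])
schemaRdd.registerAsTable("wide_table")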

Best,
Haoming



From: mich...@databricks.com
Date: Mon, 7 Jul 2014 17:12:42 -0700
Subject: Re: SparkSQL with sequence file RDDs
To: user@spark.apache.org

I haven't heard any reports of this yet, but I don't see any reason why it 
wouldn't work. You'll need to manually convert the objects that come out of the 
sequence file into something where SparkSQL can detect the schema (i.e. scala 
case classes or java beans) before you can register the RDD as a table.


If you run into any issues please let me know.

On Mon, Jul 7, 2014 at 12:36 PM, Gary Malouf <malouf.g...@gmail.com> wrote:


Has anyone reported issues using SparkSQL with sequence files (all of our data 
is in this format within HDFS)?  We are considering whether to burn the time 
upgrading to Spark 1.0 from 0.9 now and this is a main decision point for us.  


