Hi Gary,

Like Michael mentioned, you need to take care of the Scala case classes or Java beans, because SparkSQL needs the schema.
Currently we are trying to insert our data into HBase with Scala 2.10.4 and Spark 1.0. All of the data is tabular. We created one case class per row, which means the number of case class parameters has to match the number of columns. But Scala 2.10.4 has a limitation: a case class can have at most 22 parameters. So here the problem occurs. If the table is small and has fewer than 22 columns, everything is fine, but for a larger table with more than 22 columns an error is reported. We know Scala 2.11 has removed the parameter-number limitation, but Spark 1.0 is not compatible with it. So we are now considering using Java beans instead of Scala case classes.

Best,
Haoming

From: mich...@databricks.com
Date: Mon, 7 Jul 2014 17:12:42 -0700
Subject: Re: SparkSQL with sequence file RDDs
To: user@spark.apache.org

I haven't heard any reports of this yet, but I don't see any reason why it wouldn't work. You'll need to manually convert the objects that come out of the sequence file into something where SparkSQL can detect the schema (i.e., Scala case classes or Java beans) before you can register the RDD as a table. If you run into any issues, please let me know.

On Mon, Jul 7, 2014 at 12:36 PM, Gary Malouf <malouf.g...@gmail.com> wrote:

Has anyone reported issues using SparkSQL with sequence files (all of our data is in this format within HDFS)? We are considering whether to burn the time upgrading to Spark 1.0 from 0.9 now, and this is a main decision point for us.
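As a sketch of the Java-bean workaround discussed above: a plain JavaBean (private fields plus public getters/setters) is not subject to Scala 2.10's 22-parameter case class limit, so it can model a wide table. The class and field names below are illustrative, not from the thread, and the Spark calls shown in comments are only a hypothetical sketch of how such a bean might be registered as a table.

```java
// A plain, serializable JavaBean whose getters/setters expose the schema.
// Only three of the (potentially many more than 22) columns are shown.
public class WideRow implements java.io.Serializable {
    private String rowKey;
    private int col1;
    private double col2;

    public String getRowKey() { return rowKey; }
    public void setRowKey(String rowKey) { this.rowKey = rowKey; }

    public int getCol1() { return col1; }
    public void setCol1(int col1) { this.col1 = col1; }

    public double getCol2() { return col2; }
    public void setCol2(double col2) { this.col2 = col2; }

    public static void main(String[] args) {
        WideRow r = new WideRow();
        r.setRowKey("row-1");
        r.setCol1(42);
        // With Spark 1.0 on the classpath, an RDD of such beans could
        // (hypothetically) be registered as a SQL table along these lines:
        //   JavaSQLContext sqlCtx = new JavaSQLContext(sc);
        //   JavaSchemaRDD rows = sqlCtx.applySchema(beanRdd, WideRow.class);
        //   rows.registerAsTable("wide_table");
        System.out.println(r.getRowKey() + "," + r.getCol1()); // prints "row-1,42"
    }
}
```

Because SparkSQL discovers the schema through the bean's getter methods rather than constructor parameters, adding more columns is just a matter of adding more fields and accessors.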