Which version are you using? Also, .saveAsTable() saves the table to the Hive metastore, so you need to make sure your Spark application points to the same Hive metastore instance as the JDBC Thrift server. For example, if you put hive-site.xml under $SPARK_HOME/conf, then running spark-shell and start-thriftserver.sh under the same $SPARK_HOME should work. Just verified this against Spark 1.1.
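
As a quick sanity check from spark-shell (a minimal sketch; the tables listed are whatever you created from the Thrift server side):

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import sqlContext._

    // With hive-site.xml shared, this should list the same tables that
    // SHOW TABLES reports from the Thrift server CLI; if it doesn't, the
    // two processes are pointed at different metastores.
    hql("SHOW TABLES").collect().foreach(println)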

On 10/10/14 9:32 AM, Steve Arnold wrote:

I am writing a Spark job to persist data using HiveContext so that it can be accessed via the JDBC Thrift server. Although my code doesn't throw an error, I am unable to see my persisted data when I query from the Thrift server.

I tried three different ways to get this to work:

1)
    import org.apache.spark.sql.SchemaRDD

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val schemaRdd: SchemaRDD = sqlContext.createSchemaRDD(rdd)
    schemaRdd.saveAsTable("test_table")

rdd -> RDD of a case class
sc -> SparkContext
Case class used in all my examples:

case class SomeClass(key:String, value:String) extends Serializable
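
(For reference, a self-contained sketch of 1) that can be pasted into spark-shell; the sample rows are made up for illustration:)

    case class SomeClass(key: String, value: String)

    // sc is the SparkContext provided by spark-shell
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    val rdd = sc.parallelize(Seq(SomeClass("1", "a"), SomeClass("2", "b")))
    sqlContext.createSchemaRDD(rdd).saveAsTable("test_table")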

2) I then created a table called test_table after logging in to the Thrift server, and added two dummy records to it.

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import sqlContext._

    val usermeta = hql("SELECT key, value FROM test_table")
    val rdd = usermeta.map(t => SomeClass("3", "idddddddd"))
    val schemaRdd = createSchemaRDD(rdd)
    schemaRdd.insertInto("test_table")
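
(A quick way to check whether the insert itself succeeded is to read the table back from the same context:)

    // If the new row shows up here but not from the Thrift server, the two
    // processes are likely using different metastores.
    hql("SELECT key, value FROM test_table").collect().foreach(println)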

3) Tried the documented example from the Spark SQL programming guide

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    sqlContext.sql("CREATE TABLE IF NOT EXISTS test_table (key String, value String)")

The Spark job does other computations, which it completes and returns correct results for; only the SQL part doesn't work. What am I doing wrong? I thought that data persisted through the HiveContext could be accessed by running command-line queries in the Thrift server.

Cheers,
Steve
