Which version are you using? Also, |.saveAsTable()| saves the table to
the Hive metastore, so you need to make sure your Spark application
points to the same Hive metastore instance as the JDBC Thrift server.
For example, put |hive-site.xml| under |$SPARK_HOME/conf|, then run
|spark-shell| and |start-thriftserver.sh| under that same
|$SPARK_HOME|; that should work. Just verified this against Spark 1.1.
On 10/10/14 9:32 AM, Steve Arnold wrote:
I am writing a Spark job to persist data using HiveContext so that it
can be accessed via the JDBC Thrift server. Although my code doesn't
throw an error, I am unable to see my persisted data when I query from
the Thrift server.
I tried three different ways to get this to work:
1)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val schemaRdd: SchemaRDD = sqlContext.createSchemaRDD(rdd)
schemaRdd.saveAsTable("test_table")
rdd -> RDD of a case class
sc -> Spark Context
Case class used in all my examples:
case class SomeClass(key:String, value:String) extends Serializable
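For reference, a minimal self-contained version of approach 1 (a sketch, not the exact job; it assumes Spark 1.1 with |hive-site.xml| under |$SPARK_HOME/conf|, and the app name is made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

case class SomeClass(key: String, value: String) extends Serializable

val sc = new SparkContext(new SparkConf().setAppName("persist-demo"))
val sqlContext = new HiveContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

val rdd = sc.parallelize(Seq(SomeClass("1", "a"), SomeClass("2", "b")))
// saveAsTable registers test_table in the Hive metastore configured by
// hive-site.xml; without that file, Spark falls back to a local Derby
// metastore that the Thrift server never sees.
rdd.saveAsTable("test_table")
```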
2) I then created a table called test_table after logging in to the
Thrift server and added two dummy records to it.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext._
val usermeta = hql(" SELECT key, value from test_table")
val rdd = usermeta.map(t=>{SomeClass("3","idddddddd")})
val schemaRdd = createSchemaRDD(rdd)
schemaRdd.insertInto("test_table")
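A self-contained sketch of approach 2 (it assumes test_table already exists in the shared metastore and that |sc| is an existing SparkContext, as in |spark-shell|):

```scala
import org.apache.spark.sql.hive.HiveContext

case class SomeClass(key: String, value: String) extends Serializable

val sqlContext = new HiveContext(sc)
import sqlContext._

val usermeta = hql("SELECT key, value FROM test_table")
// Build one dummy row per existing row, as in the snippet above.
val rdd = usermeta.map(t => SomeClass("3", "idddddddd"))
val schemaRdd = createSchemaRDD(rdd)

// insertInto appends to the metastore-registered table; the new rows
// only become visible to the Thrift server if both processes point at
// the same metastore (hive-site.xml under $SPARK_HOME/conf).
schemaRdd.insertInto("test_table")
```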
3) Tried the approach documented in the Spark SQL programming guide
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS test_table (key String, value String)")
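One thing to note about approach 3: CREATE TABLE alone writes no rows. A hedged sketch of one way to add data in Spark 1.1 is to register an RDD as a temporary table and INSERT from it ("staging" is a hypothetical name, not from the original job):

```scala
import org.apache.spark.sql.hive.HiveContext

case class SomeClass(key: String, value: String) extends Serializable

val sqlContext = new HiveContext(sc)  // sc: existing SparkContext
import sqlContext.createSchemaRDD     // implicit RDD[Product] -> SchemaRDD

sqlContext.sql("CREATE TABLE IF NOT EXISTS test_table (key String, value String)")

// Register an in-memory RDD under a temporary name, then INSERT from
// it into the metastore-backed table using HiveQL.
val rows = sc.parallelize(Seq(SomeClass("k1", "v1")))
rows.registerTempTable("staging")
sqlContext.sql("INSERT INTO TABLE test_table SELECT key, value FROM staging")
```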
The Spark job does other computations, which it completes with correct
results; only the SQL part doesn't work. What am I doing wrong? I
thought tables persisted through the HiveContext would be visible to
command-line queries run against the Thrift server.
Cheers,
Steve