Evert - Thanks for the instructions; they are generally useful in other
scenarios, but I don't think this is what Shahab needs, because
saveAsTable actually saves the contents of the SchemaRDD into Hive.
Shahab - As Michael answered in another thread, you may try
HiveThriftServer2.startWithContext, which is a fairly experimental
feature. Here is a quick spark-shell sample session:
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._

val sparkContext = sc
import sparkContext._  // brings makeRDD into scope

val sqlContext = new HiveContext(sparkContext)
import sqlContext._    // brings the createSchemaRDD implicit into scope

// Cache a small SchemaRDD and register it as a temporary table.
makeRDD((1, "hello") :: (2, "world") :: Nil)
  .toSchemaRDD.cache().registerTempTable("t")

// Serve the registered temporary table over JDBC.
HiveThriftServer2.startWithContext(sqlContext)
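If you'd rather not type this into the shell, the same session can be
wrapped in a standalone application. A minimal sketch against the Spark
1.2-era API (the object and app names here are made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object ThriftServerDemo {  // hypothetical name
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("thrift-demo"))
    val sqlContext = new HiveContext(sc)
    import sqlContext._  // createSchemaRDD implicit

    // Cache a SchemaRDD and register it as a temporary table, as above.
    sc.makeRDD((1, "hello") :: (2, "world") :: Nil)
      .toSchemaRDD.cache().registerTempTable("t")

    // Serve the table over JDBC; the server's non-daemon threads
    // should keep the JVM alive after main returns.
    HiveThriftServer2.startWithContext(sqlContext)
  }
}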
Then you can connect to the started server via beeline:
$ ./bin/beeline -u jdbc:hive2://localhost:10000/default
0: jdbc:hive2://localhost:10000/default> select * from t;
+-----+--------+
| _1  | _2     |
+-----+--------+
| 1   | hello  |
| 2   | world  |
+-----+--------+
2 rows selected (0.208 seconds)
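Since the point is to serve external components over JDBC, you can also
hit the same server from plain JDBC instead of beeline. A minimal client
sketch, assuming the Hive JDBC driver (org.apache.hive:hive-jdbc) and
its dependencies are on the client classpath:

import java.sql.DriverManager

// Register the HiveServer2 driver (older drivers are not
// auto-loaded via the JDBC 4 service mechanism).
Class.forName("org.apache.hive.jdbc.HiveDriver")

val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "", "")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SELECT * FROM t")
while (rs.next()) {
  println(rs.getInt(1) + "\t" + rs.getString(2))
}
rs.close(); stmt.close(); conn.close()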
Cheng
On 12/20/14 1:09 AM, Evert Lammerts wrote:
Yes you can, using a HiveContext, a metastore, and the thriftserver. The
metastore persists information about your SchemaRDD; a HiveContext
initialised with the metastore's connection details can then interact
with it. The thriftserver provides JDBC connections on top of the
metastore.
Using MySQL as an example backend for the metastore:
1. Install MySQL
2. Create a database: CREATE database hive_metastore CHARSET latin1;
3. Create a metastore user: GRANT ALL ON hive_metastore.* TO
metastore_user IDENTIFIED BY 'password';
4. Create a hive-site.xml in your Spark conf dir: see
http://pastebin.com/VXcmJWdX for an example (a minimal sketch also
follows these steps below)
5. Download the mysql jdbc driver from
http://dev.mysql.com/downloads/connector/j/
6. Start the spark-shell with the mysql driver on the classpath: $
./bin/spark-shell --driver-class-path mysql-connector-java-5.1.34-bin.jar
7. Register the table using something like:
> val sqlct = new org.apache.spark.sql.hive.HiveContext(sc)
> sqlct.setConf("hive.metastore.warehouse.dir",
    "/some/path/to/store/tables") // if you're running locally, i.e. not using HDFS
> ... // create your SchemaRDD using sqlct
> rdd.saveAsTable("mytable")
8. Start the thriftserver (which provides the JDBC connection):
$ ./sbin/start-thriftserver.sh
--driver-class-path mysql-connector-java-5.1.34-bin.jar --conf
hive.metastore.warehouse.dir=/some/path/to/store/tables
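In case the pastebin goes away: a minimal hive-site.xml sketch matching
the MySQL setup above. The property names are the standard Hive
metastore JDO settings; the URL, user, and password are just the
example values from steps 2-3:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>metastore_user</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
</configuration>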
Something like that should do it. Now you can connect from, for
example, beeline:
$ ./bin/beeline
> !connect jdbc:hive2://localhost:10000
> show tables;
This is a good guide on the metastore, regardless of your distribution:
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
On Fri Dec 19 2014 at 5:34:49 PM shahab <shahab.mok...@gmail.com> wrote:
Hi,
Sorry for repeating the same question; I just wanted to clarify the
issue:
Is it possible to expose an RDD (or SchemaRDD) to external
components (outside Spark) so it can be queried over JDBC? (My
goal is not to place the RDD back in a database, but to use this
cached RDD to serve JDBC queries.)
best,
/shahab