Evert - Thanks for the instructions; they are generally useful in other
scenarios, but I don't think this is what Shahab needs, because
saveAsTable actually saves the contents of the SchemaRDD into Hive.
Shahab - As Michael answered in another thread, you may try
HiveThriftServer2.startWithContext, which is a fairly experimental
feature. Here is a quick spark-shell sample session:
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver._

val sparkContext = sc
import sparkContext._  // brings makeRDD into scope

val sqlContext = new HiveContext(sparkContext)
import sqlContext._    // brings the createSchemaRDD implicit into scope

// Cache a small SchemaRDD and register it as a temporary table.
makeRDD((1, "hello") :: (2, "world") :: Nil)
  .toSchemaRDD.cache().registerTempTable("t")

// Serve the registered temporary table over JDBC.
HiveThriftServer2.startWithContext(sqlContext)
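If you'd rather not type this into the shell, the same session can be
wrapped in a standalone application. A minimal sketch against the Spark
1.2-era API (the object and app names here are made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object ThriftServerDemo {  // hypothetical name
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("thrift-demo"))
    val sqlContext = new HiveContext(sc)
    import sqlContext._  // createSchemaRDD implicit

    // Cache a SchemaRDD and register it as a temporary table, as above.
    sc.makeRDD((1, "hello") :: (2, "world") :: Nil)
      .toSchemaRDD.cache().registerTempTable("t")

    // Serve the table over JDBC; the server's non-daemon threads
    // should keep the JVM alive after main returns.
    HiveThriftServer2.startWithContext(sqlContext)
  }
}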
Then you can connect to the started server via beeline:
$ ./bin/beeline -u jdbc:hive2://localhost:10000/default
0: jdbc:hive2://localhost:10000/default> select * from t;
+-----+--------+
| _1  | _2     |
+-----+--------+
| 1   | hello  |
| 2   | world  |
+-----+--------+
2 rows selected (0.208 seconds)
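Since the point is to serve external components over JDBC, you can also
hit the same server from plain JDBC instead of beeline. A minimal client
sketch, assuming the Hive JDBC driver (org.apache.hive:hive-jdbc) and
its dependencies are on the client classpath:

import java.sql.DriverManager

// Register the HiveServer2 driver (older drivers are not
// auto-loaded via the JDBC 4 service mechanism).
Class.forName("org.apache.hive.jdbc.HiveDriver")

val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "", "")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SELECT * FROM t")
while (rs.next()) {
  println(rs.getInt(1) + "\t" + rs.getString(2))
}
rs.close(); stmt.close(); conn.close()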
Cheng
On 12/20/14 1:09 AM, Evert Lammerts wrote:
Yes you can, using a HiveContext, a metastore, and the thriftserver. The
metastore persists information about your SchemaRDD; a HiveContext
initialised with the metastore's connection details can then interact
with it. The thriftserver provides JDBC connections on top of the
metastore.
Using MySQL as an example backend for the metastore:
1. Install MySQL
2. Create a database: CREATE database hive_metastore CHARSET latin1;
3. Create a metastore user: GRANT ALL ON hive_metastore.* TO
metastore_user IDENTIFIED BY 'password';
4. Create a hive-site.xml in your Spark conf dir: see
http://pastebin.com/VXcmJWdX for an example (a minimal sketch also
follows these steps below)
5. Download the mysql jdbc driver from
http://dev.mysql.com/downloads/connector/j/
6. Start the spark-shell with the mysql driver on the classpath: $
./bin/spark-shell --driver-class-path mysql-connector-java-5.1.34-bin.jar
7. Register the table using something like:
> val sqlct = new org.apache.spark.sql.hive.HiveContext(sc)
> sqlct.setConf("hive.metastore.warehouse.dir",
    "/some/path/to/store/tables") // if you're running locally, i.e. not using HDFS
> ... // create your SchemaRDD using sqlct
> rdd.saveAsTable("mytable")
8. Start the thriftserver (which provides the JDBC connection):
$ ./sbin/start-thriftserver.sh
--driver-class-path mysql-connector-java-5.1.34-bin.jar --conf
hive.metastore.warehouse.dir=/some/path/to/store/tables
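In case the pastebin goes away: a minimal hive-site.xml sketch matching
the MySQL setup above. The property names are the standard Hive
metastore JDO settings; the URL, user, and password are just the
example values from steps 2-3:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>metastore_user</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
</configuration>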
Something like that should do it. Now you can connect from, for
example, beeline:
$ ./bin/beeline
> !connect jdbc:hive2://localhost:10000
> show tables;
This is a good guide on the metastore, regardless of your distribution:
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
On Fri Dec 19 2014 at 5:34:49 PM shahab <shahab.mok...@gmail.com> wrote:
Hi,
Sorry for repeating the same question; I just wanted to clarify the
issue:
Is it possible to expose an RDD (or SchemaRDD) to external
components (outside Spark) so it can be queried over JDBC? (My
goal is not to place the RDD back in a database, but to use this
cached RDD to serve JDBC queries.)
best,
/shahab