SchemaRDD has a method insertInto(table). When the table is partitioned, it would be more sensible and convenient to extend it to accept a list of partition keys and values.
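For illustration, a minimal sketch of what such an extension might look like, next to the HiveQL route that a HiveContext in Spark 1.1 already supports. The Event row type, the "events" table and its schema, and the proposed insertInto(table, partitionSpec) signature are assumptions for this example, not an existing API:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical row type for this example.
case class Event(id: Int, payload: String)

object PartitionedInsertSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local").setAppName("partitioned-insert"))
    val hive = new HiveContext(sc)
    import hive.createSchemaRDD  // implicit RDD[Event] => SchemaRDD

    // "events" is a hypothetical table partitioned by a date string.
    hive.sql("CREATE TABLE IF NOT EXISTS events (id INT, payload STRING) " +
      "PARTITIONED BY (dt STRING)")

    val batch = sc.parallelize(Seq(Event(1, "a"), Event(2, "b")))

    // The proposed convenience (does NOT exist in Spark 1.1):
    //   batch.insertInto("events", Map("dt" -> "2014-09-11"))

    // What works today: register the batch and route it through HiveQL
    // with a static partition spec.
    batch.registerTempTable("staging")
    hive.sql("INSERT INTO TABLE events PARTITION (dt = '2014-09-11') " +
      "SELECT id, payload FROM staging")
  }
}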
From: Denny Lee <denny.g....@gmail.com>
Date: Thursday, September 11, 2014 at 6:39 PM
To: Du Li <l...@yahoo-inc.com>
Cc: u...@spark.incubator.apache.org, alexandria1101 <alexandria.shea...@gmail.com>
Subject: Re: Table not found: using jdbc console to query sparksql hive thriftserver

It sort of depends on the definition of "efficiently." From a workflow perspective I would agree, but from an I/O perspective, wouldn't there still be the same multiple passes, since the Hive context needs to push the data into HDFS anyway? That said, if you're pushing the data into HDFS and then creating Hive tables via LOAD (vs. referencing the files in place, as with external tables), I would agree with you.

And thanks for correcting me: registerTempTable is in the SQLContext.

On September 10, 2014 at 13:47:24, Du Li (l...@yahoo-inc.com) wrote:

Hi Denny,

There is a related question, by the way. I have a program that reads in a stream of RDDs, each of which is to be loaded into a Hive table as one partition. Currently I do this by first writing the RDDs to HDFS and then loading them into Hive, which requires multiple passes of HDFS I/O and serialization/deserialization. I wonder whether it is possible to do this more efficiently with Spark 1.1 Streaming + SQL, e.g., by registering the RDDs in a Hive context so that the data is loaded directly into the Hive table in cache and is meanwhile visible to JDBC/ODBC clients. In the Spark source code, the method registerTempTable you mentioned works on SQLContext instead of HiveContext.

Thanks,
Du

On 9/10/14, 1:21 PM, "Denny Lee" <denny.g....@gmail.com> wrote:

>Actually, when a table is registered, it is only available within the sc
>context you are running it in. For Spark 1.1, the method was renamed to
>registerTempTable to better reflect that.
>
>The Thrift server runs as a separate process, which means it cannot see
>any of the tables registered within the sc context. You would need to
>save the sc table into Hive, and then the Thrift process would be able
>to see it.
>
>HTH!
>
>> On Sep 10, 2014, at 13:08, alexandria1101
>> <alexandria.shea...@gmail.com> wrote:
>>
>> I used the hiveContext to register the tables and the tables are still
>> not being found by the thrift server. Do I have to pass the hiveContext
>> to JDBC somehow?
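To make the visibility point in this thread concrete, here is a minimal sketch, assuming Spark 1.1 and a Thrift server configured against the same Hive metastore as the application; the Person type and the table names are illustrative only:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical row type for this example.
case class Person(name: String, age: Int)

object ThriftVisibilitySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local").setAppName("thrift-visibility"))
    val hive = new HiveContext(sc)
    import hive.createSchemaRDD  // implicit RDD[Person] => SchemaRDD

    val people = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25)))

    // Temp tables live only in this driver's context: a separately running
    // Thrift server process will NOT see "people_tmp".
    people.registerTempTable("people_tmp")

    // Persisting through the metastore makes the data visible to JDBC/ODBC
    // clients of the Thrift server, at the cost of a write to warehouse storage.
    hive.sql("CREATE TABLE people_saved AS SELECT name, age FROM people_tmp")
  }
}

A beeline or JDBC client connected to the Thrift server can then query people_saved with an ordinary SELECT, while people_tmp remains invisible to it.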