Actually, I came to the conclusion that RDDs have to be persisted in Hive in order to be accessible through Thrift. I hope I didn't end up with an incorrect conclusion. Please, someone correct me if I am wrong.
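To make that concrete, here is a minimal sketch of the workflow I mean, assuming a Hive-enabled Spark build and the 1.1/1.2-era API; the countries.csv path and the Country case class are just hypothetical stand-ins for the real data:

// spark-shell: sc is the SparkContext the shell provides
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
import hiveContext._  // pulls in createSchemaRDD so an RDD of case classes converts to a SchemaRDD

// Hypothetical load step: parse a CSV into an RDD of case classes
case class Country(id: Int, isoCode: String, name: String)
val countries = sc.textFile("countries.csv")
  .map(_.split(","))
  .map(r => Country(r(0).toInt, r(1), r(2)))

// registerTempTable would keep the table private to this shell session.
// saveAsTable through the HiveContext writes it into the Hive metastore,
// which is the catalog the Thrift server reads.
countries.saveAsTable("countries")

If the Thrift server is then started against the same metastore (same hive-site.xml), beeline should list the table with !tables and be able to query it over JDBC.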
On Dec 11, 2014 8:53 AM, "Judy Nash" <judyn...@exchange.microsoft.com> wrote:

Looks like you are wondering why you cannot see, through Thrift, the RDD table you have created?

Based on my own experience with Spark 1.1, an RDD created directly via Spark SQL (i.e. Spark Shell or Spark-SQL.sh) is not visible through Thrift, since Thrift has its own session containing its own RDDs. Spark SQL experts on the forum can confirm this, though.

From: Cheng Lian [mailto:lian.cs....@gmail.com]
Sent: Tuesday, December 9, 2014 6:42 AM
To: Anas Mosaad
Cc: Judy Nash; user@spark.apache.org
Subject: Re: Spark-SQL JDBC driver

According to the stack trace, you were still using SQLContext rather than HiveContext. To interact with Hive, HiveContext *must* be used.

Please refer to this page:
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
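In spark-shell terms, the switch is just a matter of which context you construct and then route every call through; a minimal sketch, assuming a Hive-enabled build and the 1.1/1.2-era API:

// sc is the SparkContext that spark-shell already provides
import org.apache.spark.sql.hive.HiveContext

// A plain SQLContext keeps registered tables in a session-local catalog and
// cannot run saveAsTable (CREATE TABLE AS SELECT) against a metastore, which
// is what the error below complains about. HiveContext is backed by the Hive
// metastore, so tables saved through it are visible to other clients,
// including the Thrift server.
val hiveContext = new HiveContext(sc)

Every sql(...), registerTempTable and saveAsTable call in the session quoted below would then go through hiveContext instead of sqlContext.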
On 12/9/14 6:26 PM, Anas Mosaad wrote:

Back to the first question: does this mandate that Hive is up and running?

When I try it, I get the following exception. The documentation says that this method works only on a SchemaRDD. I thought that was the reason countries.saveAsTable did not work, so I created a tmp that contains the results from the registered temp table, which I could validate is a SchemaRDD, as shown below.

@Judy, I really do appreciate your kind support and I want to understand, and of course I don't want to waste your time. If you can point me to the documentation describing these details, that would be great.

scala> val tmp = sqlContext.sql("select * from countries")
tmp: org.apache.spark.sql.SchemaRDD =
SchemaRDD[12] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
PhysicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36

scala> tmp.saveAsTable("Countries")
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree:
'CreateTableAsSelect None, Countries, false, None
 Project [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]
  Subquery countries
   LogicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36

  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
  at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
  at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
  at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
  at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
  at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
  at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
  at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
  at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
  at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
  at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
  at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
  at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
  at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
  at org.apache.spark.sql.SchemaRDDLike$class.saveAsTable(SchemaRDDLike.scala:126)
  at org.apache.spark.sql.SchemaRDD.saveAsTable(SchemaRDD.scala:108)
  at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
  at $iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
  at $iwC$$iwC$$iwC.<init>(<console>:29)
  at $iwC$$iwC.<init>(<console>:31)
  at $iwC.<init>(<console>:33)
  at <init>(<console>:35)
  at .<init>(<console>:39)
  at .<clinit>(<console>)
  at .<init>(<console>:7)
  at .<clinit>(<console>)
  at $print(<console>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
  at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
  at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
  at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
  at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
  at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
  at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
  at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
  at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
  at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
  at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
  at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
  at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
  at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
  at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
  at org.apache.spark.repl.Main$.main(Main.scala:31)
  at org.apache.spark.repl.Main.main(Main.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:365)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
On Tue, Dec 9, 2014 at 11:44 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

How did you register the table under spark-shell? Two things to notice:

1. To interact with Hive, HiveContext instead of SQLContext must be used.
2. `registerTempTable` doesn't persist the table into the Hive metastore, and the table is lost after quitting spark-shell. Instead, you must use `saveAsTable`.

On 12/9/14 5:27 PM, Anas Mosaad wrote:

Thanks Cheng,

I thought spark-sql uses the exact same metastore, right? However, it didn't work as expected. Here's what I did:

1. In spark-shell, I loaded a CSV file and registered the table, say countries.
2. Started the Thrift server.
3. Connected using beeline.

When I run show tables or !tables, I get an empty list of tables, as follows:

0: jdbc:hive2://localhost:10000> !tables
+------------+--------------+-------------+-------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
+------------+--------------+-------------+-------------+----------+
+------------+--------------+-------------+-------------+----------+
0: jdbc:hive2://localhost:10000> show tables ;
+---------+
| result  |
+---------+
+---------+
No rows selected (0.106 seconds)
0: jdbc:hive2://localhost:10000>

Kindly advise, what am I missing? I want to read the RDD using SQL from outside spark-shell (i.e. like any other relational database).

On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

Essentially, the Spark SQL JDBC Thrift server is just a Spark port of HiveServer2. You don't need to run Hive, but you do need a working metastore.
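Since it speaks the HiveServer2 protocol, any HiveServer2-compatible JDBC client can connect to it. A rough sketch of doing that from plain JDBC code, assuming the Thrift server is already running on localhost:10000 with the default unsecured setup, the Hive JDBC driver and its dependencies are on the classpath, and a countries table was saved into the metastore as discussed above:

import java.sql.DriverManager

object ThriftServerJdbcSketch {
  def main(args: Array[String]): Unit = {
    // HiveServer2-style JDBC URL; adjust host, port and database as needed
    val url = "jdbc:hive2://localhost:10000/default"

    // HiveServer2 JDBC driver class shipped with Hive
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Empty user name and password for the default, unauthenticated setup
    val conn = DriverManager.getConnection(url, "", "")
    try {
      val stmt = conn.createStatement()
      val rs = stmt.executeQuery("SELECT country_name FROM countries LIMIT 10")
      while (rs.next()) {
        println(rs.getString(1))
      }
    } finally {
      conn.close()
    }
  }
}

This is the same endpoint beeline talks to, e.g. ./bin/beeline -u jdbc:hive2://localhost:10000 after ./sbin/start-thriftserver.sh, as described in the guide Judy links below.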
On 12/9/14 3:59 PM, Anas Mosaad wrote:

Thanks Judy, this is exactly what I'm looking for. However, and please forgive me if it's a dumb question: it seems to me that Thrift is the same as the hive2 JDBC driver. Does this mean that starting Thrift will start Hive as well on the server?

On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash <judyn...@exchange.microsoft.com> wrote:

You can use the Thrift server for this purpose and then test it with beeline.

See the doc:
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server

From: Anas Mosaad [mailto:anas.mos...@incorta.com]
Sent: Monday, December 8, 2014 11:01 AM
To: user@spark.apache.org
Subject: Spark-SQL JDBC driver

Hello everyone,

I'm brand new to Spark and was wondering if there's a JDBC driver to access Spark SQL directly. I'm running Spark in standalone mode and don't have Hadoop in this environment.

--
Best Regards/أطيب المنى,

Anas Mosaad
Incorta Inc.
+20-100-743-4510