Actually, I came to the conclusion that RDDs have to be persisted in Hive in order to be accessible through Thrift. I hope I didn't end up with an incorrect conclusion. Please, someone correct me if I am wrong.
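To make that concrete, here is a minimal sketch of the workflow I mean, assuming a Hive-enabled Spark build and the 1.1/1.2-era API; the countries.csv path and the Country case class are just hypothetical stand-ins for the real data:

// spark-shell: sc is the SparkContext the shell provides
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
import hiveContext._  // pulls in createSchemaRDD so an RDD of case classes converts to a SchemaRDD

// Hypothetical load step: parse a CSV into an RDD of case classes
case class Country(id: Int, isoCode: String, name: String)
val countries = sc.textFile("countries.csv")
  .map(_.split(","))
  .map(r => Country(r(0).toInt, r(1), r(2)))

// registerTempTable would keep the table private to this shell session.
// saveAsTable through the HiveContext writes it into the Hive metastore,
// which is the catalog the Thrift server reads.
countries.saveAsTable("countries")

If the Thrift server is then started against the same metastore (same hive-site.xml), beeline should list the table with !tables and be able to query it over JDBC.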
On Dec 11, 2014 8:53 AM, "Judy Nash" <judyn...@exchange.microsoft.com> wrote:

Looks like you are wondering why you cannot see, through Thrift, the RDD table you have created?

Based on my own experience with Spark 1.1, an RDD created directly via Spark SQL (i.e. Spark Shell or Spark-SQL.sh) is not visible through Thrift, since Thrift has its own session containing its own RDDs. Spark SQL experts on the forum can confirm this, though.

From: Cheng Lian [mailto:lian.cs....@gmail.com]
Sent: Tuesday, December 9, 2014 6:42 AM
To: Anas Mosaad
Cc: Judy Nash; user@spark.apache.org
Subject: Re: Spark-SQL JDBC driver

According to the stack trace, you were still using SQLContext rather than HiveContext. To interact with Hive, HiveContext *must* be used.

Please refer to this page:
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
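In spark-shell terms, the switch is just a matter of which context you construct and then route every call through; a minimal sketch, assuming a Hive-enabled build and the 1.1/1.2-era API:

// sc is the SparkContext that spark-shell already provides
import org.apache.spark.sql.hive.HiveContext

// A plain SQLContext keeps registered tables in a session-local catalog and
// cannot run saveAsTable (CREATE TABLE AS SELECT) against a metastore, which
// is what the error below complains about. HiveContext is backed by the Hive
// metastore, so tables saved through it are visible to other clients,
// including the Thrift server.
val hiveContext = new HiveContext(sc)

Every sql(...), registerTempTable and saveAsTable call in the session quoted below would then go through hiveContext instead of sqlContext.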
On 12/9/14 6:26 PM, Anas Mosaad wrote:

Back to the first question: does this mandate that Hive is up and running?

When I try it, I get the following exception. The documentation says that this method works only on a SchemaRDD. I thought that was the reason countries.saveAsTable did not work, so I created a tmp that contains the results from the registered temp table, which I could validate is a SchemaRDD, as shown below.

@Judy, I really do appreciate your kind support and I want to understand, and of course I don't want to waste your time. If you can point me to the documentation describing these details, that would be great.

scala> val tmp = sqlContext.sql("select * from countries")
tmp: org.apache.spark.sql.SchemaRDD =
SchemaRDD[12] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
PhysicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36

scala> tmp.saveAsTable("Countries")
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree:
'CreateTableAsSelect None, Countries, false, None
 Project [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]
  Subquery countries
   LogicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36

  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
  at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
  at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
  at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
  at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
  at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
  at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
  at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
  at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
  at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
  at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
  at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
  at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
  at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
  at org.apache.spark.sql.SchemaRDDLike$class.saveAsTable(SchemaRDDLike.scala:126)
  at org.apache.spark.sql.SchemaRDD.saveAsTable(SchemaRDD.scala:108)
  at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
  at $iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
  at $iwC$$iwC$$iwC.<init>(<console>:29)
  at $iwC$$iwC.<init>(<console>:31)
  at $iwC.<init>(<console>:33)
  at <init>(<console>:35)
  at .<init>(<console>:39)
  at .<clinit>(<console>)
  at .<init>(<console>:7)
  at .<clinit>(<console>)
  at $print(<console>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
  at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
  at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
  at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
  at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
  at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
  at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
  at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
  at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
  at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
  at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
  at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
  at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
  at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
  at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
  at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
  at org.apache.spark.repl.Main$.main(Main.scala:31)
  at org.apache.spark.repl.Main.main(Main.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:365)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
On Tue, Dec 9, 2014 at 11:44 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

How did you register the table under spark-shell? Two things to notice:

1. To interact with Hive, HiveContext instead of SQLContext must be used.
2. `registerTempTable` doesn't persist the table into the Hive metastore, and the table is lost after quitting spark-shell. Instead, you must use `saveAsTable`.

On 12/9/14 5:27 PM, Anas Mosaad wrote:

Thanks Cheng,

I thought spark-sql uses the exact same metastore, right? However, it didn't work as expected. Here's what I did:

1. In spark-shell, I loaded a CSV file and registered the table, say countries.
2. Started the Thrift server.
3. Connected using beeline.

When I run show tables or !tables, I get an empty list of tables, as follows:

0: jdbc:hive2://localhost:10000> !tables
+------------+--------------+-------------+-------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
+------------+--------------+-------------+-------------+----------+
+------------+--------------+-------------+-------------+----------+
0: jdbc:hive2://localhost:10000> show tables ;
+---------+
| result  |
+---------+
+---------+
No rows selected (0.106 seconds)
0: jdbc:hive2://localhost:10000>

Kindly advise, what am I missing? I want to read the RDD using SQL from outside spark-shell (i.e. like any other relational database).

On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

Essentially, the Spark SQL JDBC Thrift server is just a Spark port of HiveServer2. You don't need to run Hive, but you do need a working metastore.
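Since it speaks the HiveServer2 protocol, any HiveServer2-compatible JDBC client can connect to it. A rough sketch of doing that from plain JDBC code, assuming the Thrift server is already running on localhost:10000 with the default unsecured setup, the Hive JDBC driver and its dependencies are on the classpath, and a countries table was saved into the metastore as discussed above:

import java.sql.DriverManager

object ThriftServerJdbcSketch {
  def main(args: Array[String]): Unit = {
    // HiveServer2-style JDBC URL; adjust host, port and database as needed
    val url = "jdbc:hive2://localhost:10000/default"

    // HiveServer2 JDBC driver class shipped with Hive
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Empty user name and password for the default, unauthenticated setup
    val conn = DriverManager.getConnection(url, "", "")
    try {
      val stmt = conn.createStatement()
      val rs = stmt.executeQuery("SELECT country_name FROM countries LIMIT 10")
      while (rs.next()) {
        println(rs.getString(1))
      }
    } finally {
      conn.close()
    }
  }
}

This is the same endpoint beeline talks to, e.g. ./bin/beeline -u jdbc:hive2://localhost:10000 after ./sbin/start-thriftserver.sh, as described in the guide Judy links below.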
On 12/9/14 3:59 PM, Anas Mosaad wrote:

Thanks Judy, this is exactly what I'm looking for. However, and please forgive me if it's a dumb question: it seems to me that Thrift is the same as the hive2 JDBC driver. Does this mean that starting Thrift will start Hive as well on the server?

On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash <judyn...@exchange.microsoft.com> wrote:

You can use the Thrift server for this purpose and then test it with beeline.

See the doc:
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server

From: Anas Mosaad [mailto:anas.mos...@incorta.com]
Sent: Monday, December 8, 2014 11:01 AM
To: user@spark.apache.org
Subject: Spark-SQL JDBC driver

Hello everyone,

I'm brand new to Spark and was wondering if there's a JDBC driver to access Spark SQL directly. I'm running Spark in standalone mode and don't have Hadoop in this environment.

--
Best Regards/أطيب المنى,

Anas Mosaad
Incorta Inc.
+20-100-743-4510