@Yin: Sorry for my mistake, you are right, it was added in 1.2, not 0.12.0. My bad!
On Tue, Mar 3, 2015 at 6:47 PM, shahab <shahab.mok...@gmail.com> wrote:

> Thanks Rohit, yes, my mistake, it does work with 1.1 (I am actually
> running it on Spark 1.1).
>
> But do you mean that even the HiveContext of Spark (not the Calliope
> CassandraAwareHiveContext) does not support Hive 0.12?
>
> best,
> /Shahab
>
> On Tue, Mar 3, 2015 at 5:55 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>
>> The Hive dependency comes from spark-hive.
>>
>> It does work with Spark 1.1; we will have the 1.2 release later this month.
>>
>> On Mar 3, 2015 8:49 AM, "shahab" <shahab.mok...@gmail.com> wrote:
>>
>>> Thanks Rohit,
>>>
>>> I am already using Calliope and quite happy with it, well done! Except for
>>> the fact that:
>>> 1- It seems that it does not support Hive 0.12 or higher, am I right?
>>> For example, you cannot use the current_time() UDF, or those new UDFs added
>>> in Hive 0.12. Are they supported? Any plan for supporting them?
>>> 2- It does not support Spark 1.1 and 1.2. Any plan for a new release?
>>>
>>> best,
>>> /Shahab
>>>
>>> On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:
>>>
>>>> Hello Shahab,
>>>>
>>>> I think CassandraAwareHiveContext
>>>> <https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala>
>>>> in Calliope is what you are looking for. Create a CAHC instance and you should
>>>> be able to run Hive functions against the SchemaRDD you create from there.
>>>>
>>>> Cheers,
>>>> Rohit
>>>>
>>>> *Founder & CEO, **Tuplejump, Inc.*
>>>> ____________________________
>>>> www.tuplejump.com
>>>> *The Data Engineering Platform*
>>>>
>>>> On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <hao.ch...@intel.com> wrote:
>>>>
>>>>> The temp table in the metastore cannot be shared across SQLContext
>>>>> instances. Since HiveContext is a subclass of SQLContext (it inherits all of
>>>>> its functionality), why not use a single HiveContext globally? Is there
>>>>> any specific requirement in your case that you need multiple
>>>>> SQLContext/HiveContext instances?
>>>>>
>>>>> *From:* shahab [mailto:shahab.mok...@gmail.com]
>>>>> *Sent:* Tuesday, March 3, 2015 9:46 PM
>>>>> *To:* Cheng, Hao
>>>>> *Cc:* user@spark.apache.org
>>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>>>
>>>>> You are right, CassandraAwareSQLContext is a subclass of SQLContext.
>>>>>
>>>>> But I did another experiment: I queried Cassandra
>>>>> using CassandraAwareSQLContext, then I registered the "rdd" as a temp table,
>>>>> and next I tried to query it using HiveContext, but it seems that the Hive
>>>>> context cannot see the table registered using the SQL context. Is this a
>>>>> normal case?
>>>>>
>>>>> best,
>>>>> /Shahab
>>>>>
>>>>> On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <hao.ch...@intel.com> wrote:
>>>>>
>>>>> Hive UDFs are only applicable for a HiveContext and its subclass
>>>>> instances. Is the CassandraAwareSQLContext a direct subclass of
>>>>> HiveContext or SQLContext?
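For reference, Hao's suggestion above of a single, shared HiveContext might look roughly like the sketch below. This is only a minimal sketch against Spark 1.1-era APIs, not the Calliope API; the application name and the "some_hive_table" source are placeholders, and in shahab's setup the SchemaRDD would come from the Cassandra connector instead.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object SingleHiveContextSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("single-hive-context"))

    // One HiveContext shared by the whole application: temp tables registered
    // here are only visible to this same instance, and Hive UDFs resolve here.
    val hc = new HiveContext(sc)

    // Placeholder source table; any SchemaRDD can be registered the same way.
    val profiles = hc.sql("SELECT * FROM some_hive_table")
    profiles.registerTempTable("profile")

    // Because "profile" was registered on this HiveContext, the Hive UDFs
    // from_unixtime and floor resolve against the same instance. Registering
    // the table on a plain SQLContext and querying from a different
    // HiveContext would not work, since temp tables are per-instance.
    val result = hc.sql(
      "SELECT from_unixtime(floor(createdAt / 1000)) FROM profile WHERE sampling_bucket = 0")
    result.collect().foreach(println)
  }
}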
>>>>> *From:* shahab [mailto:shahab.mok...@gmail.com]
>>>>> *Sent:* Tuesday, March 3, 2015 5:10 PM
>>>>> *To:* Cheng, Hao
>>>>> *Cc:* user@spark.apache.org
>>>>> *Subject:* Re: Supporting Hive features in Spark SQL Thrift JDBC server
>>>>>
>>>>> val sc: SparkContext = new SparkContext(conf)
>>>>> val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some Calliope Cassandra Spark connector
>>>>> val rdd: SchemaRDD = sqlCassContext.sql("select * from db.profile")
>>>>> rdd.cache
>>>>> rdd.registerTempTable("profile")
>>>>> rdd.first // enforce caching
>>>>> val q = "select from_unixtime(floor(createdAt/1000)) from profile where sampling_bucket=0"
>>>>> val rdd2 = rdd.sqlContext.sql(q)
>>>>> println("Result: " + rdd2.first)
>>>>>
>>>>> And I get the following errors:
>>>>>
>>>>> Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
>>>>> Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
>>>>>  Filter (sampling_bucket#10 = 0)
>>>>>   Subquery profile
>>>>>    Project [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
>>>>>     CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None, false, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml)
>>>>>
>>>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
>>>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
>>>>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>>>>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
>>>>>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>>>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>>>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>>>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>>>>>   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>>>>>   at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>>>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>>>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>>>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
>>>>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
>>>>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>>>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
>>>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
>>>>>   at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
>>>>>   at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
>>>>>   at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
>>>>>   at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
>>>>>   at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
>>>>>   at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
>>>>>   at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
>>>>>   at scala.collection.immutable.List.foreach(List.scala:318)
>>>>>   at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
>>>>>   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
>>>>>   at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>>>>   at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
>>>>>   at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
>>>>>   at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
>>>>>   at boot.SQLDemo$.main(SQLDemo.scala:65) // my code
>>>>>   at boot.SQLDemo.main(SQLDemo.scala) // my code
>>>>>
>>>>> On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <hao.ch...@intel.com> wrote:
>>>>>
>>>>> Can you provide the detailed failure call stack?
>>>>>
>>>>> *From:* shahab [mailto:shahab.mok...@gmail.com]
>>>>> *Sent:* Tuesday, March 3, 2015 3:52 PM
>>>>> *To:* user@spark.apache.org
>>>>> *Subject:* Supporting Hive features in Spark SQL Thrift JDBC server
>>>>>
>>>>> Hi,
>>>>>
>>>>> According to the Spark SQL documentation, "....Spark SQL supports the
>>>>> vast majority of Hive features, such as User Defined Functions (UDF)",
>>>>> and one of these UDFs is the "current_date()" function, which should be
>>>>> supported.
>>>>>
>>>>> However, I get an error when I use this UDF in my SQL query. There are
>>>>> a couple of other UDFs which cause a similar error.
>>>>>
>>>>> Am I missing something in my JDBC server?
>>>>>
>>>>> /Shahab
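For completeness, the behaviour can also be checked directly against the Spark SQL Thrift JDBC server. The sketch below is only illustrative: the host, port, database, and "profile" table are placeholders, and it assumes the Hive JDBC driver is on the classpath. Note that from_unixtime ships with Hive 0.12, while current_date only appears in Hive 1.2, which is presumably what the correction at the top of this thread refers to.

import java.sql.DriverManager

object ThriftServerUdfCheck {
  def main(args: Array[String]): Unit = {
    // Register the Hive JDBC driver used to talk to the Thrift server.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Host, port, database, and table name are placeholders.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    try {
      val stmt = conn.createStatement()
      // from_unixtime is a Hive 0.12 built-in, so it should resolve on a
      // HiveContext-backed Thrift server; current_date requires Hive 1.2.
      val rs = stmt.executeQuery(
        "SELECT from_unixtime(floor(createdAt / 1000)) FROM profile LIMIT 10")
      while (rs.next()) println(rs.getString(1))
    } finally {
      conn.close()
    }
  }
}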