The Hive dependency comes from spark-hive. It does work with Spark 1.1; we will have the 1.2 release later this month.
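To pull that dependency in, a minimal sbt sketch (the 1.1.x version string below is illustrative; match it to the Spark version actually deployed):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"  % "1.1.1",
  // spark-hive provides HiveContext and the Hive UDF support discussed below
  "org.apache.spark" %% "spark-hive" % "1.1.1"
)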
On Mar 3, 2015 8:49 AM, "shahab" <shahab.mok...@gmail.com> wrote:

Thanks Rohit,

I am already using Calliope and am quite happy with it (well done!), except that:

1- It seems that it does not support Hive 0.12 or higher. Am I right? For example, you cannot use the current_time() UDF, or the new UDFs added in Hive 0.12. Are they supported? Any plan for supporting them?
2- It does not support Spark 1.1 and 1.2. Any plan for a new release?

best,
/Shahab

On Tue, Mar 3, 2015 at 5:41 PM, Rohit Rai <ro...@tuplejump.com> wrote:

Hello Shahab,

I think CassandraAwareHiveContext
<https://github.com/tuplejump/calliope/blob/develop/sql/hive/src/main/scala/org/apache/spark/sql/hive/CassandraAwareHiveContext.scala>
in Calliope is what you are looking for. Create a CAHC instance and you should be able to run Hive functions against the SchemaRDD you create from there.

Cheers,
Rohit

Founder & CEO, Tuplejump, Inc.
____________________________
www.tuplejump.com
The Data Engineering Platform

On Tue, Mar 3, 2015 at 6:03 AM, Cheng, Hao <hao.ch...@intel.com> wrote:

The temp table in the metastore cannot be shared across SQLContext instances. Since HiveContext is a subclass of SQLContext (it inherits all of its functionality), why not use a single HiveContext globally? Is there any specific requirement in your case that needs multiple SQLContext/HiveContext instances?

From: shahab [mailto:shahab.mok...@gmail.com]
Sent: Tuesday, March 3, 2015 9:46 PM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: Supporting Hive features in Spark SQL Thrift JDBC server

You are right, CassandraAwareSQLContext is a subclass of SQLContext.

But I did another experiment: I queried Cassandra using CassandraAwareSQLContext, then I registered the "rdd" as a temp table, and next I tried to query it using HiveContext. It seems that the HiveContext cannot see the table registered using the SQLContext. Is this a normal case?

best,
/Shahab

On Tue, Mar 3, 2015 at 1:35 PM, Cheng, Hao <hao.ch...@intel.com> wrote:

Hive UDFs are only applicable for HiveContext and its subclass instances. Is the CassandraAwareSQLContext a direct subclass of HiveContext or of SQLContext?
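Putting Rohit's and Hao's suggestions together, here is a minimal, spark-shell-style sketch of the single-HiveContext pattern (Spark 1.1-era API; the Profile case class and its data are made up, and in this thread the SchemaRDD would instead come from Calliope's Cassandra context, or from CassandraAwareHiveContext, whose exact constructor is an assumption here):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

case class Profile(profileId: String, samplingBucket: Int, createdAt: Long)

val sc = new SparkContext(new SparkConf().setAppName("hive-udf-demo"))

// One HiveContext used everywhere: temp tables registered on it stay visible
// to later queries, and Hive UDFs such as from_unixtime can be resolved.
val hiveContext = new HiveContext(sc)
import hiveContext.createSchemaRDD  // implicit RDD[case class] -> SchemaRDD

// Stand-in data; in this thread the rows would come from Cassandra instead.
val profiles = sc.parallelize(Seq(Profile("p1", 0, 1425398400000L)))
profiles.registerTempTable("profile")

// The Hive UDF resolves because the query is analyzed by the same HiveContext
// that holds the temp table.
val result = hiveContext.sql(
  "select from_unixtime(floor(createdAt / 1000)) from profile where samplingBucket = 0")
result.collect().foreach(println)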
From: shahab [mailto:shahab.mok...@gmail.com]
Sent: Tuesday, March 3, 2015 5:10 PM
To: Cheng, Hao
Cc: user@spark.apache.org
Subject: Re: Supporting Hive features in Spark SQL Thrift JDBC server

val sc: SparkContext = new SparkContext(conf)
val sqlCassContext = new CassandraAwareSQLContext(sc) // I used some Calliope Cassandra Spark connector
val rdd: SchemaRDD = sqlCassContext.sql("select * from db.profile")
rdd.cache
rdd.registerTempTable("profile")
rdd.first // enforce caching
val q = "select from_unixtime(floor(createdAt/1000)) from profile where sampling_bucket=0"
val rdd2 = rdd.sqlContext.sql(q)
println("Result: " + rdd2.first)

And I get the following errors:

Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'from_unixtime('floor(('createdAt / 1000))) AS c0#7, tree:
Project ['from_unixtime('floor(('createdAt / 1000))) AS c0#7]
 Filter (sampling_bucket#10 = 0)
  Subquery profile
   Project [company#8,bucket#9,sampling_bucket#10,profileid#11,createdat#12L,modifiedat#13L,version#14]
    CassandraRelation localhost, 9042, 9160, normaldb_sampling, profile, org.apache.spark.sql.CassandraAwareSQLContext@778b692d, None, None, false, Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml)

        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:72)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$1.applyOrElse(Analyzer.scala:70)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:212)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:168)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:70)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:68)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
        at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
        at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
        at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
        at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:402)
        at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:402)
        at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
        at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
        at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
        at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:411)
        at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
        at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:440)
        at org.apache.spark.sql.SchemaRDD.take(SchemaRDD.scala:103)
        at org.apache.spark.rdd.RDD.first(RDD.scala:1091)
        at boot.SQLDemo$.main(SQLDemo.scala:65) // my code
        at boot.SQLDemo.main(SQLDemo.scala) // my code

On Tue, Mar 3, 2015 at 8:57 AM, Cheng, Hao <hao.ch...@intel.com> wrote:

Can you provide the detailed failure call stack?

From: shahab [mailto:shahab.mok...@gmail.com]
Sent: Tuesday, March 3, 2015 3:52 PM
To: user@spark.apache.org
Subject: Supporting Hive features in Spark SQL Thrift JDBC server

Hi,

According to the Spark SQL documentation, "....Spark SQL supports the vast majority of Hive features, such as User Defined Functions (UDFs)", and one of these UDFs is the current_date() function, which should therefore be supported.

However, I get an error when I use this UDF in my SQL query. A couple of other UDFs cause a similar error.

Am I missing something in my JDBC server?

/Shahab
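For context, a minimal sketch of a Thrift JDBC client query of the kind described above, assuming the Thrift server has been started (for example with sbin/start-thriftserver.sh) on the default localhost:10000. The host, port, and choice of UDF are illustrative; current_date(), the call reported to fail here, is not available in every Hive version, so the sketch uses the older from_unixtime/unix_timestamp UDFs instead:

import java.sql.DriverManager

// HiveServer2-compatible JDBC driver, from the org.apache.hive hive-jdbc artifact.
Class.forName("org.apache.hive.jdbc.HiveDriver")

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
try {
  val stmt = conn.createStatement()
  // A Hive UDF call of the kind discussed in the thread; whether a given UDF
  // resolves depends on the Hive version Spark was built against.
  val rs = stmt.executeQuery("SELECT from_unixtime(unix_timestamp())")
  while (rs.next()) println(rs.getString(1))
} finally {
  conn.close()
}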