I was using a different build of Spark, compiled against a different version of Hive, before.
The error I see now is:

org.apache.hadoop.hive.serde2.avro.BadSchemaException
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:195)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:321)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:320)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:128)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:124)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

On Thu, Aug 27, 2015 at 10:38 AM, java8964 <java8...@hotmail.com> wrote:

> You can run a Hive query with spark-avro, but you cannot query a Hive
> view with spark-avro, as the view is stored in the Hive metadata.
>
> What do you mean by the right version of Spark, such that the "can't
> determine table schema" problem is fixed? I faced this problem before,
> and my guess is that a Hive library mismatch caused it, but I am not sure.
>
> I never faced your 2nd problem. Can you post the whole stack trace for
> that error?
>
> Most of our datasets are also in Avro format.
>
> Yong
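For concreteness, here is a minimal sketch of the query path being discussed: a Hive view over Avro-backed tables queried through HiveContext, assuming the Spark 1.3-era API that matches the stack traces above. The view name, column name, and app name are illustrative placeholders, not from this thread.

    // Sketch only (Spark 1.3-era API). "my_avro_view" and "some_col" are
    // illustrative placeholders.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("query-avro-view"))
    // HiveContext resolves tables and views through the Hive metastore, which
    // is why the Hive client libraries Spark was built against must match the
    // metastore version.
    val hive = new HiveContext(sc)

    val df = hive.sql("SELECT some_col, COUNT(*) FROM my_avro_view GROUP BY some_col")
    df.show()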
> ------------------------------
> Date: Thu, 27 Aug 2015 09:45:45 -0700
> Subject: Re: query avro hive table in spark sql
> From: gpatc...@gmail.com
> To: java8...@hotmail.com
> CC: mich...@databricks.com; user@spark.apache.org
>
> Can we run Hive queries using spark-avro?
>
> In our case it's not just reading the Avro file; we have a view in Hive
> which is based on multiple tables.
>
> On Thu, Aug 27, 2015 at 9:41 AM, Giri P <gpatc...@gmail.com> wrote:
>
> We are using Hive 1.1.
>
> I was able to fix the error below when I used the right version of Spark:
>
> 15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException
> determining schema. Returning signal schema to indicate problem
> org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither
> avro.schema.literal nor avro.schema.url specified, can't determine table schema
>     at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
>     at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
>     at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
>     at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
>     at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)
>
> But I still see this error when querying some Hive Avro tables:
>
> 15/08/26 17:51:27 WARN scheduler.TaskSetManager: Lost task 30.0 in stage
> 0.0 (TID 14, dtord01hdw0227p.dc.dotomi.net):
> org.apache.hadoop.hive.serde2.avro.BadSchemaException
>     at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:91)
>     at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:321)
>     at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:320)
>
> I haven't tried spark-avro. We are using SQLContext to run queries in our
> application.
>
> Any idea if this issue might be because of querying across different
> schema versions of the data?
>
> Thanks
> Giri
>
> On Thu, Aug 27, 2015 at 5:39 AM, java8964 <java8...@hotmail.com> wrote:
>
> What version of Hive are you using? And did you compile Spark against the
> right version of Hive?
>
> BTW, spark-avro works great in our experience, but still, some non-tech
> people just want to use it as a SQL shell in Spark, like the Hive CLI.
>
> Yong
>
> ------------------------------
> From: mich...@databricks.com
> Date: Wed, 26 Aug 2015 17:48:44 -0700
> Subject: Re: query avro hive table in spark sql
> To: gpatc...@gmail.com
> CC: user@spark.apache.org
>
> I'd suggest looking at
> http://spark-packages.org/package/databricks/spark-avro
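For reference, a minimal sketch of what the spark-avro suggestion above looks like in practice, assuming the Spark 1.4-era DataFrameReader API and the spark-avro 2.x package run from spark-shell; the package version, file path, and temp-table name are illustrative placeholders. Note that this reads Avro files directly, bypassing the Hive metastore, which is why Hive views are not visible this way.

    // Launch with the package on the classpath, e.g.:
    //   spark-shell --packages com.databricks:spark-avro_2.10:2.0.1
    // In spark-shell, sqlContext is predefined; the path and table name
    // below are placeholders.
    val events = sqlContext.read
      .format("com.databricks.spark.avro")
      .load("hdfs:///data/events/*.avro")

    // Register the DataFrame so it can be queried with plain SQL. Hive views
    // are not available here, because spark-avro reads files, not the metastore.
    events.registerTempTable("events_avro")
    sqlContext.sql("SELECT COUNT(*) FROM events_avro").show()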
> On Wed, Aug 26, 2015 at 11:32 AM, gpatcham <gpatc...@gmail.com> wrote:
>
> Hi,
>
> I'm trying to query a Hive table which is based on Avro in Spark SQL and
> am seeing the errors below:
>
> 15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException
> determining schema. Returning signal schema to indicate problem
> org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither
> avro.schema.literal nor avro.schema.url specified, can't determine table schema
>     at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
>     at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
>     at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
>     at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
>     at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)
>
> It's not able to determine the schema. The Hive table points to the Avro
> schema using a URL. I'm stuck and couldn't find more info on this.
>
> Any pointers?
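For the "Neither avro.schema.literal nor avro.schema.url specified" warning itself, one thing worth checking — an assumption, not something confirmed in this thread — is whether the schema property is visible at the level Spark reads it from, e.g. set as a table property rather than only in the serde properties of individual partitions. A sketch under those assumptions, with an illustrative table name and schema path:

    // Sketch under assumptions: "logs_avro" and the .avsc path are
    // illustrative placeholders, not from this thread.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("set-avro-schema-url"))
    val hive = new HiveContext(sc)

    // Make the Avro schema discoverable by AvroSerDe when the table (and its
    // partitions) are deserialized; the statement is passed through to Hive.
    hive.sql("ALTER TABLE logs_avro SET TBLPROPERTIES " +
      "('avro.schema.url'='hdfs:///schemas/logs_avro.avsc')")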