I was using a different build of Spark, compiled against a different version of Hive, before.
The error I see now is:

org.apache.hadoop.hive.serde2.avro.BadSchemaException
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:195)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:321)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:320)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:128)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:124)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

On Thu, Aug 27, 2015 at 10:38 AM, java8964 <java8...@hotmail.com> wrote:

> You can run a Hive query with spark-avro, but you cannot query a Hive
> view with spark-avro, as the view is stored in the Hive metadata.
>
> What do you mean by the right version of Spark, such that the "can't
> determine table schema" problem is fixed? I faced this problem before,
> and my guess is that a Hive library mismatch caused it, but I am not sure.
>
> I never faced your 2nd problem. Can you post the whole stack trace for
> that error?
>
> Most of our datasets are also in Avro format.
>
> Yong
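For concreteness, here is a minimal sketch of the query path being discussed: a Hive view over Avro-backed tables queried through HiveContext, assuming the Spark 1.3-era API that matches the stack traces above. The view name, column name, and app name are illustrative placeholders, not from this thread.

    // Sketch only (Spark 1.3-era API). "my_avro_view" and "some_col" are
    // illustrative placeholders.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("query-avro-view"))
    // HiveContext resolves tables and views through the Hive metastore, which
    // is why the Hive client libraries Spark was built against must match the
    // metastore version.
    val hive = new HiveContext(sc)

    val df = hive.sql("SELECT some_col, COUNT(*) FROM my_avro_view GROUP BY some_col")
    df.show()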
> ------------------------------
> Date: Thu, 27 Aug 2015 09:45:45 -0700
> Subject: Re: query avro hive table in spark sql
> From: gpatc...@gmail.com
> To: java8...@hotmail.com
> CC: mich...@databricks.com; user@spark.apache.org
>
> Can we run Hive queries using spark-avro?
>
> In our case it's not just reading the Avro file; we have a view in Hive
> which is based on multiple tables.
>
> On Thu, Aug 27, 2015 at 9:41 AM, Giri P <gpatc...@gmail.com> wrote:
>
> We are using Hive 1.1.
>
> I was able to fix the error below when I used the right version of Spark:
>
> 15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException
> determining schema. Returning signal schema to indicate problem
> org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither
> avro.schema.literal nor avro.schema.url specified, can't determine table schema
>     at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
>     at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
>     at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
>     at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
>     at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)
>
> But I still see this error when querying some Hive Avro tables:
>
> 15/08/26 17:51:27 WARN scheduler.TaskSetManager: Lost task 30.0 in stage
> 0.0 (TID 14, dtord01hdw0227p.dc.dotomi.net):
> org.apache.hadoop.hive.serde2.avro.BadSchemaException
>     at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:91)
>     at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:321)
>     at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:320)
>
> I haven't tried spark-avro. We are using SQLContext to run queries in our
> application.
>
> Any idea if this issue might be because of querying across different
> schema versions of the data?
>
> Thanks
> Giri
>
> On Thu, Aug 27, 2015 at 5:39 AM, java8964 <java8...@hotmail.com> wrote:
>
> What version of Hive are you using? And did you compile Spark against the
> right version of Hive?
>
> BTW, spark-avro works great in our experience, but still, some non-tech
> people just want to use it as a SQL shell in Spark, like the Hive CLI.
>
> Yong
>
> ------------------------------
> From: mich...@databricks.com
> Date: Wed, 26 Aug 2015 17:48:44 -0700
> Subject: Re: query avro hive table in spark sql
> To: gpatc...@gmail.com
> CC: user@spark.apache.org
>
> I'd suggest looking at
> http://spark-packages.org/package/databricks/spark-avro
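For reference, a minimal sketch of what the spark-avro suggestion above looks like in practice, assuming the Spark 1.4-era DataFrameReader API and the spark-avro 2.x package run from spark-shell; the package version, file path, and temp-table name are illustrative placeholders. Note that this reads Avro files directly, bypassing the Hive metastore, which is why Hive views are not visible this way.

    // Launch with the package on the classpath, e.g.:
    //   spark-shell --packages com.databricks:spark-avro_2.10:2.0.1
    // In spark-shell, sqlContext is predefined; the path and table name
    // below are placeholders.
    val events = sqlContext.read
      .format("com.databricks.spark.avro")
      .load("hdfs:///data/events/*.avro")

    // Register the DataFrame so it can be queried with plain SQL. Hive views
    // are not available here, because spark-avro reads files, not the metastore.
    events.registerTempTable("events_avro")
    sqlContext.sql("SELECT COUNT(*) FROM events_avro").show()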
> On Wed, Aug 26, 2015 at 11:32 AM, gpatcham <gpatc...@gmail.com> wrote:
>
> Hi,
>
> I'm trying to query a Hive table which is based on Avro in Spark SQL and
> am seeing the errors below:
>
> 15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException
> determining schema. Returning signal schema to indicate problem
> org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither
> avro.schema.literal nor avro.schema.url specified, can't determine table schema
>     at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
>     at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
>     at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
>     at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
>     at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)
>
> It's not able to determine the schema. The Hive table points to the Avro
> schema using a URL. I'm stuck and couldn't find more info on this.
>
> Any pointers?
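For the "Neither avro.schema.literal nor avro.schema.url specified" warning itself, one thing worth checking — an assumption, not something confirmed in this thread — is whether the schema property is visible at the level Spark reads it from, e.g. set as a table property rather than only in the serde properties of individual partitions. A sketch under those assumptions, with an illustrative table name and schema path:

    // Sketch under assumptions: "logs_avro" and the .avsc path are
    // illustrative placeholders, not from this thread.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("set-avro-schema-url"))
    val hive = new HiveContext(sc)

    // Make the Avro schema discoverable by AvroSerDe when the table (and its
    // partitions) are deserialized; the statement is passed through to Hive.
    hive.sql("ALTER TABLE logs_avro SET TBLPROPERTIES " +
      "('avro.schema.url'='hdfs:///schemas/logs_avro.avsc')")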