This is an interesting one, as it appears that a Hive transactional table
cannot be queried from Spark until a major compaction has run on it. The
environment:

   1. Hive version 2
   2. Hive on Spark engine 1.3.1
   3. Spark 1.5.2
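
Note in passing that transactional tables presuppose ACID support being
enabled in hive-site.xml. A typical minimal set of properties (the exact
values vary by distribution, so treat these as illustrative) would be:

hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on=true
hive.compactor.worker.threads=1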


hive> create table default.foo(id int) clustered by (id) into 2 buckets
STORED AS ORC TBLPROPERTIES ('transactional'='true');
hive> insert into default.foo values(10);

hive> select * from foo;
OK
10
Time taken: 0.067 seconds, Fetched: 1 row(s)

At this stage, if you do a simple select on foo from Spark, you will get an
error which looks like a bug:


spark-sql> select * from foo;
16/03/12 17:08:21 ERROR SparkSQLDriver: Failed in [select * from foo]
java.lang.RuntimeException: serious problem


No locks are held in Hive on that table. Let us go back and do a compaction
in Hive:

hive> alter table foo compact 'major';
Compaction enqueued.
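
As an aside, you can check the lock and compaction state from the Hive CLI.
Both statements are part of Hive's ACID support, though the output columns
vary by version:

hive> show locks;
hive> show compactions;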


These messages appear in the Hive log. The compaction job is a MapReduce job:

2016-03-12T17:12:29,776 INFO  [rhes564-31]: mapreduce.Job
(Job.java:monitorAndPrintJob(1345)) - Running job: job_1457790020440_0006
2016-03-12T17:12:31,915 INFO
[org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0]:
txn.AcidHouseKeeperService (AcidHouseKeeperService.java:run(67)) - timeout
reaper ran for 0seconds.  isAliveCounter=-2147483542
2016-03-12T17:13:51,918 INFO
[org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0]:
txn.AcidCompactionHistoryService
(AcidCompactionHistoryService.java:run(76)) - History reaper reaper ran for
0seconds.  isAliveCounter=-2147483488

The Initiator then goes through every single table to check whether it
should be compacted, including temp tables:

2016-03-12T17:15:52,440 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(89)) - Checking to see if we should compact default.foo
2016-03-12T17:15:52,449 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(89)) - Checking to see if we should compact
oraclehadoop.sales3
2016-03-12T17:15:52,468 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(89)) - Checking to see if we should compact
oraclehadoop.smallsales
2016-03-12T17:15:52,480 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(89)) - Checking to see if we should compact test.stg_t2
2016-03-12T17:15:52,491 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(89)) - Checking to see if we should compact
default.values__tmp__table__3
2016-03-12T17:15:52,492 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(94)) - Can't find table default.values__tmp__table__3,
assuming it's a temp table or has been dropped and moving on.
2016-03-12T17:15:52,492 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(89)) - Checking to see if we should compact
default.values__tmp__table__4
2016-03-12T17:15:52,492 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(94)) - Can't find table default.values__tmp__table__4,
assuming it's a temp table or has been dropped and moving on.
2016-03-12T17:15:52,493 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(89)) - Checking to see if we should compact
default.values__tmp__table__1
2016-03-12T17:15:52,493 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(94)) - Can't find table default.values__tmp__table__1,
assuming it's a temp table or has been dropped and moving on.
2016-03-12T17:15:52,493 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(89)) - Checking to see if we should compact test.t2
2016-03-12T17:15:52,504 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(89)) - Checking to see if we should compact
default.values__tmp__table__2
2016-03-12T17:15:52,505 INFO  [Thread-9]: compactor.Initiator
(Initiator.java:run(94)) - Can't find table default.values__tmp__table__2,
assuming it's a temp table or has been dropped and moving on.
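
How often the Initiator wakes up is governed by a hive-site.xml property
(shown with its usual default; check your distribution):

hive.compactor.check.interval=300s

so a freshly enqueued request can sit in the queue for a few minutes before
the background job actually starts.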

OK, once the compaction (which Hive runs in the background) is complete, one
can query the table from Spark:

spark-sql> select * from foo;
10
Time taken: 4.509 seconds, Fetched 1 row(s)

I notice that if you insert a new row into foo (from Hive), you get the same
error in Spark again:

scala> HiveContext.sql("select * from foo").collect.foreach(println)
java.lang.RuntimeException: serious problem
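
For what it is worth, HiveContext in the line above is just a val. A minimal
spark-shell session to reproduce this (assuming sc is the shell's
SparkContext) would be:

scala> import org.apache.spark.sql.hive.HiveContext
scala> val hiveContext = new HiveContext(sc)
scala> hiveContext.sql("select * from default.foo").collect().foreach(println)

which fails with the same RuntimeException while uncompacted delta files are
present.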

This looks like a bug, as it seems Spark can only read the table after a
compaction has completed, whether triggered interactively or run by Hive
itself!
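
Until this is fixed, a crude workaround is to trigger a major compaction
after each batch of inserts and wait for it to finish before reading from
Spark. A rough sketch over Hive JDBC follows; the host/port are placeholders
and the column positions of SHOW COMPACTIONS vary by Hive version, so treat
it as a sketch rather than a drop-in:

import java.sql.DriverManager

// Hypothetical HiveServer2 endpoint -- adjust host, port and credentials.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val stmt = conn.createStatement()

// Trigger a major compaction, then poll until it is no longer pending.
stmt.execute("alter table foo compact 'major'")
var pending = true
while (pending) {
  Thread.sleep(10000)
  val rs = stmt.executeQuery("show compactions")
  pending = false
  while (rs.next()) {
    // Assumption: table name is in column 2 and state in column 5;
    // verify against your Hive version's SHOW COMPACTIONS layout.
    if (rs.getString(2) == "foo" &&
        (rs.getString(5) == "initiated" || rs.getString(5) == "working")) {
      pending = true
    }
  }
  rs.close()
}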

HTH




Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 12 March 2016 at 08:24, @Sanjiv Singh <sanjiv.is...@gmail.com> wrote:

> Hi All,
>
> I am facing this issue on an HDP setup, on which COMPACTION is required
> (only once) for transactional tables before records can be fetched with
> Spark SQL. On the other hand, an Apache setup doesn't require compaction
> even once.
>
> Maybe something gets triggered on the metastore after compaction, after
> which Spark SQL starts recognizing the delta files.
>
> Let me know if you need other details to get to the root cause.
>
> Try this,
>
> *See complete scenario :*
>
> hive> create table default.foo(id int) clustered by (id) into 2 buckets
> STORED AS ORC TBLPROPERTIES ('transactional'='true');
> hive> insert into default.foo values(10);
>
> scala> sqlContext.table("default.foo").count // Gives 0, which is wrong
> because data is still in delta files
>
> Now run major compaction:
>
> hive> ALTER TABLE default.foo COMPACT 'MAJOR';
>
> scala> sqlContext.table("default.foo").count // Gives 1
>
> hive> insert into foo values(20);
>
> scala> sqlContext.table("default.foo").count // Gives 2, no compaction
> required.
>
>
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
