Hi,

I'm using Hive on Tez to query data on S3, and I've noticed the following two
cases.

*Case A*

When the query covers a smaller amount of data, a Tez job (YARN
application) is not created:

select dt from my_db_schema.my_table where dt in
('2018-03-10','2018-03-09') and header ='xxx';

The output in the above case is:

OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2018-03-10
2018-03-10
2018-03-09
2018-03-09
Time taken: 7.043 seconds, Fetched: 4 row(s)


*Case B*

When the query is scanning more data

select dt from my_db_schema.my_table where  header ='xxx';

then the output is as follows, and I can see a Tez job logged in the Tez UI
and in YARN.

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED     22         22        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 38.12 s
----------------------------------------------------------------------------------------------
OK
2018-03-05
2018-03-05
2018-03-06
2018-03-06
2018-03-07
2018-03-07
2018-03-08
2018-03-08
2018-03-09
2018-03-09
2018-03-10
2018-03-10
2018-03-25
2018-03-25
2018-03-26
2018-03-26
2018-03-28
2018-03-28
2018-05-09
2018-05-09
2018-05-10
2018-05-10
Time taken: 47.197 seconds, Fetched: 22 row(s)

The problem in case A is that Hive sometimes decides not to trigger a Tez
job, and the query then takes a long time to complete. In this case the
worker nodes are not utilised at all; only the master node executes
the query.

Is there a way to force Hive to always trigger a TEZ job?
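
One possible explanation, which I have not confirmed on this cluster: Hive
can convert simple select/filter queries into a local fetch task that runs
in the client process instead of launching a job. This is controlled by
hive.fetch.task.conversion, and the input-size cutoff by
hive.fetch.task.conversion.threshold. Assuming that is what happens in
case A, a minimal sketch of a session-level override would be:

```sql
-- Assumption: case A is served by a local fetch task rather than a Tez DAG.
-- Disabling fetch-task conversion for this session should make Hive
-- compile the query into a Tez job instead.
set hive.fetch.task.conversion=none;

select dt from my_db_schema.my_table where dt in
('2018-03-10','2018-03-09') and header ='xxx';
```

If turning the optimization off entirely is too blunt, lowering
hive.fetch.task.conversion.threshold (a size in bytes) might keep the
fetch-task path for very small inputs only.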
