Tez job submissions failing when cluster is under-provisioned

Gautam Thu, 10 Mar 2016 17:34:22 -0800

Hello,

Ran into this today. We're seeing Tez jobs fail to submit when the cluster
is under high load. In particular, the split calculation seems to fall over
when it sees a slot count < 0. The YARN fair scheduler seems to be reporting
the count this way, and Tez doesn't seem to handle it.


Vertex failed, vertexName=Map 1, vertexId=vertex_1457029908268_101939_1_00,
diagnostics=[Vertex vertex_1457029908268_101939_1_00 [Map 1] killed/failed
due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: upsight_clean_aggregate_data
initializer failed, vertex=vertex_1457029908268_101939_1_00 [Map 1],
java.lang.IllegalArgumentException: Illegal Capacity: -135
        at java.util.ArrayList.<init>(ArrayList.java:142)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:330)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)
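
The exception at the top of the trace can be reproduced in isolation. This is
a minimal sketch, not the Hadoop code: it assumes the negative slot count ends
up as the initial capacity handed to ArrayList, which is what the
ArrayList.<init> frame suggests. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the split calculation; not the real FileInputFormat.
class NegativeCapacityRepro {

    // Mimics allocating a list sized by the requested number of splits.
    static List<String> allocateSplits(int numSplits) {
        // ArrayList(int) rejects a negative capacity with IllegalArgumentException.
        return new ArrayList<>(numSplits);
    }

    // Returns the exception message, or null if the allocation succeeded.
    static String tryAllocate(int numSplits) {
        try {
            allocateSplits(numSplits);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // -135 is the value reported in the diagnostics above.
        System.out.println(tryAllocate(-135));
    }
}
```

Any non-negative value, including 0, allocates without error, so only the
sign of the reported count matters here.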




I did come across HIVE-12957, but the patch there seems to only report the
error better rather than do anything about it.

Now my question: is this an expected failure case? Is there a bug I should
know about in YARN scheduling, or am I misunderstanding the issue? It seems
rather frivolous on Tez's part to give up when the cluster is under high load
instead of just falling back to some sane default and adding tasks to the
queue.
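
A minimal sketch of that kind of fallback, under stated assumptions:
reportedSlots stands for whatever slot count the scheduler hands back, and
SplitCountGuard, DEFAULT_SPLITS and clampSlots are made-up names for
illustration, not Tez, Hive or YARN APIs.

```java
// Hypothetical guard; the names and the default value are illustrative only.
class SplitCountGuard {

    // Assumed fallback when the scheduler reports no usable headroom.
    static final int DEFAULT_SPLITS = 1;

    // Returns the reported slot count when it is positive, else the default.
    static int clampSlots(int reportedSlots) {
        return Math.max(reportedSlots, DEFAULT_SPLITS);
    }

    public static void main(String[] args) {
        // A negative report falls back to the default instead of failing.
        System.out.println(clampSlots(-135));
        // A positive report is passed through unchanged.
        System.out.println(clampSlots(40));
    }
}
```

With a guard like this the job would still be submitted with a small number
of splits and simply wait in the queue, rather than failing at split
generation.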


-Gautam.
