Hello, I ran into this today. We're seeing Tez jobs fail to submit when the cluster is under high load. In particular, the split calculation seems to fall over when it sees a number of slots < 0. The YARN fair scheduler seems to be reporting it this way, but Tez doesn't seem to handle it.

Vertex failed, vertexName=Map 1, vertexId=vertex_1457029908268_101939_1_00, diagnostics=[Vertex vertex_1457029908268_101939_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: upsight_clean_aggregate_data initializer failed, vertex=vertex_1457029908268_101939_1_00 [Map 1], java.lang.IllegalArgumentException: Illegal Capacity: -135
    at java.util.ArrayList.<init>(ArrayList.java:142)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:330)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:129)

I did come across HIVE-12957, but its patch seems only to report the error better rather than do anything about it.

So my question: is this an expected failure case? Is there a bug in YARN scheduling that I should know about, or am I misunderstanding the issue? It seems rather fragile of Tez to give up when the cluster is under high load instead of falling back to some sane default and adding tasks to the queue.

-Gautam.
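
P.S. To make the failure mode concrete, here is a minimal Java sketch. It only shows that a negative size hint passed to the ArrayList constructor throws "Illegal Capacity", and what clamping to a sane default could look like. It assumes the size hint is derived from the reported slot count. The names SplitHintSketch, safeSplitHint and waves are hypothetical; this is not the actual Hive or Tez code.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitHintSketch {

    // Hypothetical helper (not a Hive/Tez API): turn a possibly negative
    // slot count into a size hint that is always at least 1.
    static int safeSplitHint(int availableSlots, float waves) {
        int desired = (int) (availableSlots * waves);
        return Math.max(1, desired);
    }

    public static void main(String[] args) {
        // Assumed value: the slot count as seen under load, matching the
        // -135 in the exception message above.
        int availableSlots = -135;

        try {
            // A negative initial capacity is rejected by the constructor.
            new ArrayList<String>(availableSlots);
        } catch (IllegalArgumentException e) {
            System.out.println("Unclamped: " + e.getMessage());
        }

        // With the clamp, list creation succeeds and work could be queued.
        List<String> splits = new ArrayList<>(safeSplitHint(availableSlots, 1.0f));
        System.out.println("Clamped hint accepted, list size = " + splits.size());
    }
}
```

Whether a floor of 1 is the right default is a separate question; the point is only that a negative hint could be clamped rather than passed through to the list constructor.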