Hi, I'm using Hive 2.0.1 with Tez 0.9.1. In a few cases, when querying an ORC table, I get the following error:
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1586321981777_24335_1_00, diagnostics=[Vertex vertex_1586321981777_24335_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
        at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:71)
        ...
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
        at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
        at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
        at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
        at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
        at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.<init>(DAGProtos.java:19294)
        at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.<init>(DAGProtos.java:19258)
        at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto$1.parsePartialFrom(DAGProtos.java:19360)
        at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto$1.parsePartialFrom(DAGProtos.java:19355)
        at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
        at org.apache.tez.dag.api.records.DAGProtos$ConfigurationProto.parseFrom(DAGProtos.java:19552)
        at org.apache.tez.common.TezUtils.createConfFromByteString(TezUtils.java:116)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:92)

The table has 32 buckets, and before reading from it I set:

    SET tez.grouping.split-count=32;

I don't understand why the ConfigurationProto object grows large enough to exceed the Protobuf size limit (64 MB by default for CodedInputStream) - can someone shed some light on this? And is there some resolution other than modifying our Tez build to raise the limit with an explicit CodedInputStream.setSizeLimit() call, along the lines of the sketch further below?
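In case it's useful for diagnosis, here's how I've been estimating the size of the serialized configuration on the submit side - a minimal sketch using the same TezUtils helper that is the inverse of the createConfFromByteString call failing in the trace above. I'm assuming a Configuration loaded from the *-site.xml files on the classpath is a rough stand-in for what Hive actually hands to Tez; note also that the ByteString is Snappy-compressed, while the 64 MB limit applies to the decompressed bytes the parser reads, so the printed number understates what CodedInputStream sees:

    import com.google.protobuf.ByteString;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.tez.common.TezUtils;

    public class ConfSizeCheck {
        public static void main(String[] args) throws Exception {
            // Assumption: the *-site.xml files on the classpath approximate
            // the configuration Hive ships to the Tez AM.
            Configuration conf = new Configuration();

            // Same serialization path Tez uses to pack the conf into the
            // DAG plan.
            ByteString serialized = TezUtils.createByteStringFromConf(conf);
            System.out.println("Serialized (compressed) conf: "
                + serialized.size() + " bytes");
        }
    }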
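And for clarity, the Tez modification I'd rather avoid looks roughly like this - a sketch of org.apache.tez.common.TezUtils.createConfFromByteString parsing through an explicit CodedInputStream with a raised limit. I haven't verified this against the exact 0.9.1 method body, so the surrounding details (the Snappy stream, the readConfFromPB helper) are my reading of the code, not a tested patch:

    import java.io.IOException;

    import com.google.protobuf.ByteString;
    import com.google.protobuf.CodedInputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.tez.dag.api.records.DAGProtos;
    import org.xerial.snappy.SnappyInputStream;

    // Sketch of a patched body for TezUtils.createConfFromByteString.
    public static Configuration createConfFromByteString(ByteString byteString)
            throws IOException {
        try (SnappyInputStream uncompressIs =
                new SnappyInputStream(byteString.newInput())) {
            // parseFrom(InputStream) builds a CodedInputStream internally
            // with the 64 MB default limit; building it explicitly here
            // lets us raise that limit before parsing.
            CodedInputStream codedIn = CodedInputStream.newInstance(uncompressIs);
            codedIn.setSizeLimit(Integer.MAX_VALUE);
            DAGProtos.ConfigurationProto confProto =
                DAGProtos.ConfigurationProto.parseFrom(codedIn);
            Configuration conf = new Configuration(false);
            // readConfFromPB: the existing private TezUtils helper that
            // copies the proto's key/value pairs into the Configuration.
            readConfFromPB(confProto, conf);
            return conf;
        }
    }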
Thanks,
Rahul Chhiber