Hive 0.11.0 | Issue with ORC Tables

Savant, Keshav Thu, 19 Sep 2013 05:05:45 -0700

Hi All,

We have set up Apache Hive 0.11.0 on a Hadoop cluster (Apache version 
0.20.203.0). Hive returns the expected results when tables are stored as 
TextFile.
However, with ORC (Optimized Row Columnar), the new storage format in Hive 
0.11.0, a select query on a table stored as ORC throws an exception.
Stack trace of the exception:


2013-09-19 20:33:38,095 ERROR CliDriver (SessionState.java:printError(386)) - Failed with exception java.io.IOException:com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either than the input has been truncated or that an embedded message misreported its own length.
java.io.IOException: com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either than the input has been truncated or that an embedded message misreported its own length.
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either than the input has been truncated or that an embedded message misreported its own length.
        at com.google.protobuf.InvalidProtocolBufferException.truncatedMessage(InvalidProtocolBufferException.java:49)
        at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:754)
        at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:294)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:484)
        at com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:438)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$Builder.mergeFrom(OrcProto.java:10129)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$Builder.mergeFrom(OrcProto.java:9993)
        at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:300)
        at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript.parseFrom(OrcProto.java:9970)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:193)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:56)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:168)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:432)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)


We performed the following steps, which lead to the above exception:

* SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

* CREATE TABLE person(id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS ORC tblproperties ("orc.compress"="Snappy");

* LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE person;

* SELECT * FROM person;

Result:

Failed with exception 
java.io.IOException:com.google.protobuf.InvalidProtocolBufferException: While 
parsing a protocol message, the input ended unexpectedly in the middle of a 
field.  This could mean either than the input has been truncated or that an 
embedded message misreported its own length.
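
The trace fails while parsing the ORC PostScript, so one quick check that may help narrow this down is whether the file sitting under the table's directory is really in ORC format. ORC files begin with the three-byte magic string "ORC", and a plain-text file such as test.txt would not. The following is only a minimal Python sketch of that check: it assumes the file has first been copied to the local filesystem, and the function name looks_like_orc and the sample data are hypothetical, not part of Hive or Hadoop.

```python
import os
import tempfile

ORC_MAGIC = b"ORC"  # ORC files start with these three bytes


def looks_like_orc(path):
    """Return True if the local file at `path` starts with the ORC magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(ORC_MAGIC)) == ORC_MAGIC


if __name__ == "__main__":
    # Hypothetical demo data standing in for a space-delimited text file.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
        tmp.write(b"1 alice\n2 bob\n")
    print(looks_like_orc(tmp.name))  # a plain-text file does not pass the check
    os.remove(tmp.name)
```

If the check fails for the file under the table's location, the data was most likely stored as-is rather than written in ORC format, which would be consistent with a PostScript parsing error. This is a diagnostic aid only and does not validate the rest of the file.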



We also added the codec property to core-site.xml on our Hadoop cluster, 
alongside the other configuration settings:
<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
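
Because io.compression.codecs takes a comma-separated list of codec classes, it can be worth confirming exactly which codecs a given configuration file declares. The snippet below is a small Python sketch of that, assuming the configuration XML has been read into a string. The helper names get_hadoop_property and configured_codecs are hypothetical, and the code uses only the Python standard library, not any Hadoop API.

```python
import xml.etree.ElementTree as ET


def get_hadoop_property(xml_text, name):
    """Return the value of the named <property> in Hadoop-style config XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return (prop.findtext("value") or "").strip()
    return None


def configured_codecs(xml_text):
    """Split io.compression.codecs into a list of codec class names."""
    value = get_hadoop_property(xml_text, "io.compression.codecs")
    if value is None:
        return []
    return [c.strip() for c in value.split(",") if c.strip()]


# Sample fragment wrapped in a <configuration> root, as in core-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(configured_codecs(SAMPLE))
```

With the property shown above, the list contains only SnappyCodec, so any other codec the cluster relies on would need to be listed in the same value.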



The following jars were added or moved:

1. Placed a new jar at $HIVE_HOME/lib/config-1.0.0.jar

2. Placed a new jar for the metastore connection at $HIVE_HOME/lib/mysql-connector-java-5.1.17-bin.jar

3. Moved jackson-core-asl-1.8.8.jar from $HIVE_HOME/lib to $HADOOP_HOME/lib

4. Moved jackson-mapper-asl-1.8.8.jar from $HIVE_HOME/lib to $HADOOP_HOME/lib



Please suggest a possible cause of this issue with ORC-format tables and a 
way to resolve it.



Thanks,

Keshav
