Thanks for the reply. I did solve the protobuf issue by upgrading to 2.5, but then Hive 0.12 also started showing the same issue as 0.13 and 0.14.
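(For anyone else hitting the protobuf error in the stack trace quoted below: it typically comes from mixed protobuf-java versions on the classpath. A quick way to check is to list every copy Hadoop and Hive can see. This is a sketch; the default install paths are assumptions, adjust them for your environment.)

```shell
# "This is supposed to be overridden by subclasses" usually means two
# different protobuf-java versions (e.g. 2.4.x and 2.5.x) are both on the
# classpath. List every copy visible to Hadoop and Hive.
# NOTE: /usr/local/hadoop and /usr/local/hive are assumed fallback paths.
find "${HADOOP_HOME:-/usr/local/hadoop}" "${HIVE_HOME:-/usr/local/hive}" \
    -name 'protobuf-java-*.jar' 2>/dev/null | sort -u
```

If more than one version shows up, removing or replacing the stale jar (or rebuilding against a single protobuf version, as above) is the usual fix.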
I was working through the CLI. It turns out the issue was due to the space available (or rather, not available) to the data node. Let me elaborate for others on the list.

I had about 2 GB available on the partition where the data node directory was configured (the name node and data node storage were on the same directory tree but in different directories, of course). I inserted kv1.txt (a few KB) into table #1 (stored as textfile) and then tried "insert into table#2 select * from table#1", where table #2 was stored as ORC. It was difficult to guess that the converted ORC data would be too big to fit in 2 GB, especially since the data node logs did not show any error, and no reserve was configured for HDFS. I still don't know why it needs so much space, but I could reproduce the error simply by pushing a 300 MB file to HDFS with "hdfs dfs -put ", which confirmed it was a space issue. I migrated the data node to a bigger partition and everything is fine now.

On a separate note, I am not seeing any significant query-time improvement from pushing data into ORC. About 25%, yes, but nowhere close to the multiples I was hoping for. I changed the stripe size to 4 MB, tried creating an index every 10k rows, inserted 6 million rows, and ran many different types of queries. Any ideas, people, what I might be missing?

Amit

Sent from my mobile device, please excuse the typos

> On Apr 4, 2014, at 8:21 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:
>
> Amit,
>
> Are you executing your select for conversion to orc via beeline, or hive cli?
> From looking at your logs, it appears that you do not have permissions in
> hdfs to write the resultant orc data. Check permissions in hdfs to ensure
> that your user has write permissions to write to hive warehouse.
>
> I forwarded you a previous thread regarding hive 12 protobuf issues.
>
> Regards,
>
> Bryan Jeffrey
>
> On Apr 4, 2014 8:14 PM, "Amit Tewari" <amittew...@gmail.com> wrote:
> I checked out and build hive 0.13. Tried with same results. i.e.
> eRpcServer.addBlock(NameNodeRpcServer.java:555)
> at File /tmp/hive-hduser/hive_2014-04-04_20-34-43_550_7470522328893486504-1/_task_tmp.-ext-10002/_tmp.000000_3
> could only be replicated to 0 nodes instead of minReplication (=1). There
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>
> I also tried it with the release version of hive 0.12 and that gave me a
> different error. Related to protobuffer incompatibility (pasted below)
>
> So at this point I can't run even the basic use case with ORC storage..
>
> Any pointers would be very helpful.
>
> Amit
>
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:240)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
>     at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.getSerializedSize(OrcProto.java:3046)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry.getSerializedSize(OrcProto.java:4129)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndex.getSerializedSize(OrcProto.java:4641)
>     at com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:75)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:548)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1328)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1699)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:1868)
>     at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:95)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:181)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:866)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:596)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
>
> Amit
>
>> On 4/4/14 2:28 PM, Amit Tewari wrote:
>> Hi All,
>>
>> I am just trying to do some simple tests to see speedup in hive query with
>> Hive 0.14 (trunk version this morning). Just tried to use sample test case
>> to start with. First wanted to see how much I can speed up using ORC format.
>>
>> However for some reason I can't insert data into the table with ORC format.
>> It fails with Exception "File <filename> could only be replicated to 0 nodes
>> instead of minReplication (=1). There are 1 datanode(s) running and no
>> node(s) are excluded in this operation"
>>
>> I can however run inserting data into text table without any issue.
>>
>> I have included the steps below.
>>
>> Any pointers would be appreciated.
>>
>> Amit
>>
>> I have a single node setup with minimal settings. JPS output is as follows:
>>
>> $ jps
>> 9823 NameNode
>> 12172 JobHistoryServer
>> 9903 DataNode
>> 14895 Jps
>> 11796 ResourceManager
>> 12034 NodeManager
>>
>> Running Hadoop 0.2.2 with Yarn.
>>
>> Step 1
>>
>> CREATE TABLE pokes (foo INT, bar STRING);
>>
>> Step 2
>>
>> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
>>
>> Step 3
>>
>> CREATE TABLE pokes_1 (foo INT, bar STRING);
>>
>> Step 4
>>
>> Insert into table pokes_1 select * from pokes;
>>
>> Step 5
>>
>> CREATE TABLE pokes_orc (foo INT, bar STRING) stored as orc;
>>
>> Step 6
>>
>> insert into pokes_orc select * from pokes;    <__FAILED__ with Exception below>
>>
>> eRpcServer.addBlock(NameNodeRpcServer.java:555)
>> at File /tmp/hive-hduser/hive_2014-04-04_20-34-43_550_7470522328893486504-1/_task_tmp.-ext-10002/_tmp.000000_3
>> could only be replicated to 0 nodes instead of minReplication (=1). There
>> are 1 datanode(s) running and no node(s) are excluded in this operation.
>>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>     at org.apache.hadoop.hdfs.server.namenode.NameNodorg.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:168)
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:843)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
>>     ... 8 more
>>
>> Step 7
>>
>> Insert overwrite table pokes_1 select * from pokes;    <Success>
>
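(For anyone wanting to reproduce the ORC tuning Amit describes above — 4 MB stripes and a row index every 10k rows — it can be set per table at creation time. This is only a sketch: the property names are the standard Hive ORC TBLPROPERTIES, and the table definition and values simply mirror the experiment described in this thread.)

```sql
-- Hypothetical re-creation of the ORC table with the tuning from this thread.
-- "orc.stripe.size" is in bytes (4194304 = 4 MB); "orc.row.index.stride"
-- is the number of rows between index entries.
CREATE TABLE pokes_orc (foo INT, bar STRING)
STORED AS ORC
TBLPROPERTIES (
  "orc.compress"         = "ZLIB",     -- default codec, stated explicitly
  "orc.stripe.size"      = "4194304",  -- 4 MB stripes
  "orc.row.index.stride" = "10000",    -- index entry every 10k rows
  "orc.create.index"     = "true"
);
```

Note that on a simple full-scan query ORC's row indexes barely help; the big wins usually come from predicate pushdown on selective filters, so the kind of query matters as much as the table settings.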