Thanks for the reply. I did solve the protobuf issue by upgrading to 2.5, but then Hive 0.12 also started showing the same issue as 0.13 and 0.14.
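(For anyone else hitting the protobuf error in the stack trace quoted below: it typically comes from mixed protobuf-java versions on the classpath. A quick way to check is to list every copy Hadoop and Hive can see. This is a sketch; the default install paths are assumptions, adjust them for your environment.)

```shell
# "This is supposed to be overridden by subclasses" usually means two
# different protobuf-java versions (e.g. 2.4.x and 2.5.x) are both on the
# classpath. List every copy visible to Hadoop and Hive.
# NOTE: /usr/local/hadoop and /usr/local/hive are assumed fallback paths.
find "${HADOOP_HOME:-/usr/local/hadoop}" "${HIVE_HOME:-/usr/local/hive}" \
    -name 'protobuf-java-*.jar' 2>/dev/null | sort -u
```

If more than one version shows up, removing or replacing the stale jar (or rebuilding against a single protobuf version, as above) is the usual fix.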
I was working through the CLI. It turns out the issue was due to the space available (or rather, not available) to the data node. Let me elaborate for others on the list.

I had about 2 GB available on the partition where the data node directory was configured (the name node and data node storage were on the same directory tree but in different directories, of course). I inserted kv1.txt (a few KB) into table #1 (stored as textfile) and then tried "insert into table#2 select * from table#1", where table #2 was stored as ORC. It was difficult to guess that the converted ORC data would be too big to fit in 2 GB, especially since the data node logs did not show any error, and no reserve was configured for HDFS. I still don't know why it needs so much space, but I could reproduce the error simply by pushing a 300 MB file to HDFS with "hdfs dfs -put ", which confirmed it was a space issue. I migrated the data node to a bigger partition and everything is fine now.

On a separate note, I am not seeing any significant query-time improvement from pushing data into ORC. About 25%, yes, but nowhere close to the multiples I was hoping for. I changed the stripe size to 4 MB, tried creating an index every 10k rows, inserted 6 million rows, and ran many different types of queries. Any ideas, people, what I might be missing?

Amit

Sent from my mobile device, please excuse the typos

> On Apr 4, 2014, at 8:21 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:
>
> Amit,
>
> Are you executing your select for conversion to orc via beeline, or hive cli?
> From looking at your logs, it appears that you do not have permissions in
> hdfs to write the resultant orc data. Check permissions in hdfs to ensure
> that your user has write permissions to write to hive warehouse.
>
> I forwarded you a previous thread regarding hive 12 protobuf issues.
>
> Regards,
>
> Bryan Jeffrey
>
> On Apr 4, 2014 8:14 PM, "Amit Tewari" <amittew...@gmail.com> wrote:
> I checked out and build hive 0.13. Tried with same results. i.e.
> eRpcServer.addBlock(NameNodeRpcServer.java:555)
> at File /tmp/hive-hduser/hive_2014-04-04_20-34-43_550_7470522328893486504-1/_task_tmp.-ext-10002/_tmp.000000_3
> could only be replicated to 0 nodes instead of minReplication (=1). There
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>
> I also tried it with the release version of hive 0.12 and that gave me a
> different error. Related to protobuffer incompatibility (pasted below)
>
> So at this point I can't run even the basic use case with ORC storage..
>
> Any pointers would be very helpful.
>
> Amit
>
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:240)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.lang.UnsupportedOperationException: This is supposed to be overridden by subclasses.
>     at com.google.protobuf.GeneratedMessage.getUnknownFields(GeneratedMessage.java:180)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.getSerializedSize(OrcProto.java:3046)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry.getSerializedSize(OrcProto.java:4129)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndex.getSerializedSize(OrcProto.java:4641)
>     at com.google.protobuf.AbstractMessageLite.writeTo(AbstractMessageLite.java:75)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:548)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1328)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1699)
>     at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:1868)
>     at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:95)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:181)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:866)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:596)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:613)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
>
> Amit
>
>> On 4/4/14 2:28 PM, Amit Tewari wrote:
>> Hi All,
>>
>> I am just trying to do some simple tests to see speedup in hive query with
>> Hive 0.14 (trunk version this morning). Just tried to use sample test case
>> to start with. First wanted to see how much I can speed up using ORC format.
>>
>> However for some reason I can't insert data into the table with ORC format.
>> It fails with Exception "File <filename> could only be replicated to 0 nodes
>> instead of minReplication (=1). There are 1 datanode(s) running and no
>> node(s) are excluded in this operation"
>>
>> I can however run inserting data into text table without any issue.
>>
>> I have included the steps below.
>>
>> Any pointers would be appreciated.
>>
>> Amit
>>
>> I have a single node setup with minimal settings. JPS output is as follows:
>>
>> $ jps
>> 9823 NameNode
>> 12172 JobHistoryServer
>> 9903 DataNode
>> 14895 Jps
>> 11796 ResourceManager
>> 12034 NodeManager
>>
>> Running Hadoop 0.2.2 with Yarn.
>>
>> Step 1
>>
>> CREATE TABLE pokes (foo INT, bar STRING);
>>
>> Step 2
>>
>> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
>>
>> Step 3
>>
>> CREATE TABLE pokes_1 (foo INT, bar STRING);
>>
>> Step 4
>>
>> Insert into table pokes_1 select * from pokes;
>>
>> Step 5
>>
>> CREATE TABLE pokes_orc (foo INT, bar STRING) stored as orc;
>>
>> Step 6
>>
>> insert into pokes_orc select * from pokes;    <__FAILED__ with Exception below>
>>
>> eRpcServer.addBlock(NameNodeRpcServer.java:555)
>> at File /tmp/hive-hduser/hive_2014-04-04_20-34-43_550_7470522328893486504-1/_task_tmp.-ext-10002/_tmp.000000_3
>> could only be replicated to 0 nodes instead of minReplication (=1). There
>> are 1 datanode(s) running and no node(s) are excluded in this operation.
>>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>     at org.apache.hadoop.hdfs.server.namenode.NameNodorg.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:168)
>>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:843)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
>>     ... 8 more
>>
>> Step 7
>>
>> Insert overwrite table pokes_1 select * from pokes;    <Success>
>
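(For anyone wanting to reproduce the ORC tuning Amit describes above — 4 MB stripes and a row index every 10k rows — it can be set per table at creation time. This is only a sketch: the property names are the standard Hive ORC TBLPROPERTIES, and the table definition and values simply mirror the experiment described in this thread.)

```sql
-- Hypothetical re-creation of the ORC table with the tuning from this thread.
-- "orc.stripe.size" is in bytes (4194304 = 4 MB); "orc.row.index.stride"
-- is the number of rows between index entries.
CREATE TABLE pokes_orc (foo INT, bar STRING)
STORED AS ORC
TBLPROPERTIES (
  "orc.compress"         = "ZLIB",     -- default codec, stated explicitly
  "orc.stripe.size"      = "4194304",  -- 4 MB stripes
  "orc.row.index.stride" = "10000",    -- index entry every 10k rows
  "orc.create.index"     = "true"
);
```

Note that on a simple full-scan query ORC's row indexes barely help; the big wins usually come from predicate pushdown on selective filters, so the kind of query matters as much as the table settings.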