Another advantage of the method Owen described is that the creation of the
ORC file is itself distributed, rather than pre-creating the ORC file
off-cluster and then moving it in. This way you just push your text files
into the cluster, run the INSERT ... SELECT, and the data lands in the ORC
table. It is also a great opportunity to add partitioning or sorting, or to
enrich the data, at load time; see the sketch below.
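
As a rough sketch (the table, column, and path names below are
illustrative, not taken from Owen's mail), the whole flow looks something
like this:

  -- Staging table over the raw text files pushed into the cluster
  CREATE TABLE logs_text (id INT, name STRING, dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

  LOAD DATA INPATH '/incoming/logs' INTO TABLE logs_text;

  -- Final table stored as ORC, partitioned at load time
  CREATE TABLE logs_orc (id INT, name STRING)
    PARTITIONED BY (dt STRING)
    STORED AS ORC tblproperties ("orc.compress"="SNAPPY");

  -- The INSERT ... SELECT runs as a MapReduce job, so the ORC files
  -- are written in parallel across the cluster
  SET hive.exec.dynamic.partition=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;
  INSERT OVERWRITE TABLE logs_orc PARTITION (dt)
  SELECT id, name, dt FROM logs_text;

Sorting or enrichment (a SORT BY, a join, a UDF) just goes into that final
SELECT.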




On Fri, Sep 20, 2013 at 6:19 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> Keshav,
>
> Owen has provided the solution already. That's the easiest of the lot,
> and it comes from the master who wrote ORC himself :)
>
> To put it in simple words, what he has suggested is:
>
> create a staging table based on the default text data format, then
> load the data from the staging table into an ORC-backed table.
>
> You can refer to Owen's mail for the respective queries.
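>
> In HiveQL that presumably looks something like this (person_staging is a
> hypothetical name; person is the table from Keshav's mail below):
>
> CREATE TABLE person_staging (id INT, name STRING)
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
>   STORED AS TEXTFILE;
>
> LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE person_staging;
>
> INSERT OVERWRITE TABLE person SELECT * FROM person_staging;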
>
>
>
> On Fri, Sep 20, 2013 at 4:46 PM, Savant, Keshav <
> keshav.c.sav...@fisglobal.com> wrote:
>
>> Hi Nitin,
>>
>> Thanks for your reply. We were under the impression that the codec would
>> be responsible for the ORC format conversion as well.
>>
>> However, as per your reply, it seems that a conversion from plain CSV to
>> ORC is required before the data is loaded into Hive.
>>
>> We got some leads from the following URLs:
>>
>> https://cwiki.apache.org/Hive/languagemanual-orc.html
>>
>> http://www.math.uic.edu/t3m/SnapPy/installing.html
>>
>> Please suggest how it can be done using already available libraries, or
>> whether we need to write our own converter.
>>
>> Kind Regards,
>>
>> Keshav
>>
>> From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
>> Sent: Thursday, September 19, 2013 5:56 PM
>> To: user@hive.apache.org
>> Subject: Re: Hive 0.11.0 | Issue with ORC Tables
>>
>> How did you create "test.txt" as an ORC file?
>>
>> On Thu, Sep 19, 2013 at 5:34 PM, Savant, Keshav <
>> keshav.c.sav...@fisglobal.com> wrote:
>>
>> Hi All,
>>
>> We have set up Apache Hive 0.11.0 on a Hadoop cluster (Apache Hadoop
>> version 0.20.203.0). Hive shows the expected results when tables are
>> stored as TextFile.
>> However, when we run select queries on tables stored with Hive 0.11.0's
>> new ORC (Optimized Row Columnar) format, an exception is thrown.
>>
>> Stack trace of the exception:
>>
>> 2013-09-19 20:33:38,095 ERROR CliDriver (SessionState.java:printError(386)) - Failed with exception java.io.IOException:com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either than the input has been truncated or that an embedded message misreported its own length.
>> java.io.IOException: com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either than the input has been truncated or that an embedded message misreported its own length.
>>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
>>         at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
>>         at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
>>         at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
>>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
>>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
>>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> Caused by: com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either than the input has been truncated or that an embedded message misreported its own length.
>>         at com.google.protobuf.InvalidProtocolBufferException.truncatedMessage(InvalidProtocolBufferException.java:49)
>>         at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:754)
>>         at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:294)
>>         at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:484)
>>         at com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:438)
>>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$Builder.mergeFrom(OrcProto.java:10129)
>>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript$Builder.mergeFrom(OrcProto.java:9993)
>>         at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:300)
>>         at org.apache.hadoop.hive.ql.io.orc.OrcProto$PostScript.parseFrom(OrcProto.java:9970)
>>         at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:193)
>>         at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:56)
>>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:168)
>>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:432)
>>         at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)
>>
>> We did the following steps, which led to the above exception:
>>
>> 1. SET mapred.output.compression.codec=
>>    org.apache.hadoop.io.compress.SnappyCodec;
>>
>> 2. CREATE TABLE person(id INT, name STRING) ROW FORMAT DELIMITED
>>    FIELDS TERMINATED BY ' ' STORED AS ORC tblproperties
>>    ("orc.compress"="Snappy");
>>
>> 3. LOAD DATA LOCAL INPATH 'test.txt' INTO TABLE person;
>>
>> 4. Executing: SELECT * FROM person;
>>
>> Results:
>>
>> Failed with exception
>> java.io.IOException:com.google.protobuf.InvalidProtocolBufferException:
>> While parsing a protocol message, the input ended unexpectedly in the
>> middle of a field.  This could mean either than the input has been
>> truncated or that an embedded message misreported its own length.
>>
>> Also, we included the codec property in core-site.xml on our Hadoop
>> cluster, along with the other configuration settings:
>>
>> <property>
>>     <name>io.compression.codecs</name>
>>     <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>> </property>
>>
>> Following are the new jars and their placements:
>>
>> 1. Placed a new jar at $HIVE_HOME/lib/config-1.0.0.jar
>>
>> 2. Placed a new jar for the metastore connection at
>>    $HIVE_HOME/lib/mysql-connector-java-5.1.17-bin.jar
>>
>> 3. Moved jackson-core-asl-1.8.8.jar from $HIVE_HOME/lib to
>>    $HADOOP_HOME/lib
>>
>> 4. Moved jackson-mapper-asl-1.8.8.jar from $HIVE_HOME/lib to
>>    $HADOOP_HOME/lib
>>
>> Please suggest the possible cause and a solution to the issue we are
>> facing with ORC format tables.
>>
>> Thanks,
>>
>> Keshav
>>
>> --
>> Nitin Pawar
>>
>
>
>
> --
> Nitin Pawar
>
