Prasanth,

Thank you for the help. It would not have occurred to me to look at partition sort and order issues from that stack trace. I may just apply the patch to my copy of 0.13.

Regards,
Bryan Jeffrey

On Apr 22, 2014 2:41 PM, "Prasanth Jayachandran" <[email protected]> wrote:
> Bryan,
>
> This issue is related to https://issues.apache.org/jira/browse/HIVE-6883
>
> The workaround for this issue is to disable the
> hive.optimize.sort.dynamic.partition optimization by setting it to false.
>
> We found this issue very late (towards the end of the 0.13 release), so it
> wasn't included in Hive 0.13. It will go into the next patch release or the
> next release. I will also request a backport to the Hive 0.13 source.
>
> Thanks
> Prasanth Jayachandran
>
> On Apr 22, 2014, at 10:36 AM, Bryan Jeffrey <[email protected]> wrote:
>
> Prasanth,
>
> Was this additional information sufficient? This is a large roadblock to
> our adopting Hive 0.13.0.
>
> Regards,
>
> Bryan Jeffrey
>
>
> On Tue, Apr 22, 2014 at 7:41 AM, Bryan Jeffrey <[email protected]> wrote:
>
>> Prasanth,
>>
>> The error seems to occur with just about any table. I mocked up a very
>> simple table to illustrate the problem (including input data, etc.) to make
>> this easy to repeat.
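Bryan's sample input (the five pipe-delimited rows he prints with `cat test.input` in the commands that follow) can be recreated with a short script. This is a sketch under two assumptions not stated in the thread: that the file was uploaded to the HDFS root (inferred from `load data inpath '/test.input'`), and that an `hdfs` client is on the PATH. The upload step is skipped when no client is found.

```shell
#!/usr/bin/env bash
set -euo pipefail

# The five pipe-delimited rows (columns A|B) shown by "cat test.input" in the thread.
printf '%s\n' '123|436' '423|426' '223|456' '923|486' '023|406' > test.input

# Assumption: LOAD DATA INPATH '/test.input' implies the file sits at the HDFS root.
# Upload only when a Hadoop client is available; -f overwrites a previous copy.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -put -f test.input /test.input
else
  echo "hdfs client not found; created ./test.input locally only" >&2
fi
```

`LOAD DATA INPATH` without `LOCAL` moves the HDFS file into the table's directory, so the upload has to be repeated before each reload.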
>>
>> hive> create table loading_data_0 (A smallint, B smallint) partitioned by
>> (range int) row format delimited fields terminated by '|' stored as
>> textfile;
>> hive> create table data (A smallint, B smallint) partitioned by (range
>> int) clustered by (A) sorted by (A, B) into 8 buckets stored as orc
>> tblproperties ("orc.compress" = "SNAPPY", "orc.index" = "true");
>> [root@server ~]# cat test.input
>> 123|436
>> 423|426
>> 223|456
>> 923|486
>> 023|406
>> hive> load data inpath '/test.input' into table loading_data_0 partition
>> (range=123);
>>
>> [root@server scripts]# hive -e "describe data;"
>> Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
>> OK
>> Time taken: 0.508 seconds
>> OK
>> a                       smallint
>> b                       smallint
>> range                   int
>>
>> # Partition Information
>> # col_name              data_type               comment
>>
>> range                   int
>> Time taken: 0.422 seconds, Fetched: 8 row(s)
>> [root@server scripts]# hive -e "describe loading_data_0;"
>> Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
>> OK
>> Time taken: 0.511 seconds
>> OK
>> a                       smallint
>> b                       smallint
>> range                   int
>>
>> # Partition Information
>> # col_name              data_type               comment
>>
>> range                   int
>> Time taken: 0.37 seconds, Fetched: 8 row(s)
>>
>>
>> [root@server scripts]# hive -e "set hive.exec.dynamic.partition.mode=nonstrict;
>> set hive.enforce.sorting = true; set mapred.job.queue.name=orc_queue;
>> explain insert into table data partition (range) select * from loading_data_0;"
>> Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
>> OK
>> Time taken: 0.564 seconds
>> OK
>> STAGE DEPENDENCIES:
>>   Stage-1 is a root stage
>>   Stage-0 depends on stages: Stage-1
>>
>> STAGE PLANS:
>>   Stage: Stage-1
>>     Map Reduce
>>       Map Operator Tree:
>>           TableScan
>>             alias: loading_data_0
>>             Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>>             Select Operator
>>               expressions: a (type: smallint), b (type: smallint), range (type: int)
>>               outputColumnNames: _col0, _col1, _col2
>>               Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>>               Reduce Output Operator
>>                 key expressions: _col2 (type: int), -1 (type: int), _col0 (type: smallint), _col1 (type: smallint)
>>                 sort order: ++++
>>                 Map-reduce partition columns: _col2 (type: int)
>>                 Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>>                 value expressions: _col0 (type: smallint), _col1 (type: smallint), _col2 (type: int)
>>       Reduce Operator Tree:
>>         Extract
>>           Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>>           File Output Operator
>>             compressed: false
>>             Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>>             table:
>>                 input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>>                 output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>>                 serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>>                 name: data
>>
>>   Stage: Stage-0
>>     Move Operator
>>       tables:
>>           partition:
>>             range
>>           replace: false
>>           table:
>>               input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>>               output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>>               name: data
>>
>> Time taken: 0.913 seconds, Fetched: 45 row(s)
>>
>>
>>
>> [root@server]# hive -e "set hive.exec.dynamic.partition.mode=nonstrict;
>> set hive.enforce.sorting = true; set mapred.job.queue.name=orc_queue;
>> insert into table data partition (range) select * from loading_data_0;"
>> Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
>> OK
>> Time taken: 0.513 seconds
>> Total jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks not specified. Estimated from input data size: 1
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapreduce.job.reduces=<number>
>> Starting Job = job_1398130933303_1467, Tracking URL = http://server:8088/proxy/application_1398130933303_1467/
>> Kill Command = /opt/hadoop/latest-hadoop/bin/hadoop job -kill job_1398130933303_1467
>> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
>> 2014-04-22 11:33:26,984 Stage-1 map = 0%, reduce = 0%
>> 2014-04-22 11:33:51,833 Stage-1 map = 100%, reduce = 100%
>> Ended Job = job_1398130933303_1467 with errors
>> Error during job, obtaining debugging information...
>> Examining task ID: task_1398130933303_1467_m_000000 (and more) from job job_1398130933303_1467
>>
>> Task with the most failures(4):
>> -----
>> Task ID:
>>   task_1398130933303_1467_m_000000
>>
>> URL:
>>   http://server:8088/taskdetails.jsp?jobid=job_1398130933303_1467&tipid=task_1398130933303_1467_m_000000
>> -----
>> Diagnostic Messages for this Task:
>> Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"a":123,"b":436,"range":123}
>>         at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"a":123,"b":436,"range":123}
>>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
>>         at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>>         ... 8 more
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327)
>>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
>>         ... 9 more
>> Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>         at java.util.ArrayList.get(ArrayList.java:322)
>>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:121)
>>         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:109)
>>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:283)
>>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:268)
>>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initEvaluatorsAndReturnStruct(ReduceSinkOperator.java:251)
>>         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:264)
>>         ... 15 more
>>
>> Container killed by the ApplicationMaster.
>> Container killed on request. Exit code is 143
>> Container exited with a non-zero exit code 143
>>
>>
>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>> MapReduce Jobs Launched:
>> Job 0: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
>> Total MapReduce CPU Time Spent: 0 msec
>>
>> Does that help? I took a quick look at ReduceSinkOperator, but was
>> unable to put my finger on the issue.
>>
>> Regards,
>>
>> Bryan Jeffrey
>>
>>
>>
>> On Mon, Apr 21, 2014 at 10:55 PM, Prasanth Jayachandran <[email protected]> wrote:
>>
>>> Hi Bryan
>>>
>>> Can you provide more information about the input and output tables?
>>> Schema? Partitioning and bucketing information? Explain plan of your insert
>>> query?
>>>
>>> This information will help diagnose the issue.
>>>
>>> Thanks
>>> Prasanth
>>>
>>> > On Apr 21, 2014, at 7:00 PM, Bryan Jeffrey <[email protected]> wrote:
>>> >
>>> > Hello.
>>> >
>>> > I am running Hadoop 2.4.0 and Hive 0.13.0. I am encountering the
>>> > following error when converting a text table to ORC via the following
>>> > command:
>>> >
>>> > Error:
>>> >
>>> > Diagnostic Messages for this Task:
>>> > Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row { - Removed -}
>>> >         at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
>>> >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>>> >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>> >         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>>> >         at java.security.AccessController.doPrivileged(Native Method)
>>> >         at javax.security.auth.Subject.doAs(Subject.java:396)
>>> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>> >         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>>> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row { - Removed -}
>>> >         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
>>> >         at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>>> >         ... 8 more
>>> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>>> >         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:327)
>>> >         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>> >         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>>> >         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>> >         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>>> >         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>>> >         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
>>> >         ... 9 more
>>> > Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>>> >         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>> >         at java.util.ArrayList.get(ArrayList.java:322)
>>> >         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.init(StandardStructObjectInspector.java:121)
>>> >         at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.<init>(StandardStructObjectInspector.java:109)
>>> >         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:283)
>>> >         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:268)
>>> >         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initEvaluatorsAndReturnStruct(ReduceSinkOperator.java:251)
>>> >         at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:264)
>>> >         ... 15 more
>>> >
>>> > Container killed by the ApplicationMaster.
>>> > Container killed on request. Exit code is 143
>>> > Container exited with a non-zero exit code 143
>>> >
>>> > There are a number of older issues associated with IndexOutOfBounds
>>> > errors within the serde, but nothing that appears to specifically match
>>> > this error. This occurs with all tables (including those consisting
>>> > exclusively of integers). Any thoughts?
>>> >
>>> > Regards,
>>> >
>>> > Bryan Jeffrey
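Until a release carrying the HIVE-6883 fix is available, the workaround Prasanth describes can be applied per session. A minimal sketch, assuming a Hive CLI on the PATH and the tables from Bryan's reproduction: it writes the settings and Bryan's insert to a HiveQL file (the name `insert_with_workaround.hql` is hypothetical) and runs it with `hive -f`. Bryan's `mapred.job.queue.name=orc_queue` setting is left out because that queue is specific to his cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical script name; every statement below comes from the thread.
HQL=insert_with_workaround.hql

# Quoted 'EOF' keeps the shell from expanding the "*" in the query.
cat > "$HQL" <<'EOF'
-- Workaround suggested in the thread: disable the sorted
-- dynamic-partition optimization for this session only.
set hive.optimize.sort.dynamic.partition=false;
-- Settings from Bryan's original command (queue setting omitted).
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.enforce.sorting=true;
-- Echo the effective value so the log shows the workaround is active.
set hive.optimize.sort.dynamic.partition;
insert into table data partition (range) select * from loading_data_0;
EOF

if command -v hive >/dev/null 2>&1; then
  hive -f "$HQL"
else
  echo "hive client not found; wrote $HQL without running it" >&2
fi
```

Because `set` only applies to the session that issues it, other sessions are unaffected; each client that hits the error needs the same setting.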
