Hi, I am trying to run a simple join query on hive 13.
Both tables are in text format. Both tables are read in mappers, and the error is thrown in reducer. I don't get why a reducer is reading a table when the mappers have read it already and the reason for assuming that the video file is in SequenceFile format. Below, you can find query, query plan, and the error. Any help is greatly appreciated. Thanks, Sid Hadoop Version: 2.0.0-mr1 Query: SELECT computerguid FROM revenue_start_adeffx_v2 JOIN video ON revenue_start_adeffx_v2.video_id = video.video_id WHERE hourid = '389567'; Query Plan: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: revenue_start_adeffx_v2 Statistics: Num rows: 3175840 Data size: 330287403 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: video_id (type: int) sort order: + Map-reduce partition columns: video_id (type: int) Statistics: Num rows: 3175840 Data size: 330287403 Basic stats: COMPLETE Column stats: NONE value expressions: computerguid (type: string) TableScan alias: video Statistics: Num rows: 146679792 Data size: 586719168 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: video_id (type: int) sort order: + Map-reduce partition columns: video_id (type: int) Statistics: Num rows: 146679792 Data size: 586719168 Basic stats: COMPLETE Column stats: NONE Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} 1 outputColumnNames: _col0 Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string) outputColumnNames: _col0 Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Error: 2014-06-11 10:18:34,818 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://<NN><Path>/video/video_20140611051139 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: java.io.IOException: hdfs:/<NN><Path>/hive/warehouse/video/video_20140611051139 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more 2014-06-11 10:18:34,822 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1 2014-06-11 10:18:34,824 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://<NN><Path>/video_20140611051139 not a SequenceFile at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://<NN><Path>/video/video_20140611051139 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) ... 7 more Caused by: java.io.IOException: hdfs://<NN><Path>/video/video_20140611051139 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.