Hive 1.2.1 on Spark1.4.1 2015-12-22 19:31 GMT+08:00 Jone Zhang <joyoungzh...@gmail.com>:
> *select * from staff;* > 1 jone 22 1 > 2 lucy 21 1 > 3 hmm 22 2 > 4 james 24 3 > 5 xiaoliu 23 3 > > *select id,date_ from trade union all select id,"test" from trade ;* > 1 201510210908 > 2 201509080234 > 2 201509080235 > 1 test > 2 test > 2 test > > *set hive.execution.engine=spark;* > *set spark.master=local;* > *select /*+mapjoin(t)*/ * from staff s join * > *(select id,date_ from trade union all select id,"test" from trade ) t on > s.id <http://s.id>=t.id <http://t.id>;* > 1 jone 22 1 1 201510210908 > 2 lucy 21 1 2 201509080234 > 2 lucy 21 1 2 201509080235 > > *set hive.execution.engine=mr;* > *select /*+mapjoin(t)*/ * from staff s join * > *(select id,date_ from trade union all select id,"test" from trade ) t on > s.id <http://s.id>=t.id <http://t.id>;* > FAILED: SemanticException [Error 10227]: Not all clauses are supported > with mapjoin hint. Please remove mapjoin hint. > > *I have two questions* > *1.Why result of hive on spark not include the following record?* > 1 jone 22 1 1 test > 2 lucy 21 1 2 test > 2 lucy 21 1 2 test > > *2.Why there are two different ways of dealing same query?* > > > *explain 1:* > *set hive.execution.engine=spark;* > *set spark.master=local;* > *explain * > *select id,date_ from trade union all select id,"test" from trade;* > OK > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > > STAGE PLANS: > Stage: Stage-1 > Spark > DagName: > jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: trade > Statistics: Num rows: 6 Data size: 48 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: id (type: int), date_ (type: string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 6 Data size: 48 Basic stats: > COMPLETE Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 12 Data size: 96 Basic stats: > COMPLETE Column stats: NONE > table: > input format: > org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Map 2 > Map Operator Tree: > TableScan > alias: trade > Statistics: Num rows: 6 Data size: 48 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: id (type: int), 'test' (type: string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 6 Data size: 48 Basic stats: > COMPLETE Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 12 Data size: 96 Basic stats: > COMPLETE Column stats: NONE > table: > input format: > org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > > > *explain 2:* > *set hive.execution.engine=spark;* > *set spark.master=local;* > *explain * > *select /*+mapjoin(t)*/ * from staff s join * > *(select id,date_ from trade union all select id,"20999999999" from trade > ) t on s.id <http://s.id>=t.id <http://t.id>;* > OK > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > > STAGE PLANS: > Stage: Stage-2 > Spark > DagName: > jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:3 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: trade > Statistics: Num rows: 6 Data size: 48 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: id is not null (type: boolean) > Statistics: Num rows: 3 Data size: 24 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: id (type: int), date_ (type: string) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 3 Data size: 24 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 id (type: int) > 1 _col0 (type: int) > Local Work: > Map Reduce Local Work > > Stage: Stage-1 > Spark > DagName: > jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:2 > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: s > Statistics: Num rows: 1 Data size: 66 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: id is not null (type: boolean) > Statistics: Num rows: 1 Data size: 66 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > keys: > 0 id (type: int) > 1 _col0 (type: int) > outputColumnNames: _col0, _col1, _col2, _col3, > _col7, _col8 > input vertices: > 1 Map 1 > Statistics: Num rows: 6 Data size: 52 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: int), _col1 (type: > string), _col2 (type: int), _col3 (type: int), _col7 (type: int), _col8 > (type: string) > outputColumnNames: _col0, _col1, _col2, _col3, > _col4, _col5 > Statistics: Num rows: 6 Data size: 52 Basic stats: > COMPLETE Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 6 Data size: 52 Basic > stats: COMPLETE Column stats: NONE > table: > input format: > org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Local Work: > Map Reduce Local Work > > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > > > > *I can't find any information about union "test" in explain 2.* > > *Some properties on hive-site.xml is * > <property> > <name>hive.ignore.mapjoin.hint</name> > <value>false</value> > </property> > <property> > <name>hive.auto.convert.join</name> > <value>true</value> > </property> > <property> > <name>hive.auto.convert.join.noconditionaltask</name> > <value>true</value> > > > > > *Thanks.* > *Best wishes.* >