*I have run a query many times, there will be two results without regular.*
*One is 36834699 and other is 18464706.*
*The query is *
set spark.yarn.queue=soft.high;
set hive.execution.engine=spark;
select /*+mapjoin(t3,t4,t5)*/
count(1)
from
(
select
coalesce(t11.qua,t12.qua,t13.qua) qua,
coalesce(t11.scene,t12.lanmu_id,t13.lanmu_id) scene,
coalesce(t11.app_id,t12.appid,t13.app_id) app_id,
expos_pv,
expos_uv,
dload_pv,
dload_uv,
dload_cnt,
dload_user,
evil_dload_cnt,
evil_dload_user,
update_dcnt,
update_duser,
hand_suc_incnt,
hand_suc_inuser,
day_hand_suc_incnt,
day_hand_suc_inuser
from
(select * from t_ed_soft_assist_useraction_stat where ds=20160126)t11
full outer join
(select * from t_md_soft_lanmu_app_dload_detail where ds=20160126)t12
on t11.qua=t12.qua and t11.app_id=t12.appid and t11.scene=t12.lanmu_id
full outer join
(select * from t_md_soft_client_install_lanmu where ds=20160126)t13
on t11.qua=t13.qua and t11.app_id=t13.app_id and t11.scene=t13.lanmu_id
)t1
left outer join t_rd_qua t3 on t3.ds=20160126 and t1.qua=t3.qua
left outer join t_rd_soft_appnew_last t4 on t4.ds=20160126 and
t1.app_id=t4.app_id
left outer join t_rd_soft_page_conf t5 on t5.ds=20160126 and
t1.scene=t5.pageid and t3.client_type_id=t5.ismtt;
*Explain query is*
STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-1 depends on stages: Stage-2
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-2
Spark
DagName: mqq_20160127151826_e8197f40-18d7-430c-9fc8-993facb74534:2
Vertices:
Map 6
Map Operator Tree:
TableScan
alias: t3
Statistics: Num rows: 1051 Data size: 113569 Basic stats:
COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 _col0 (type: string)
1 qua (type: string)
Local Work:
Map Reduce Local Work
Map 7
Map Operator Tree:
TableScan
alias: t4
Statistics: Num rows: 2542751 Data size: 220433659 Basic
stats: COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 UDFToDouble(_col2) (type: double)
1 UDFToDouble(app_id) (type: double)
Local Work:
Map Reduce Local Work
Map 8
Map Operator Tree:
TableScan
alias: t5
Statistics: Num rows: 143 Data size: 28605 Basic stats:
COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 _col1 (type: string), UDFToDouble(_col20) (type:
double)
1 pageid (type: string), UDFToDouble(ismtt) (type:
double)
Local Work:
Map Reduce Local Work
Stage: Stage-1
Spark
Edges:
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 5), Map 4
(PARTITION-LEVEL SORT, 5), Map 5 (PARTITION-LEVEL SORT, 5)
Reducer 3 <- Reducer 2 (GROUP, 1)
DagName: mqq_20160127151826_e8197f40-18d7-430c-9fc8-993facb74534:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: t_ed_soft_assist_useraction_stat
Statistics: Num rows: 16368107 Data size: 651461220 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: qua (type: string), scene (type: string),
app_id (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 16368107 Data size: 651461220
Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string),
UDFToDouble(_col2) (type: double), _col1 (type: string)
sort order: +++
Map-reduce partition columns: _col0 (type: string),
UDFToDouble(_col2) (type: double), _col1 (type: string)
Statistics: Num rows: 16368107 Data size: 651461220
Basic stats: COMPLETE Column stats: NONE
value expressions: _col2 (type: string)
Map 4
Map Operator Tree:
TableScan
alias: t_md_soft_lanmu_app_dload_detail
Statistics: Num rows: 2503976 Data size: 203324640 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: qua (type: string), appid (type: bigint),
lanmu_id (type: string)
outputColumnNames: _col2, _col3, _col4
Statistics: Num rows: 2503976 Data size: 203324640
Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col2 (type: string),
UDFToDouble(_col3) (type: double), _col4 (type: string)
sort order: +++
Map-reduce partition columns: _col2 (type: string),
UDFToDouble(_col3) (type: double), _col4 (type: string)
Statistics: Num rows: 2503976 Data size: 203324640
Basic stats: COMPLETE Column stats: NONE
value expressions: _col3 (type: bigint)
Map 5
Map Operator Tree:
TableScan
alias: t_md_soft_client_install_lanmu
Statistics: Num rows: 2143605 Data size: 178502520 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: lanmu_id (type: string), qua (type:
string), app_id (type: int)
outputColumnNames: _col13, _col2, _col3
Statistics: Num rows: 2143605 Data size: 178502520
Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col2 (type: string),
UDFToDouble(_col3) (type: double), _col13 (type: string)
sort order: +++
Map-reduce partition columns: _col2 (type: string),
UDFToDouble(_col3) (type: double), _col13 (type: string)
Statistics: Num rows: 2143605 Data size: 178502520
Basic stats: COMPLETE Column stats: NONE
value expressions: _col3 (type: int)
Reducer 2
Local Work:
Map Reduce Local Work
Reduce Operator Tree:
Join Operator
condition map:
Outer Join 0 to 1
Outer Join 0 to 2
keys:
0 _col0 (type: string), UDFToDouble(_col2) (type:
double), _col1 (type: string)
1 _col2 (type: string), UDFToDouble(_col3) (type:
double), _col4 (type: string)
2 _col2 (type: string), UDFToDouble(_col3) (type:
double), _col13 (type: string)
outputColumnNames: _col0, _col1, _col2, _col10, _col11,
_col12, _col26, _col27, _col37
Statistics: Num rows: 36009836 Data size: 1433214715 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: COALESCE(_col0,_col10,_col26) (type:
string), COALESCE(_col1,_col12,_col37) (type: string),
COALESCE(_col2,_col11,_col27) (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 36009836 Data size: 1433214715
Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col0 (type: string)
1 qua (type: string)
outputColumnNames: _col1, _col2, _col20
input vertices:
1 Map 6
Statistics: Num rows: 39610820 Data size: 1576536220
Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 UDFToDouble(_col2) (type: double)
1 UDFToDouble(app_id) (type: double)
outputColumnNames: _col1, _col20
input vertices:
1 Map 7
Statistics: Num rows: 43571902 Data size: 1734189879
Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col1 (type: string), UDFToDouble(_col20)
(type: double)
1 pageid (type: string), UDFToDouble(ismtt)
(type: double)
input vertices:
1 Map 8
Statistics: Num rows: 47929093 Data size:
1907608908 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count(1)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic
stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: bigint)
Reducer 3
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 3.723 seconds, Fetched: 173 row(s)
*Besh wishes.*
*Thank for any help.*