hi, I use the multiple inserts method, and I write an sql like this:
from td INSERT OVERWRITE DIRECTORY '/tmp/total.out' select count(v1) INSERT OVERWRITE DIRECTORY '/tmp/totaldistinct.out' select count(distinct v1) INSERT OVERWRITE DIRECTORY '/tmp/distinctuin.out' select distinct v1 INSERT OVERWRITE DIRECTORY '/tmp/v4.out' select v4 , count(v1), count(distinct v1) group by v4 INSERT OVERWRITE DIRECTORY '/tmp/v3v4.out' select v3, v4 , count(v1), count(distinct v1) group by v3, v4 INSERT OVERWRITE DIRECTORY '/tmp/v426.out' select count(v1), count(distinct v1) where v4=2 or v4=6 INSERT OVERWRITE DIRECTORY '/tmp/v3v426.out' select v3, count(v1), count(distinct v1) where v4=2 or v4=6 group by v3 INSERT OVERWRITE DIRECTORY '/tmp/v415.out' select count(v1), count(distinct v1) where v4=1 or v4=5 INSERT OVERWRITE DIRECTORY '/tmp/v3v415.out' select v3, count(v1), count(distinct v1) where v4=1 or v4=5 group by v3 it works, and the output result is what I want. but there is one problem, hive generate 9 mapreduce jobs and run these jobs one by one. I run explain on this query, and I got the following message: STAGE DEPENDENCIES: Stage-9 is a root stage Stage-0 depends on stages: Stage-9 Stage-10 depends on stages: Stage-9 Stage-1 depends on stages: Stage-10 Stage-11 depends on stages: Stage-9 Stage-2 depends on stages: Stage-11 Stage-12 depends on stages: Stage-9 Stage-3 depends on stages: Stage-12 Stage-13 depends on stages: Stage-9 Stage-4 depends on stages: Stage-13 Stage-14 depends on stages: Stage-9 Stage-5 depends on stages: Stage-14 Stage-15 depends on stages: Stage-9 Stage-6 depends on stages: Stage-15 Stage-16 depends on stages: Stage-9 Stage-7 depends on stages: Stage-16 Stage-17 depends on stages: Stage-9 Stage-8 depends on stages: Stage-17 it seems that stage 9-17 is corresponding to mapreduce job 0-8 but from the explain message above, stage 10-17 only depends on stage 9, so I have an question, why job 1-8 can't run concurrently? Or how can I make job 1-8 run concurrently? thank you very much for your help again! xiaobin 在 2012年1月13日 下午12:01,Xiaobin She <xiaobin...@gmail.com>写道: > to Martin, Mark and Edward, > > thank you for your advices, I will try it out. > > And to Martin, by "appropriate data format", do you mean something like " > 2012011202" ? > > thanks! > > xiaobin > > 在 2012年1月12日 下午10:20,Martin Kuhn <martin.k...@affinitas.de>写道: > > Hi there, >> >> > Select count(*), count(distinct u), type from t group by type where >> plat=1 and dt=”2012-1-12-02” >> > Select count(*), count(distinct u), type from t where (type =2 or type >> =6) and dt=”2012-1-12-02” group by type; >> >> > Is there a better way to do these queries? >> >> You could try something like this: >> >> SELECT >> type >> , count(*) >> , count(DISTINCT u) >> , count(CASE WHEN plat=1 THEN u ELSE NULL) >> , count(DISTINCT CASE WHEN plat=1 THEN u ELSE NULL) >> , count(CASE WHEN (type=2 OR type=6) THEN u ELSE NULL) >> , count(DISTINCT CASE WHEN (type=2 OR type=6) THEN u ELSE NULL) >> FROM >> t >> WHERE >> dt in ("2012-1-12-02", "2012-1-12-03") >> GROUP BY >> type >> ORDER BY >> type >> ; >> >> Good luck :) >> Martin Kuhn >> >> >> P.S. You'ge got a strange date format there. For sorting purposes it >> would be more appropriate to use something like "2012-01-12-02". >> >> >