Hi Xiaobin, There is functionality like (from: https://cwiki.apache.org/Hive/languagemanual-dml.html): Hive extension (multiple inserts): FROM from_statement INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1 [INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
This will do one pass through your table's partitions but group and filter them according to different criteria. As you become more proficient with Hive, you could consider creating an index on some columns or bucketing on those columns (for optimization in joins, for example) if you end up filtering/grouping/joining on them over and over again. Good luck! Mark Grover, Business Intelligence Analyst OANDA Corporation www: oanda.com www: fxtrade.com e: mgro...@oanda.com "Best Trading Platform" - World Finance's Forex Awards 2009. "The One to Watch" - Treasury Today's Adam Smith Awards 2009. ----- Original Message ----- From: "Martin Kuhn" <martin.k...@affinitas.de> To: user@hive.apache.org Sent: Thursday, January 12, 2012 9:20:00 AM Subject: Re: how to avoid scan the same table multi times? Hi there, > Select count(*), count(distinct u), type from t group by type where plat=1 > and dt=”2012-1-12-02” > Select count(*), count(distinct u), type from t where (type =2 or type =6) > and dt=”2012-1-12-02” group by type; > Is there a better way to do these queries? You could try something like this: SELECT type , count(*) , count(DISTINCT u) , count(CASE WHEN plat=1 THEN u ELSE NULL) , count(DISTINCT CASE WHEN plat=1 THEN u ELSE NULL) , count(CASE WHEN (type=2 OR type=6) THEN u ELSE NULL) , count(DISTINCT CASE WHEN (type=2 OR type=6) THEN u ELSE NULL) FROM t WHERE dt in ("2012-1-12-02", "2012-1-12-03") GROUP BY type ORDER BY type ; Good luck :) Martin Kuhn P.S. You'ge got a strange date format there. For sorting purposes it would be more appropriate to use something like "2012-01-12-02".