Hi, You arte looking at the top 25 of result set so you will have to get full result set before looking at top 25
Something like this select rs.prod_id, rs.score from ( prod_id, count(prod_id) AS Score from sales GROUP BY prod_id ORDER BY Score DESC )rs LIMIT 25; HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 7 March 2016 at 17:04, Awhan Patnaik <aw...@spotzot.com> wrote: > I have to take the first 25 IDs ranked by count(*). But the following is > not allowed in Hive > > select id from T order by count(*) desc limit 25; > > Which yields a "NOt yet supported place for UDAF count". The way around it > is the following > > select id, count(*) as cnt from T group by id order by cnt desc limit 25; > > However I need to put this query in a subquery like so > > select id, XXX from T t join .... where t.id in (select id, count(*) as > cnt from T group by id order by cnt desc limit 25) group by ... ; > > which does NOT work because a subquery is allowed to return only one > thing. Here XXX are complex constructs like distance calculation and > binning. These are time consuming and complex operations. > > The only way I see this would work is if I use the following sub-sub query > > select id, XXX from T t join ... where t.id in (select sub.id from > (select id, count(*) as cnt from T group by id order by cnt desc limit > 25)sub) group by ... ; > > The reason I don't want to put the limit in the outermost query is because > those XXX queries are expensive and I don't want them to be performed on > the entire result only to retain the top 25 and throw away the rest of the > results. The count(*) operation of the rest of the IDs is not very > expensive or time consuming. > > Is there some other elegant way of handling this without using a > sub-sub-query approach? >