Hi,

You arte looking at the top 25 of result set so you will have to get full
result set before looking at top 25

Something like this

 select rs.prod_id, rs.score from
 (
   prod_id, count(prod_id) AS Score from sales GROUP BY prod_id ORDER BY
Score DESC
 )rs
 LIMIT 25;

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 7 March 2016 at 17:04, Awhan Patnaik <aw...@spotzot.com> wrote:

> I have to take the first 25 IDs ranked by count(*). But the following is
> not allowed in Hive
>
> select id from T order by count(*) desc limit 25;
>
> Which yields a "NOt yet supported place for UDAF count". The way around it
> is the following
>
> select id, count(*) as cnt from T group by id order by cnt desc limit 25;
>
> However I need to put this query in a subquery like so
>
> select id, XXX from T t join .... where t.id in (select id, count(*) as
> cnt from T group by id order by cnt desc limit 25) group by ... ;
>
> which does NOT work because a subquery is allowed to return only one
> thing. Here XXX are complex constructs like distance calculation and
> binning. These are time consuming and complex operations.
>
> The only way I see this would work is if I use the following sub-sub query
>
> select id, XXX from T t join ... where t.id in (select sub.id from
> (select id, count(*) as cnt from T group by id order by cnt desc limit
> 25)sub) group by ... ;
>
> The reason I don't want to put the limit in the outermost query is because
> those XXX queries are expensive and I don't want them to be performed on
> the entire result only to retain the top 25 and throw away the rest of the
> results. The count(*) operation of the rest of the IDs is not very
> expensive or time consuming.
>
> Is there some other elegant way of handling this without using a
> sub-sub-query approach?
>

Reply via email to