This is a little tricky but this is how it works: SELECT buyer_id, item_id, rank(item_id), created_time FROM ( SELECT buyer_id, item_id, created_time FROM testingtable1 DISTRIBUTE BY buyer_id, item_id SORT BY buyer_id, item_id, created_time desc ) a WHERE rank(item_id) < 10;
On Tue, Jul 10, 2012 at 12:31 PM, Raihan Jamal <jamalrai...@gmail.com> wrote: > Still it's not working with the use of my rank UDF. Below is the query I am > using > > Can anyone help me, what changes I need to make in my below sql query? > > > CREATE TABLE IF NOT EXISTS TestingTable1 > > > ( > > > BUYER_ID BIGINT, > > > ITEM_ID BIGINT, > > > CREATED_TIME STRING > ) > > > Find TOP 10 latest data (ITEM_ID, CREATED_TIME) for each BUYER_ID for > yesterday's date by sorting the created_time in descending order. > > > So what I was thinking is that, with the use of this below query, I will be > getting- > > > SELECT * FROM TestingTable1 WHERE ORDER BY buyer_id, created_time DESC; > > > All the BUYER_ID and its ITEM_ID corresponding to CREATED_TIME in Descending > order, But how I can pick TOP 10 for each BUYER_ID.? > > This is my SQL Query that I am using with the use of rank UDF. > > SELECT buyer_id, item_id, created_time, rk > FROM ( > SELECT buyer_id, item_id, rank(item_id) as rk, created_time > FROM testingtable1 > DISTRIBUTE BY buyer_id, item_id > SORT BY buyer_id, item_id, created_time desc > ) a > WHERE rk < 10 > ORDER BY buyer_id, created_time, rk; > > > > > > Raihan Jamal > > > > On Tue, Jul 10, 2012 at 12:16 AM, Jasper Knulst <jasper.knu...@incentro.com> > wrote: >> >> Hi Raihan, >> >> You should use 'rank(buyer_id)' in the order by clause on line 9 in stead >> of the alias 'rk'. I had the same problem, strangely, the alias is not >> resolved when it is in the order by clause. >> >> Other thing, I had some issues when I used this exact same set up for >> ranking results, that the rank UDF was already implemented at the map phase. >> Then you get very strange results. You have to introduce an extra subquery >> for it to work. >> >> Jasper >> >> Op 10 jul. 2012 09:01 schreef "Raihan Jamal" <jamalrai...@gmail.com> het >> volgende: >> >>> I was not able to understand, This is my below qiuery that I am using >>> currently- >>> >>> SELECT buyer_id, item_id, created_time >>> FROM ( >>> SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time >>> FROM testingtable1 >>> DISTRIBUTE BY buyer_id, item_id >>> SORT BY buyer_id, item_id, created_time desc >>> ) a >>> WHERE rank < 10 >>> ORDER BY buyer_id, created_time, rank; >>> >>> >>> What changes I need to make? >>> >>> >>> >>> >>> Raihan Jamal >>> >>> >>> >>> On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar <nitinpawar...@gmail.com> >>> wrote: >>>> >>>> try rk in upper select statement as well >>>> >>>> >>>> On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal <jamalrai...@gmail.com> >>>> wrote: >>>>> >>>>> Thanks for commenting out. Yes I figured that out, its a UDF. So now I >>>>> have created a new UDF Rank and added to classpath also. But when I am >>>>> again >>>>> running the below query- >>>>> >>>>> SELECT buyer_id, item_id, created_time >>>>> FROM ( >>>>> SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time >>>>> FROM testingtable1 >>>>> DISTRIBUTE BY buyer_id, item_id >>>>> SORT BY buyer_id, item_id, created_time desc >>>>> ) a >>>>> WHERE rk < 10 >>>>> ORDER BY buyer_id, rk; >>>>> >>>>> I am getting this below error- >>>>> >>>>> FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias or >>>>> Column Reference rk >>>>> >>>>> >>>>> Why is it so? Any suggestions? >>>>> >>>>> >>>>> Raihan Jamal >>>>> >>>>> >>>>> >>>>> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <tec...@gmail.com> wrote: >>>>>> >>>>>> hive has no built-in rank function. you'd need to use a user-defined >>>>>> function (UDF) to simulate it. there are a few custom implementations >>>>>> on the net that you can leverage. >>>>>> >>>>>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal <jamalrai...@gmail.com> >>>>>> wrote: >>>>>> > What's wrong with the below query. >>>>>> > >>>>>> > >>>>>> > SELECT buyer_id, item_id, created_time >>>>>> > FROM ( >>>>>> > SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time >>>>>> > FROM testingtable1 >>>>>> > DISTRIBUTE BY buyer_id, item_id >>>>>> > SORT BY buyer_id, item_id, created_time desc >>>>>> > ) a >>>>>> > WHERE rank < 10 >>>>>> > ORDER BY buyer_id, rank >>>>>> > >>>>>> > >>>>>> > I am always getting as - >>>>>> > >>>>>> > FAILED: Error in semantic analysis: line 3:30 Invalid Function rank >>>>>> > >>>>>> > >>>>>> > I am using Hive 0.6 I guess. >>>>>> > >>>>>> > >>>>>> > >>>>>> > Raihan Jamal >>>>>> > >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Nitin Pawar >>>> >>> >