In that case, wouldn't this work:

SELECT buyer_id, item_id, rank(buyer_id), created_time
FROM (
    SELECT buyer_id, item_id, created_time
    FROM testingtable1
    DISTRIBUTE BY buyer_id
    SORT BY buyer_id, created_time desc
) a
WHERE rank(buyer_id) < 10;


On Tue, Jul 10, 2012 at 4:21 PM, Raihan Jamal <jamalrai...@gmail.com> wrote:
> So the OUTPUT should be like this everything sorted in descending order with
> time and only top 10 for each BUYER_ID-
>
> Sample Output.
>
> BUYER_ID    |    ITEM_ID       |    CREATED_TIME
>
>
> ------------+------------------+-----------------------
>
>
>
> 34512201        2412180494          2012-07-09 22:21:11
>
>
> 34512201        1875744030          2012-07-09 19:40:39
>
>
> 34512201        597245693           2012-07-09 16:20:21
>
>
> 34512201        8071787728          2012-07-09 15:19:59
>
>
> 34512201        959188449           2012-07-09 08:48:47
>
>
> 34512201        5868222883          2012-07-09 08:23:17
>
>
> 34512201        5656232360          2012-07-09 07:28:47
>
>
> 34512201        5639158173          2012-07-09 06:58:47
>
>
> 34512201        4645350592          2012-07-09 06:38:47
>
>
> 34512201        2422054205          2012-07-09 06:09:56
>
>
> 1015826235      210002448035        2012-07-09 22:21:11
>
>
> 1015826235      260003553382        2012-07-09 19:40:39
>
>
> 1015826235      220003038067        2012-07-09 19:40:21
>
>
> 1015826235      300003861266        2012-07-09 18:19:59
>
>
> 1015826235      140002997245        2012-07-09 09:23:17
>
>
> 1015826235      260003553385        2012-07-09 08:48:47
>
>
> 1015826235      260003553384        2012-07-09 07:28:47
>
>
> 1015826235      260003553381        2012-07-09 07:09:56
>
>
> 1015826235      260003553383        2012-07-09 06:58:47
>
>
> 1015826235      260003553389        2012-07-09 06:54:37
>
>
>
>
>
> Raihan Jamal
>
>
>
> On Tue, Jul 10, 2012 at 4:18 PM, Raihan Jamal <jamalrai...@gmail.com> wrote:
>>
>> Thanks Vijay for reply. But it doesn't works out the way I needed. I am
>> getting full data back for each BUYER_ID. Let me explain you more.
>>
>> This is the below data in the table and you can see in the below data this
>> BUYER_ID - 1015826235 appears 13 times and this BUYER_ID -34512201 appears
>> 12 times, And I need TOP 10 for each of the BUYER_ID BASIS on time-
>>
>> BUYER_ID    |    ITEM_ID       |    CREATED_TIME
>>
>>
>>
>> ------------+------------------+-----------------------
>>
>>
>>
>> 1015826235      220003038067        2012-07-09 19:40:21,
>>
>>
>>
>> 1015826235      300003861266        2012-07-09 18:19:59,
>>
>>
>>
>> 1015826235      140002997245        2012-07-09 09:23:17,
>>
>>
>>
>> 1015826235      210002448035        2012-07-09 22:21:11,
>>
>>
>>
>> 1015826235      260003553381        2012-07-09 07:09:56,
>>
>>
>>
>> 1015826235      260003553382        2012-07-09 19:40:39,
>>
>>
>>
>> 1015826235      260003553383        2012-07-09 06:58:47,
>>
>>
>>
>> 1015826235      260003553384        2012-07-09 07:28:47,
>>
>>
>>
>> 1015826235      260003553385        2012-07-09 08:48:47,
>>
>>
>>
>> 1015826235      260003553386        2012-07-09 06:38:47,
>>
>>
>>
>> 1015826235      260003553387        2012-07-09 05:38:47,
>>
>>
>>
>> 1015826235      260003553388        2012-07-09 04:55:47,
>>
>>
>>
>> 1015826235      260003553389        2012-07-09 06:54:37,
>>
>>
>>
>> 34512201        597245693           2012-07-09 16:20:21,
>>
>>
>>
>> 34512201        8071787728          2012-07-09 15:19:59,
>>
>>
>>
>> 34512201        5868222883          2012-07-09 08:23:17,
>>
>>
>>
>> 34512201        2412180494          2012-07-09 22:21:11,
>>
>>
>>
>> 34512201        2422054205          2012-07-09 06:09:56,
>>
>>
>>
>> 34512201        1875744030          2012-07-09 19:40:39,
>>
>>
>>
>> 34512201        5639158173          2012-07-09 06:58:47,
>>
>>
>>
>> 34512201        5656232360          2012-07-09 07:28:47,
>>
>>
>>
>> 34512201        959188449           2012-07-09 08:48:47,
>>
>>
>>
>> 34512201        4645350592          2012-07-09 06:38:47,
>>
>>
>>
>> 34512201        5657320532          2012-07-09 05:38:47,
>>
>>
>>
>> 34512201        290419656539        2012-07-09 04:55:47,
>>
>>
>> So basically if I do it like this-
>>
>> SELECT * FROM TestingTable1 ORDER BY buyer_id, created_time DESC;
>>
>> everything will get sorted in descending order for each BUYER_ID and then
>> I need to pick up TOP 10 rows for each BUYER_ID.
>>
>> And with your query that you just said, it's not working. I am getting
>> full data back like this. Below is the result I got from your query-
>>
>> 34512201        597245693       0       2012-07-09 16:20:21
>> 34512201        959188449       0       2012-07-09 08:48:47
>> 34512201        1875744030      0       2012-07-09 19:40:39
>> 34512201        2412180494      0       2012-07-09 22:21:11
>> 34512201        2422054205      0       2012-07-09 06:09:56
>> 34512201        4645350592      0       2012-07-09 06:38:47
>> 34512201        5639158173      0       2012-07-09 06:58:47
>> 34512201        5656232360      0       2012-07-09 07:28:47
>> 34512201        5657320532      0       2012-07-09 05:38:47
>> 34512201        5868222883      0       2012-07-09 08:23:17
>> 34512201        8071787728      0       2012-07-09 15:19:59
>> 34512201        290419656539    0       2012-07-09 04:55:47
>> 1015826235      140002997245    0       2012-07-09 09:23:17
>> 1015826235      210002448035    0       2012-07-09 22:21:11
>> 1015826235      220003038067    0       2012-07-09 19:40:21
>> 1015826235      260003553381    0       2012-07-09 07:09:56
>> 1015826235      260003553382    0       2012-07-09 19:40:39
>> 1015826235      260003553383    0       2012-07-09 06:58:47
>> 1015826235      260003553384    0       2012-07-09 07:28:47
>> 1015826235      260003553385    0       2012-07-09 08:48:47
>> 1015826235      260003553386    0       2012-07-09 06:38:47
>> 1015826235      260003553387    0       2012-07-09 05:38:47
>> 1015826235      260003553388    0       2012-07-09 04:55:47
>> 1015826235      260003553389    0       2012-07-09 06:54:37
>> 1015826235      300003861266    0       2012-07-09 18:19:59
>>
>>
>> This is my Rank function that I have created-
>>
>> package com.example.hive.udf;
>> import org.apache.hadoop.hive.ql.exec.UDF;
>>
>> public final class Rank extends UDF{
>>     private int  counter;
>>     private String last_key;
>>     public int evaluate(final String key){
>>  if ( !key.equalsIgnoreCase(this.last_key) ) {
>>     this.counter = 0;
>>     this.last_key = key;
>>  }
>>  return this.counter++;
>>     }
>> }
>>
>>
>> Raihan Jamal
>>
>>
>>
>> On Tue, Jul 10, 2012 at 4:01 PM, Vijay <tec...@gmail.com> wrote:
>>>
>>> This is a little tricky but this is how it works:
>>>
>>> SELECT buyer_id, item_id, rank(item_id), created_time
>>> FROM (
>>>     SELECT buyer_id, item_id, created_time
>>>     FROM testingtable1
>>>     DISTRIBUTE BY buyer_id, item_id
>>>     SORT BY buyer_id, item_id, created_time desc
>>> ) a
>>> WHERE rank(item_id) < 10;
>>>
>>>
>>> On Tue, Jul 10, 2012 at 12:31 PM, Raihan Jamal <jamalrai...@gmail.com>
>>> wrote:
>>> > Still it's not working with the use of my rank UDF. Below is the query
>>> > I am
>>> > using
>>> >
>>> > Can anyone help me, what changes I need to make in my below sql query?
>>> >
>>> >
>>> > CREATE TABLE IF NOT EXISTS TestingTable1
>>> >
>>> >
>>> > (
>>> >
>>> >
>>> > BUYER_ID BIGINT,
>>> >
>>> >
>>> > ITEM_ID BIGINT,
>>> >
>>> >
>>> > CREATED_TIME STRING
>>> > )
>>> >
>>> >
>>> > Find TOP 10 latest data (ITEM_ID, CREATED_TIME) for each BUYER_ID for
>>> > yesterday's date by sorting the created_time in descending order.
>>> >
>>> >
>>> > So what I was thinking is that, with the use of this below query, I
>>> > will be
>>> > getting-
>>> >
>>> >
>>> > SELECT * FROM TestingTable1 WHERE ORDER BY buyer_id, created_time DESC;
>>> >
>>> >
>>> > All the BUYER_ID and its ITEM_ID corresponding to CREATED_TIME in
>>> > Descending
>>> > order, But how I can pick TOP 10 for each BUYER_ID.?
>>> >
>>> > This is my SQL Query that I am using with the use of rank UDF.
>>> >
>>> > SELECT buyer_id, item_id, created_time, rk
>>> > FROM (
>>> >     SELECT buyer_id, item_id, rank(item_id) as rk, created_time
>>> >     FROM testingtable1
>>> >     DISTRIBUTE BY buyer_id, item_id
>>> >     SORT BY buyer_id, item_id, created_time desc
>>> > ) a
>>> > WHERE rk < 10
>>> > ORDER BY buyer_id, created_time, rk;
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > Raihan Jamal
>>> >
>>> >
>>> >
>>> > On Tue, Jul 10, 2012 at 12:16 AM, Jasper Knulst
>>> > <jasper.knu...@incentro.com>
>>> > wrote:
>>> >>
>>> >> Hi Raihan,
>>> >>
>>> >> You should use 'rank(buyer_id)' in the order by clause on line 9 in
>>> >> stead
>>> >> of the alias 'rk'. I had the same problem, strangely, the alias is not
>>> >> resolved when it is in the order by clause.
>>> >>
>>> >> Other thing, I had some issues when I used this exact same set up for
>>> >> ranking results, that the rank UDF was already implemented at the map
>>> >> phase.
>>> >> Then you get very strange results. You have to introduce an extra
>>> >> subquery
>>> >> for it to work.
>>> >>
>>> >> Jasper
>>> >>
>>> >> Op 10 jul. 2012 09:01 schreef "Raihan Jamal" <jamalrai...@gmail.com>
>>> >> het
>>> >> volgende:
>>> >>
>>> >>> I was not able to understand, This is my below qiuery that I am using
>>> >>> currently-
>>> >>>
>>> >>> SELECT buyer_id, item_id, created_time
>>> >>> FROM (
>>> >>>     SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time
>>> >>>     FROM testingtable1
>>> >>>     DISTRIBUTE BY buyer_id, item_id
>>> >>>     SORT BY buyer_id, item_id, created_time desc
>>> >>> ) a
>>> >>> WHERE rank < 10
>>> >>> ORDER BY buyer_id, created_time, rank;
>>> >>>
>>> >>>
>>> >>> What changes I need to make?
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> Raihan Jamal
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar
>>> >>> <nitinpawar...@gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> try rk in upper select statement as well
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal
>>> >>>> <jamalrai...@gmail.com>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Thanks for commenting out. Yes I figured that out, its a UDF. So
>>> >>>>> now I
>>> >>>>> have created a new UDF Rank and added to classpath also. But when I
>>> >>>>> am again
>>> >>>>> running the below query-
>>> >>>>>
>>> >>>>> SELECT buyer_id, item_id, created_time
>>> >>>>> FROM (
>>> >>>>>     SELECT buyer_id, item_id, Rank(buyer_id) as rk, created_time
>>> >>>>>     FROM testingtable1
>>> >>>>>     DISTRIBUTE BY buyer_id, item_id
>>> >>>>>     SORT BY buyer_id, item_id, created_time desc
>>> >>>>> ) a
>>> >>>>> WHERE rk < 10
>>> >>>>> ORDER BY buyer_id, rk;
>>> >>>>>
>>> >>>>> I am getting this below error-
>>> >>>>>
>>> >>>>> FAILED: Error in semantic analysis: line 9:19 Invalid Table Alias
>>> >>>>> or
>>> >>>>> Column Reference rk
>>> >>>>>
>>> >>>>>
>>> >>>>> Why is it so? Any suggestions?
>>> >>>>>
>>> >>>>>
>>> >>>>> Raihan Jamal
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Mon, Jul 9, 2012 at 10:51 PM, Vijay <tec...@gmail.com> wrote:
>>> >>>>>>
>>> >>>>>> hive has no built-in rank function. you'd need to use a
>>> >>>>>> user-defined
>>> >>>>>> function (UDF) to simulate it. there are a few custom
>>> >>>>>> implementations
>>> >>>>>> on the net that you can leverage.
>>> >>>>>>
>>> >>>>>> On Mon, Jul 9, 2012 at 10:40 PM, Raihan Jamal
>>> >>>>>> <jamalrai...@gmail.com>
>>> >>>>>> wrote:
>>> >>>>>> > What's wrong with the below query.
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> > SELECT buyer_id, item_id, created_time
>>> >>>>>> > FROM (
>>> >>>>>> >     SELECT buyer_id, item_id, rank(buyer_id) as rank,
>>> >>>>>> > created_time
>>> >>>>>> >     FROM testingtable1
>>> >>>>>> >     DISTRIBUTE BY buyer_id, item_id
>>> >>>>>> >     SORT BY buyer_id, item_id, created_time desc
>>> >>>>>> > ) a
>>> >>>>>> > WHERE rank < 10
>>> >>>>>> > ORDER BY buyer_id, rank
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> > I am always getting as -
>>> >>>>>> >
>>> >>>>>> > FAILED: Error in semantic analysis: line 3:30 Invalid Function
>>> >>>>>> > rank
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> > I am using Hive 0.6 I guess.
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> > Raihan Jamal
>>> >>>>>> >
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Nitin Pawar
>>> >>>>
>>> >>>
>>> >
>>
>>
>

Reply via email to