I am trying that solution. Currently I am running my query to see what
result I am getting back with UDF.




*Raihan Jamal*



On Tue, Jul 10, 2012 at 12:13 AM, Nitin Pawar <[email protected]>wrote:

> i thought you managed to solve this with rank??
>
>
> On Tue, Jul 10, 2012 at 12:38 PM, Raihan Jamal <[email protected]>wrote:
>
>> Problem with that approach is, with LIMIT 10, If I am putting after desc,
>> then it will get only 10 rows irrespective of BUYER_ID. But I need
>> specifically for each BUYER_ID 10 latest rows.
>>
>>
>>
>>
>> *Raihan Jamal*
>>
>>
>>
>> On Tue, Jul 10, 2012 at 12:03 AM, Abhishek Tiwari <
>> [email protected]> wrote:
>>
>>> Raihan,
>>>
>>> Andes suggests you use 'limit' after 'desc' ie Hive with sort your query
>>> results in descending order and then chop off any more than first 10
>>> records.
>>> However, Hive will still run the full scan on data since it has to find
>>> the sorted list of records for you.
>>>
>>>
>>> On Tue, Jul 10, 2012 at 8:36 AM, Raihan Jamal <[email protected]>wrote:
>>>
>>>> This is my below requirement. I need-  *Find `TOP 10` data for each
>>>> `BUYER_ID, *So I cannot use LIMIT 10 here in this case.
>>>>
>>>> This is the below table
>>>>
>>>>     CREATE TABLE IF NOT EXISTS TestingTable1
>>>>     (
>>>>     BUYER_ID BIGINT,
>>>>     ITEM_ID BIGINT,
>>>>     CREATED_TIME STRING
>>>>     )
>>>>
>>>> And this is the below data in the above table-
>>>>
>>>>     BUYER_ID    |    ITEM_ID   | CREATED_TIME
>>>>     ------------+------------------+-----------------------
>>>>     1015826235      220003038067        2012-07-09 19:40:21,
>>>>     1015826235      300003861266        2012-07-09 18:19:59,
>>>>     1015826235      140002997245        2012-07-09 09:23:17,
>>>>     1015826235      210002448035        2012-07-09 22:21:11,
>>>>     1015826235      260003553381        2012-07-09 07:09:56,
>>>>     1015826235      260003553382        2012-07-09 19:40:39,
>>>>     1015826235      260003553383        2012-07-09 06:58:47,
>>>>     1015826235      260003553384        2012-07-09 07:28:47,
>>>>     1015826235      260003553385        2012-07-09 08:48:47,
>>>>     1015826235      260003553386        2012-07-09 06:38:47,
>>>>     1015826235      260003553387        2012-07-09 05:38:47,
>>>>     1015826235      260003553388        2012-07-09 04:55:47,
>>>>     1015826235      260003553389        2012-07-09 06:54:37,
>>>>     34512201        597245693           2012-07-09 16:20:21,
>>>>     34512201        8071787728          2012-07-09 15:19:59,
>>>>     34512201        5868222883          2012-07-09 08:23:17,
>>>>     34512201        2412180494          2012-07-09 22:21:11,
>>>>     34512201        2422054205          2012-07-09 06:09:56,
>>>>     34512201        1875744030          2012-07-09 19:40:39,
>>>>     34512201        5639158173          2012-07-09 06:58:47,
>>>>     34512201        5656232360          2012-07-09 07:28:47,
>>>>     34512201        959188449 2012-07-09 08:48:47,
>>>>     34512201        4645350592          2012-07-09 06:38:47,
>>>>     34512201        5657320532 2012-07-09 05:38:47,
>>>>     34512201        290419656539 2012-07-09 04:55:47,
>>>>
>>>> If you see the above data in the table, there are only two UNIQUE
>>>> `BUYER_ID` and corresponding to those I have `ITEM_ID` AND `CREATED_TIME`.
>>>> I need only 10 latest record basis on the time for that given day for each
>>>> `BUYER_ID`.
>>>>
>>>> So for this `BUYER_ID` - `34512201` I need 10 latest record basis on
>>>> `CREATED_TIME` for that given day only, it means for today's date I need 10
>>>> latest record for each `BUYER_ID`.
>>>>
>>>> And each `BUYER_ID` can have any day's data. But I am specifically
>>>> interested for day before  today's data(means yesterday's date always) by
>>>> checking at the `CREATED_TIME`
>>>>
>>>> **Find `TOP 10` data for each `BUYER_ID`. Below is the sample output.**
>>>>
>>>> Sample Output.
>>>>
>>>>     BUYER_ID    |    ITEM_ID   | CREATED_TIME
>>>>     ------------+------------------+-----------------------
>>>>     34512201        2412180494          2012-07-09 22:21:11
>>>>     34512201        1875744030          2012-07-09 19:40:39
>>>>     34512201        597245693           2012-07-09 16:20:21
>>>>     34512201        8071787728          2012-07-09 15:19:59
>>>>     34512201        959188449 2012-07-09 08:48:47
>>>>     34512201        5868222883          2012-07-09 08:23:17
>>>>     34512201        5656232360          2012-07-09 07:28:47
>>>>     34512201        5639158173          2012-07-09 06:58:47
>>>>     34512201        4645350592          2012-07-09 06:38:47
>>>>     34512201        2422054205          2012-07-09 06:09:56
>>>>     1015826235      210002448035        2012-07-09 22:21:11
>>>>     1015826235      260003553382        2012-07-09 19:40:39
>>>>     1015826235      220003038067        2012-07-09 19:40:21
>>>>     1015826235      300003861266        2012-07-09 18:19:59
>>>>     1015826235      140002997245        2012-07-09 09:23:17
>>>>     1015826235      260003553385        2012-07-09 08:48:47
>>>>     1015826235      260003553384        2012-07-09 07:28:47
>>>>     1015826235      260003553381        2012-07-09 07:09:56
>>>>     1015826235      260003553383        2012-07-09 06:58:47
>>>>     1015826235      260003553389        2012-07-09 06:54:37
>>>>
>>>>
>>>> *Raihan Jamal*
>>>>
>>>>
>>>>
>>>> On Mon, Jul 9, 2012 at 7:56 PM, Andes <[email protected]> wrote:
>>>>
>>>>> **
>>>>> hello, you can use "desc" and "limit 10" to filter the top 10.
>>>>>
>>>>> 2012-07-10
>>>>>  ------------------------------
>>>>>  **
>>>>> Best Regards
>>>>> Andes
>>>>>
>>>>> **
>>>>>  ------------------------------
>>>>>  *发件人:*Raihan Jamal
>>>>> *发送时间:*2012-07-10 10:31
>>>>> *主题:*Find TOP 10 using HiveQL
>>>>> *收件人:*"user"<[email protected]>
>>>>> *抄送:*
>>>>>
>>>>>  When I run this query,
>>>>>
>>>>> SELECT TOP 10 FROM TestingTable1 WHERE ORDER BY buyer_id, created_time
>>>>> DESC;
>>>>>
>>>>>
>>>>> I always get error as-
>>>>>
>>>>>  *FAILED: Parse Error: line 1:7 cannot recognize input 'TOP' in
>>>>> select expression*
>>>>>
>>>>> Is there any way around to use TOP 10 or something similar that will
>>>>> work in HiveQL?
>>>>>
>>>>>
>>>>>
>>>>> *Raihan Jamal*
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> Nitin Pawar
>
>

Reply via email to