I am trying that solution. Currently I am running my query to see what result I am getting back with UDF.
*Raihan Jamal* On Tue, Jul 10, 2012 at 12:13 AM, Nitin Pawar <[email protected]>wrote: > i thought you managed to solve this with rank?? > > > On Tue, Jul 10, 2012 at 12:38 PM, Raihan Jamal <[email protected]>wrote: > >> Problem with that approach is, with LIMIT 10, If I am putting after desc, >> then it will get only 10 rows irrespective of BUYER_ID. But I need >> specifically for each BUYER_ID 10 latest rows. >> >> >> >> >> *Raihan Jamal* >> >> >> >> On Tue, Jul 10, 2012 at 12:03 AM, Abhishek Tiwari < >> [email protected]> wrote: >> >>> Raihan, >>> >>> Andes suggests you use 'limit' after 'desc' ie Hive with sort your query >>> results in descending order and then chop off any more than first 10 >>> records. >>> However, Hive will still run the full scan on data since it has to find >>> the sorted list of records for you. >>> >>> >>> On Tue, Jul 10, 2012 at 8:36 AM, Raihan Jamal <[email protected]>wrote: >>> >>>> This is my below requirement. I need- *Find `TOP 10` data for each >>>> `BUYER_ID, *So I cannot use LIMIT 10 here in this case. >>>> >>>> This is the below table >>>> >>>> CREATE TABLE IF NOT EXISTS TestingTable1 >>>> ( >>>> BUYER_ID BIGINT, >>>> ITEM_ID BIGINT, >>>> CREATED_TIME STRING >>>> ) >>>> >>>> And this is the below data in the above table- >>>> >>>> BUYER_ID | ITEM_ID | CREATED_TIME >>>> ------------+------------------+----------------------- >>>> 1015826235 220003038067 2012-07-09 19:40:21, >>>> 1015826235 300003861266 2012-07-09 18:19:59, >>>> 1015826235 140002997245 2012-07-09 09:23:17, >>>> 1015826235 210002448035 2012-07-09 22:21:11, >>>> 1015826235 260003553381 2012-07-09 07:09:56, >>>> 1015826235 260003553382 2012-07-09 19:40:39, >>>> 1015826235 260003553383 2012-07-09 06:58:47, >>>> 1015826235 260003553384 2012-07-09 07:28:47, >>>> 1015826235 260003553385 2012-07-09 08:48:47, >>>> 1015826235 260003553386 2012-07-09 06:38:47, >>>> 1015826235 260003553387 2012-07-09 05:38:47, >>>> 1015826235 260003553388 2012-07-09 04:55:47, >>>> 1015826235 260003553389 2012-07-09 06:54:37, >>>> 34512201 597245693 2012-07-09 16:20:21, >>>> 34512201 8071787728 2012-07-09 15:19:59, >>>> 34512201 5868222883 2012-07-09 08:23:17, >>>> 34512201 2412180494 2012-07-09 22:21:11, >>>> 34512201 2422054205 2012-07-09 06:09:56, >>>> 34512201 1875744030 2012-07-09 19:40:39, >>>> 34512201 5639158173 2012-07-09 06:58:47, >>>> 34512201 5656232360 2012-07-09 07:28:47, >>>> 34512201 959188449 2012-07-09 08:48:47, >>>> 34512201 4645350592 2012-07-09 06:38:47, >>>> 34512201 5657320532 2012-07-09 05:38:47, >>>> 34512201 290419656539 2012-07-09 04:55:47, >>>> >>>> If you see the above data in the table, there are only two UNIQUE >>>> `BUYER_ID` and corresponding to those I have `ITEM_ID` AND `CREATED_TIME`. >>>> I need only 10 latest record basis on the time for that given day for each >>>> `BUYER_ID`. >>>> >>>> So for this `BUYER_ID` - `34512201` I need 10 latest record basis on >>>> `CREATED_TIME` for that given day only, it means for today's date I need 10 >>>> latest record for each `BUYER_ID`. >>>> >>>> And each `BUYER_ID` can have any day's data. But I am specifically >>>> interested for day before today's data(means yesterday's date always) by >>>> checking at the `CREATED_TIME` >>>> >>>> **Find `TOP 10` data for each `BUYER_ID`. Below is the sample output.** >>>> >>>> Sample Output. >>>> >>>> BUYER_ID | ITEM_ID | CREATED_TIME >>>> ------------+------------------+----------------------- >>>> 34512201 2412180494 2012-07-09 22:21:11 >>>> 34512201 1875744030 2012-07-09 19:40:39 >>>> 34512201 597245693 2012-07-09 16:20:21 >>>> 34512201 8071787728 2012-07-09 15:19:59 >>>> 34512201 959188449 2012-07-09 08:48:47 >>>> 34512201 5868222883 2012-07-09 08:23:17 >>>> 34512201 5656232360 2012-07-09 07:28:47 >>>> 34512201 5639158173 2012-07-09 06:58:47 >>>> 34512201 4645350592 2012-07-09 06:38:47 >>>> 34512201 2422054205 2012-07-09 06:09:56 >>>> 1015826235 210002448035 2012-07-09 22:21:11 >>>> 1015826235 260003553382 2012-07-09 19:40:39 >>>> 1015826235 220003038067 2012-07-09 19:40:21 >>>> 1015826235 300003861266 2012-07-09 18:19:59 >>>> 1015826235 140002997245 2012-07-09 09:23:17 >>>> 1015826235 260003553385 2012-07-09 08:48:47 >>>> 1015826235 260003553384 2012-07-09 07:28:47 >>>> 1015826235 260003553381 2012-07-09 07:09:56 >>>> 1015826235 260003553383 2012-07-09 06:58:47 >>>> 1015826235 260003553389 2012-07-09 06:54:37 >>>> >>>> >>>> *Raihan Jamal* >>>> >>>> >>>> >>>> On Mon, Jul 9, 2012 at 7:56 PM, Andes <[email protected]> wrote: >>>> >>>>> ** >>>>> hello, you can use "desc" and "limit 10" to filter the top 10. >>>>> >>>>> 2012-07-10 >>>>> ------------------------------ >>>>> ** >>>>> Best Regards >>>>> Andes >>>>> >>>>> ** >>>>> ------------------------------ >>>>> *发件人:*Raihan Jamal >>>>> *发送时间:*2012-07-10 10:31 >>>>> *主题:*Find TOP 10 using HiveQL >>>>> *收件人:*"user"<[email protected]> >>>>> *抄送:* >>>>> >>>>> When I run this query, >>>>> >>>>> SELECT TOP 10 FROM TestingTable1 WHERE ORDER BY buyer_id, created_time >>>>> DESC; >>>>> >>>>> >>>>> I always get error as- >>>>> >>>>> *FAILED: Parse Error: line 1:7 cannot recognize input 'TOP' in >>>>> select expression* >>>>> >>>>> Is there any way around to use TOP 10 or something similar that will >>>>> work in HiveQL? >>>>> >>>>> >>>>> >>>>> *Raihan Jamal* >>>>> >>>>> >>>> >>> >>> >>> >>> >>> >> > > > -- > Nitin Pawar > >
