Yeah. I don't think my SQL would even be accepted because Hive QL doesn't allow the alternate join syntax in the WHERE clause. Thanks Phil.
On Apr 10, 2012, at 11:02 AM, Philip Tromans wrote: > I think you want something more like: > > SELECT t2.value - t1.value > FROM mytable t1 > JOIN mytable t2 ON (t1.rownum = t2.rownum + 1 AND t2.partition=bar) > WHERE t1.partition=foo; > > This should be faster as partition selection will happen earlier. > > This is still going to involve an awful lot of I/O, and not going to be fast. > > Phil. > > > On 10 April 2012 15:56, David Kulp <dk...@fiksu.com> wrote: >> You have to explicitly request it in CREATE TABLE. And you should generally >> let hive perform the clustering -- i.e. don't use an external table with >> data that is generated by some other process because it's hard to get the >> hash and notation right. >> Check your table with "DESCRIBE FORMATTED tablename". >> >> On Apr 10, 2012, at 10:51 AM, <karanveer.si...@barclays.com> >> <karanveer.si...@barclays.com> wrote: >> >>> Thanks - I will check this out. >>> >>> Meanwhile, would default clustering happen using rownum? How can I check on >>> how is clustering happening in our environment? >>> >>> Rgds >>> >>> ----- Original Message ----- >>> From: David Kulp <dk...@fiksu.com> >>> To: user@hive.apache.org <user@hive.apache.org> >>> Sent: Tue Apr 10 15:45:25 2012 >>> Subject: Re: Lag function in Hive >>> >>> New here. Hello all. >>> >>> Could you try a self-join, possibly also restricted to partitions? >>> >>> E.g. SELECT t2.value - t1.value FROM mytable t1, mytable t2 WHERE t1.rownum >>> = t2.rownum+1 AND t1.partition=foo AND t2.partition=bar >>> >>> If your data is clustered by rownum, then this join should, in theory, be >>> relatively fast -- especially if it makes sense to exploit partitions. >>> >>> -d >>> >>> On Apr 10, 2012, at 10:37 AM, <karanveer.si...@barclays.com> >>> <karanveer.si...@barclays.com> wrote: >>> >>>> Makes sense but is not the distribution across nodes for a chunk of >>>> records in that order. >>>> >>>> If Hive cannot help me do this, is there another way I can do this? I >>>> tried generating an identifier using the perl script invoked using Hive >>>> but it does not seem to work fine. While the stand alone script works >>>> fine, when the record is created in hive using std output from perl - I >>>> see 2 records for some of the unique identifiers. I explored the >>>> possibility of default data type changes but that does not solve the >>>> problem. >>>> >>>> Regards, >>>> Karan >>>> >>>> >>>> -----Original Message----- >>>> From: Philip Tromans [mailto:philip.j.trom...@gmail.com] >>>> Sent: 10 April 2012 19:48 >>>> To: user@hive.apache.org >>>> Subject: Re: Lag function in Hive >>>> >>>> Hi Karan, >>>> >>>> To the best of my knowledge, there isn't one. It's also unlikely to >>>> happen because it's hard to parallelise in a map-reduce way (it >>>> requires knowing where you are in a result set, and who your >>>> neighbours are and they in turn need to be present on the same node as >>>> you which is difficult to guarantee). >>>> >>>> Cheers, >>>> >>>> Phil. >>>> >>>> On 10 April 2012 14:44, <karanveer.si...@barclays.com> wrote: >>>>> Hi, >>>>> >>>>> Is there something like a 'lag' function in HIVE? The requirement is to >>>>> calculate difference for the same column for every 2 subsequent records. >>>>> >>>>> For example. >>>>> >>>>> Row, Column A, Column B >>>>> 1, 10, 100 >>>>> 2, 20, 200 >>>>> 3, 30, 300 >>>>> >>>>> >>>>> The result that I need should be like: >>>>> >>>>> Row, Column A, Column B, Result >>>>> 1, 10, 100, NULL >>>>> 2, 20, 200, 100 (200-100) >>>>> 3, 30, 300, 100 (300-200) >>>>> >>>>> Rgds, >>>>> Karan >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> This e-mail and any attachments are confidential and intended solely for >>>>> the >>>>> addressee and may also be privileged or exempt from disclosure under >>>>> applicable law. If you are not the addressee, or have received this e-mail >>>>> in error, please notify the sender immediately, delete it from your system >>>>> and do not copy, disclose or otherwise act upon any part of this e-mail or >>>>> its attachments. >>>>> >>>>> Internet communications are not guaranteed to be secure or virus-free. >>>>> The Barclays Group does not accept responsibility for any loss arising >>>>> from >>>>> unauthorised access to, or interference with, any Internet communications >>>>> by >>>>> any third party, or from the transmission of any viruses. Replies to this >>>>> e-mail may be monitored by the Barclays Group for operational or business >>>>> reasons. >>>>> >>>>> Any opinion or other information in this e-mail or its attachments that >>>>> does >>>>> not relate to the business of the Barclays Group is personal to the sender >>>>> and is not given or endorsed by the Barclays Group. >>>>> >>>>> Barclays Bank PLC.Registered in England and Wales (registered no. >>>>> 1026167). >>>>> Registered Office: 1 Churchill Place, London, E14 5HP, United Kingdom. >>>>> >>>>> Barclays Bank PLC is authorised and regulated by the Financial Services >>>>> Authority. >>> >>> >>> This e-mail and any attachments are confidential and intended >>> solely for the addressee and may also be privileged or exempt from >>> disclosure under applicable law. If you are not the addressee, or >>> have received this e-mail in error, please notify the sender >>> immediately, delete it from your system and do not copy, disclose >>> or otherwise act upon any part of this e-mail or its attachments. >>> >>> Internet communications are not guaranteed to be secure or >>> virus-free. >>> The Barclays Group does not accept responsibility for any loss >>> arising from unauthorised access to, or interference with, any >>> Internet communications by any third party, or from the >>> transmission of any viruses. Replies to this e-mail may be >>> monitored by the Barclays Group for operational or business >>> reasons. >>> >>> Any opinion or other information in this e-mail or its attachments >>> that does not relate to the business of the Barclays Group is >>> personal to the sender and is not given or endorsed by the Barclays >>> Group. >>> >>> Barclays Bank PLC. Registered in England and Wales (registered no. >>> 1026167). >>> Registered Office: 1 Churchill Place, London, E14 5HP, United >>> Kingdom. >>> >>> Barclays Bank PLC is authorised and regulated by the Financial >>> Services Authority. >>