Makes sense but is not the distribution across nodes for a chunk of records in 
that order. 

If Hive cannot help me do this, is there another way I can do this? I tried 
generating an identifier using the perl script invoked using Hive but it does 
not seem to work fine. While the stand alone script works fine, when the record 
is created in hive using std output from perl - I see 2 records for some of the 
unique identifiers. I explored the possibility of default data type changes but 
that does not solve the problem. 

Regards,
Karan


-----Original Message-----
From: Philip Tromans [mailto:philip.j.trom...@gmail.com] 
Sent: 10 April 2012 19:48
To: user@hive.apache.org
Subject: Re: Lag function in Hive

Hi Karan,

To the best of my knowledge, there isn't one. It's also unlikely to
happen because it's hard to parallelise in a map-reduce way (it
requires knowing where you are in a result set, and who your
neighbours are and they in turn need to be present on the same node as
you which is difficult to guarantee).

Cheers,

Phil.

On 10 April 2012 14:44,  <karanveer.si...@barclays.com> wrote:
> Hi,
>
> Is there something like a 'lag' function in HIVE? The requirement is to
> calculate difference for the same column for every 2 subsequent records.
>
> For example.
>
> Row, Column A, Column B
> 1, 10, 100
> 2, 20, 200
> 3, 30, 300
>
>
> The result that I need should be like:
>
> Row, Column A, Column B, Result
> 1, 10, 100, NULL
> 2, 20, 200, 100 (200-100)
> 3, 30, 300, 100 (300-200)
>
> Rgds,
> Karan
>
>
>
>
>
> This e-mail and any attachments are confidential and intended solely for the
> addressee and may also be privileged or exempt from disclosure under
> applicable law. If you are not the addressee, or have received this e-mail
> in error, please notify the sender immediately, delete it from your system
> and do not copy, disclose or otherwise act upon any part of this e-mail or
> its attachments.
>
> Internet communications are not guaranteed to be secure or virus-free.
> The Barclays Group does not accept responsibility for any loss arising from
> unauthorised access to, or interference with, any Internet communications by
> any third party, or from the transmission of any viruses. Replies to this
> e-mail may be monitored by the Barclays Group for operational or business
> reasons.
>
> Any opinion or other information in this e-mail or its attachments that does
> not relate to the business of the Barclays Group is personal to the sender
> and is not given or endorsed by the Barclays Group.
>
> Barclays Bank PLC.Registered in England and Wales (registered no. 1026167).
> Registered Office: 1 Churchill Place, London, E14 5HP, United Kingdom.
>
> Barclays Bank PLC is authorised and regulated by the Financial Services
> Authority.

Reply via email to