Yeah.  I don't think my SQL would even be accepted because Hive QL doesn't 
allow the alternate join syntax in the WHERE clause.  Thanks Phil.

On Apr 10, 2012, at 11:02 AM, Philip Tromans wrote:

> I think you want something more like:
> 
> SELECT t2.value - t1.value
> FROM mytable t1
> JOIN mytable t2 ON (t1.rownum = t2.rownum + 1 AND t2.partition=bar)
> WHERE t1.partition=foo;
> 
> This should be faster as partition selection will happen earlier.
> 
> This is still going to involve an awful lot of I/O, and not going to be fast.
> 
> Phil.
> 
> 
> On 10 April 2012 15:56, David Kulp <dk...@fiksu.com> wrote:
>> You have to explicitly request it in CREATE TABLE.  And you should generally 
>> let hive perform the clustering -- i.e. don't use an external table with 
>> data that is generated by some other process because it's hard to get the 
>> hash and notation right.
>> Check your table with "DESCRIBE FORMATTED tablename".
>> 
>> On Apr 10, 2012, at 10:51 AM, <karanveer.si...@barclays.com> 
>> <karanveer.si...@barclays.com> wrote:
>> 
>>> Thanks - I will check this out.
>>> 
>>> Meanwhile, would default clustering happen using rownum? How can I check on 
>>> how is clustering happening in our environment?
>>> 
>>> Rgds
>>> 
>>> ----- Original Message -----
>>> From: David Kulp <dk...@fiksu.com>
>>> To: user@hive.apache.org <user@hive.apache.org>
>>> Sent: Tue Apr 10 15:45:25 2012
>>> Subject: Re: Lag function in Hive
>>> 
>>> New here.  Hello all.
>>> 
>>> Could you try a self-join, possibly also restricted to partitions?
>>> 
>>> E.g. SELECT t2.value - t1.value FROM mytable t1, mytable t2 WHERE t1.rownum 
>>> = t2.rownum+1 AND t1.partition=foo AND t2.partition=bar
>>> 
>>> If your data is clustered by rownum, then this join should, in theory, be 
>>> relatively fast -- especially if it makes sense to exploit partitions.
>>> 
>>> -d
>>> 
>>> On Apr 10, 2012, at 10:37 AM, <karanveer.si...@barclays.com> 
>>> <karanveer.si...@barclays.com> wrote:
>>> 
>>>> Makes sense but is not the distribution across nodes for a chunk of 
>>>> records in that order.
>>>> 
>>>> If Hive cannot help me do this, is there another way I can do this? I 
>>>> tried generating an identifier using the perl script invoked using Hive 
>>>> but it does not seem to work fine. While the stand alone script works 
>>>> fine, when the record is created in hive using std output from perl - I 
>>>> see 2 records for some of the unique identifiers. I explored the 
>>>> possibility of default data type changes but that does not solve the 
>>>> problem.
>>>> 
>>>> Regards,
>>>> Karan
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Philip Tromans [mailto:philip.j.trom...@gmail.com]
>>>> Sent: 10 April 2012 19:48
>>>> To: user@hive.apache.org
>>>> Subject: Re: Lag function in Hive
>>>> 
>>>> Hi Karan,
>>>> 
>>>> To the best of my knowledge, there isn't one. It's also unlikely to
>>>> happen because it's hard to parallelise in a map-reduce way (it
>>>> requires knowing where you are in a result set, and who your
>>>> neighbours are and they in turn need to be present on the same node as
>>>> you which is difficult to guarantee).
>>>> 
>>>> Cheers,
>>>> 
>>>> Phil.
>>>> 
>>>> On 10 April 2012 14:44,  <karanveer.si...@barclays.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> Is there something like a 'lag' function in HIVE? The requirement is to
>>>>> calculate difference for the same column for every 2 subsequent records.
>>>>> 
>>>>> For example.
>>>>> 
>>>>> Row, Column A, Column B
>>>>> 1, 10, 100
>>>>> 2, 20, 200
>>>>> 3, 30, 300
>>>>> 
>>>>> 
>>>>> The result that I need should be like:
>>>>> 
>>>>> Row, Column A, Column B, Result
>>>>> 1, 10, 100, NULL
>>>>> 2, 20, 200, 100 (200-100)
>>>>> 3, 30, 300, 100 (300-200)
>>>>> 
>>>>> Rgds,
>>>>> Karan
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> This e-mail and any attachments are confidential and intended solely for 
>>>>> the
>>>>> addressee and may also be privileged or exempt from disclosure under
>>>>> applicable law. If you are not the addressee, or have received this e-mail
>>>>> in error, please notify the sender immediately, delete it from your system
>>>>> and do not copy, disclose or otherwise act upon any part of this e-mail or
>>>>> its attachments.
>>>>> 
>>>>> Internet communications are not guaranteed to be secure or virus-free.
>>>>> The Barclays Group does not accept responsibility for any loss arising 
>>>>> from
>>>>> unauthorised access to, or interference with, any Internet communications 
>>>>> by
>>>>> any third party, or from the transmission of any viruses. Replies to this
>>>>> e-mail may be monitored by the Barclays Group for operational or business
>>>>> reasons.
>>>>> 
>>>>> Any opinion or other information in this e-mail or its attachments that 
>>>>> does
>>>>> not relate to the business of the Barclays Group is personal to the sender
>>>>> and is not given or endorsed by the Barclays Group.
>>>>> 
>>>>> Barclays Bank PLC.Registered in England and Wales (registered no. 
>>>>> 1026167).
>>>>> Registered Office: 1 Churchill Place, London, E14 5HP, United Kingdom.
>>>>> 
>>>>> Barclays Bank PLC is authorised and regulated by the Financial Services
>>>>> Authority.
>>> 
>>> 
>>> This e-mail and any attachments are confidential and intended
>>> solely for the addressee and may also be privileged or exempt from
>>> disclosure under applicable law. If you are not the addressee, or
>>> have received this e-mail in error, please notify the sender
>>> immediately, delete it from your system and do not copy, disclose
>>> or otherwise act upon any part of this e-mail or its attachments.
>>> 
>>> Internet communications are not guaranteed to be secure or
>>> virus-free.
>>> The Barclays Group does not accept responsibility for any loss
>>> arising from unauthorised access to, or interference with, any
>>> Internet communications by any third party, or from the
>>> transmission of any viruses. Replies to this e-mail may be
>>> monitored by the Barclays Group for operational or business
>>> reasons.
>>> 
>>> Any opinion or other information in this e-mail or its attachments
>>> that does not relate to the business of the Barclays Group is
>>> personal to the sender and is not given or endorsed by the Barclays
>>> Group.
>>> 
>>> Barclays Bank PLC. Registered in England and Wales (registered no.
>>> 1026167).
>>> Registered Office: 1 Churchill Place, London, E14 5HP, United
>>> Kingdom.
>>> 
>>> Barclays Bank PLC is authorised and regulated by the Financial
>>> Services Authority.
>> 

Reply via email to