Re: Time Difference between the Nth record and Nth-1 Record

Gianmarco De Francisci Morales Wed, 08 Oct 2014 23:14:38 -0700

I guess one way to do this is to use RANK twice: once on the original
relation, and once on the original relation minus the first point. Then
join on the rank and subtract the earlier timestamp from the later one.


A = load 'data' as (timestamp:long);
B = filter A by timestamp > 20141014120523L; -- remove the first point
C = RANK A by timestamp;
D = RANK B by timestamp;
E = JOIN C by $0, D by $0; -- join on the rank
F = foreach E generate D::timestamp - C::timestamp; -- later minus earlier


Disclaimer: the script is just off the top of my head and is not tested.
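As a rough check of the logic (not of the Pig script itself), here is a plain-Python sketch that mimics the two RANKs and the join in memory on the first four sample timestamps. The helper names `diffs_by_rank_join` and `seconds_between` are made up for illustration, and the in-memory lists stand in for relations that Pig would distribute. It shows two things: the subtraction has to be later minus earlier (D minus C) to give positive gaps, and because the values are yyyyMMddHHmmss rather than epoch seconds, plain subtraction only equals elapsed seconds when both points fall in the same minute. The second helper parses them as dates; the pair it uses is hypothetical, not from the data.

```python
from datetime import datetime


def diffs_by_rank_join(timestamps):
    """Differences between consecutive timestamps via two ranked copies."""
    ordered = sorted(timestamps)
    # C = RANK A BY timestamp: ranks 1..n over all records.
    c = {rank: ts for rank, ts in enumerate(ordered, start=1)}
    # B drops the first record; D = RANK B gives it ranks 1..n-1,
    # so D's rank r holds the record that follows C's rank r.
    d = {rank: ts for rank, ts in enumerate(ordered[1:], start=1)}
    # JOIN C BY rank, D BY rank, then later minus earlier (D - C).
    return [d[r] - c[r] for r in sorted(d)]


def seconds_between(earlier, later):
    """Elapsed seconds between two yyyyMMddHHmmss values."""
    fmt = "%Y%m%d%H%M%S"
    t0 = datetime.strptime(str(earlier), fmt)
    t1 = datetime.strptime(str(later), fmt)
    return int((t1 - t0).total_seconds())


if __name__ == "__main__":
    sample = [20141014120523, 20141014120534, 20141014120537, 20141014120542]
    # The join yields n-1 rows; the leading 0 for the first record is added here.
    print([0] + diffs_by_rank_join(sample))  # [0, 11, 3, 5]
    # Hypothetical pair crossing a minute boundary (12:05:55 -> 12:06:10):
    print(20141014120610 - 20141014120555)  # 55, plain subtraction
    print(seconds_between(20141014120555, 20141014120610))  # 15, real seconds
```

On the Pig side, the built-ins `ToDate` (with a 'yyyyMMddHHmmss' format on a chararray) and `SecondsBetween` look like the natural counterparts of `seconds_between`; check their argument order against the Pig documentation before relying on the sign.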

Cheers,

--
Gianmarco

On 8 October 2014 09:01, Krishna Kalyan wrote:

> Hi Everybody,
>
> Input file: records are sorted by timestamp.
> Expected input file size: 2-3 TB
>
> timestamp
> ==============
> 20141014120523
> 20141014120534
> 20141014120537
> 20141014120542
> 20141014120549
> 20141014120555
> 20141014120565
> 20141014120570
> 20141014120512
> ...
> ...
>
>
> Using Pig, I need to find the time difference between the timestamps of
> the Nth record and the (N-1)th record (20141014120534 - 20141014120523 = 11 secs).
> I need to loop through all the records to get the time difference from the
> previous record.
>
> Example Output
> 0
> 11
> 3
> 5
> ...
>
> Please guide.
>
> Regards,
> Krishna Kalyan
>