Praveen, My apologies--I meant to suggest a streaming function because a UDF would not be able to hold state either. Look at the documentation for TRANSFORM ( http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform ). Your transformation script can be used to compare timestamps from successive lines of your data. Tim
On Sat, Jun 11, 2011 at 12:32 PM, Praveen <[email protected]> wrote: > Do you mean that my UDF would store the timestamp of the current row in a > static field in the UDF's implementation, and when processing the next row, > use that field to get the previous row's value ? > > Can anyone comment on whether that's safe, re: I'm not familiar with Hive > internals ? > > Thanks, > > pk > > Sent from my iPhone > > On Jun 10, 2011, at 11:18 PM, Tim Spence <[email protected]> > wrote: > > Praveen, > This would be best accomplished with a UDF because Hive does not support > cursors. > Best of luck, > Tim > > > > > On Fri, Jun 10, 2011 at 10:29 PM, Praveen Kumar < <[email protected]> > [email protected]> wrote: > >> If I have table timestamps: >> >> hive> desc timestamps; >> >> OK >> ts bigint >> >> >> hive> select ts from timestamps order by ts >> OK >> >> 1 >> 2 >> 3 >> 4 >> 5 >> 6 >> 7 >> 8 >> 9 >> 10 >> 30 >> 32 >> 34 >> 36 >> 38 >> 40 >> 42 >> 44 >> 46 >> 48 >> 50 >> 70 >> 74 >> 78 >> 100 >> 105 >> 110 >> 115 >> >> and I want to make groups of the values where splits between groups >> occur where two time-consecutive entries have a difference greater >> than 10. >> >> Eg, above, the splits would be such that the numbers would be grouped >> into these ranges: >> >> 0-10 >> 30-50 >> 70-78 >> 100-115 >> >> because (30 - 10), (70 - 50), and (100 - 78) are each greater than 10. >> >> I'd like the query to result in the following: >> >> hive> select ... >> >> 0 7 >> 0 9 >> 0 6 >> 0 3 >> 0 10 >> 0 1 >> 0 4 >> 0 5 >> 0 8 >> 0 2 >> 30 34 >> 30 44 >> 30 40 >> 30 38 >> 30 36 >> 30 32 >> 30 46 >> 30 42 >> 30 48 >> 30 50 >> 30 30 >> 70 74 >> 70 70 >> 70 78 >> 100 100 >> 100 105 >> 100 110 >> 100 115 >> >> What is the most efficient hive query that will do this ? Thanks, >> >> pk >> > >
