Hi,


I have been developing a solution for SQL windowing functions that works with Hive. Here is a simple example:



from <select county, tract, arealand from geo_header_sf1 where sumlev = 140>
partition by county
order by county, arealand desc
with
  rank() as r,
  sum(arealand) over rows between unbounded preceding and current row as cum_area
select county, tract, arealand, r, cum_area where <r <= 3>
into path='/tmp/wout'



We have loaded the Census data into our Hive instance. The query computes the top 3 tracts (by land area) in each county, along with a running total of land area within the county.
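
For readers who want to see what the query computes, here is a rough Python sketch of the same logic on in-memory rows. It is only an illustration and not how the windowing engine is implemented. The function name, the tuple layout and the sample rows are made up for this example, and rank() is assumed to follow the usual SQL behaviour where tied values share a rank.

```python
from itertools import groupby
from operator import itemgetter


def top_tracts_by_county(rows, top_n=3):
    """Illustrative only: rows are (county, tract, arealand) tuples."""
    # order by county, arealand desc
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    result = []
    # partition by county
    for county, group in groupby(ordered, key=itemgetter(0)):
        cum_area = 0      # running sum over "unbounded preceding .. current row"
        prev_area = None
        rank = 0
        for position, (_, tract, arealand) in enumerate(group, start=1):
            # Assumed rank() semantics: ties share a rank, the next rank skips
            if arealand != prev_area:
                rank = position
                prev_area = arealand
            cum_area += arealand
            # where <r <= 3>
            if rank <= top_n:
                result.append((county, tract, arealand, rank, cum_area))
    return result


# Hypothetical sample rows, not real Census data
sample_rows = [
    ("001", "A", 50),
    ("001", "B", 200),
    ("001", "C", 100),
    ("001", "D", 10),
    ("003", "E", 30),
    ("003", "F", 70),
]

for row in top_tracts_by_county(sample_rows):
    print(row)
```

Each output tuple carries county, tract, arealand, r and cum_area, matching the select list of the query above.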



Windowing queries can be expressed on any Hive query or table, or on an HDFS file. Windowing processing works in either MR mode or Hive mode: MR mode generates an MR job to evaluate the windowing functions, while in Hive mode the windowing process is invoked from the Hive Script Operator.

There is support for 16 functions, divided into ranking, aggregation and navigation functions, and for both value and range boundaries for windows. It is easy to extend through Groovy integration.

The code is available at https://github.com/hbutani/SQLWindowing. A writeup is available at
https://github.com/hbutani/SQLWindowing/blob/master/docs/Windowing.pdf?raw=true



Please send me an email if you are interested in using this.



regards,

Harish Butani
