We just use the Java Map class, with the restriction that the key must be a String. There are some helper methods in trunk to work with maps, and you can use # to dereference, i.e. map#'key'.
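
For the query-parameter case discussed in this thread, here is a minimal sketch of turning a query string into a String-keyed Java Map. The class name QueryParams, the method parse, and the sample parameter names are made up for illustration; they are not part of Pig, and wrapping this in a UDF is left out. It assumes UTF-8 percent-encoding, and a malformed escape will make URLDecoder throw IllegalArgumentException.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Pig API: parses "k1=v1&k2=v2" into a map.
class QueryParams {

    // Returns the parameters in the order they appear. Keys are always
    // Strings, which matches the restriction on map keys mentioned above.
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            // A parameter without '=' is kept with an empty value.
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // If a key repeats, the last value wins.
            params.put(decode(key), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> m = parse("zip=92122&state=CA&os=Unix&browser=FireFox");
        System.out.println(m.get("zip") + " " + m.get("browser"));
    }
}
```

If a map like this came back from a UDF, the script side would pick out individual values with the # operator, for example params#'zip', so there is nothing to flatten.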
2012/6/15 Mohit Anchlia <mohitanch...@gmail.com>

> On Fri, Jun 15, 2012 at 9:12 AM, Alan Gates <ga...@hortonworks.com> wrote:
>
> > This seems reasonable, except it seems like it would make more sense to
> > convert query parameters to maps. By definition a query parameter is
> > key=value. And a map is easier to work with in general than a bag, since
> > there's no need to flatten them.
>
> I've never used them. Is this Map format in hadoop?
>
> > Alan.
> >
> > On Jun 11, 2012, at 10:55 AM, Mohit Anchlia wrote:
> >
> > > I am looking at how to parse URLs with query parameters to process
> > > clickstream data. Are there any examples I can look at? The steps that
> > > I envision are:
> > >
> > > 1) Read lines and convert query parameters into bags, where each bag is
> > > a group of fields for a particular dimension table. So if Geo is one of
> > > the dimensions, group all the geo-related information from that URL as
> > > a bag. In the end it would look like {{92122,CA},{Unix,FireFox}}. In
> > > this example the first bag is the GEO dimension and the second is the
> > > Browser dimension.
> > > 2) Load these into the OLAP staging database
> > > 3) Populate the star schema from the staging tables
> > >
> > > I am sure other people might already be doing this, so I thought I'd
> > > check whether this makes sense.