On Fri, Jun 15, 2012 at 3:34 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
> We just use the Java Map class, with the restriction that the key must be a
> String. There are some helper methods in trunk to work with maps, and you
> can use # to dereference, i.e. map#'key'
>

Thanks! If you don't mind, could you please share: once you flatten them, do
you then load them into the star schema in the database? I think I need to
look at maps.

> 2012/6/15 Mohit Anchlia <mohitanch...@gmail.com>
>
> > On Fri, Jun 15, 2012 at 9:12 AM, Alan Gates <ga...@hortonworks.com> wrote:
> >
> > > This seems reasonable, except it seems like it would make more sense to
> > > convert query parameters to maps. By definition a query parameter is
> > > key=value. And a map is easier to work with in general than a bag, since
> > > there's no need to flatten them.
> >
> > I've never used them. Is this Map format in Hadoop?
> >
> > > Alan.
> > >
> > > On Jun 11, 2012, at 10:55 AM, Mohit Anchlia wrote:
> > >
> > > > I am looking at how to parse URLs with query parameters to process
> > > > clickstream data. Are there any examples I can look at? The steps that
> > > > I envision are:
> > > >
> > > > 1) Read lines and convert query parameters into bags, where each bag is
> > > > a group of fields for a particular dimension table. So if Geo is one of
> > > > the dimensions, group all the geo-related information from that URL as
> > > > a bag. In the end it would look like {{92122,CA},{Unix,FireFox}}. In
> > > > this example the first bag is the GEO dimension and the second is the
> > > > Browser dimension.
> > > > 2) Load these into an OLAP staging database
> > > > 3) Populate the star schema from the staging tables
> > > >
> > > > I am sure other people might already be doing this, so I thought I'd
> > > > check whether this makes sense.
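
To make the map-versus-bag suggestion in the thread concrete, here is a rough sketch in plain Python (not Pig) of parsing query parameters into a string-keyed map and then grouping the keys by dimension. The example URL, the parameter names (zip, state, os, browser), the DIMENSIONS table and both function names are assumptions made up for illustration; none of them come from the thread or from Pig itself.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical lookup from query-parameter name to dimension table.
# These names are assumptions for illustration only.
DIMENSIONS = {
    "zip": "geo",
    "state": "geo",
    "os": "browser",
    "browser": "browser",
}


def query_params_to_map(url):
    """Return the URL's query parameters as a flat map with string keys.

    If a key repeats, the last value wins.
    """
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def group_by_dimension(params):
    """Split the flat map into one small map per dimension table.

    Parameters with no entry in DIMENSIONS are dropped.
    """
    grouped = {}
    for key, value in params.items():
        dimension = DIMENSIONS.get(key)
        if dimension is not None:
            grouped.setdefault(dimension, {})[key] = value
    return grouped


# Made-up clickstream URL using the values from the example in the thread.
url = "http://example.com/click?zip=92122&state=CA&os=Unix&browser=FireFox"
params = query_params_to_map(url)
by_dimension = group_by_dimension(params)

print(params["zip"])        # key lookup, similar in spirit to map#'zip' in Pig
print(by_dimension["geo"])  # the fields destined for the Geo dimension
```

Because each value is looked up by key, nothing has to be flattened before the fields for one dimension are pulled out, which is the advantage of maps over bags mentioned above.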