Hello. Doesn't FLATTEN do exactly this?
Best regards, Vitalii Tymchyshyn

2012/8/30 Steve Bernstein <steve.bernst...@deem.com>

> Some clarification on the below. Ignore the outer bag, I'd removed some
> data elements for clarity and simplicity. Basically, I'm trying to find a
> way to go from:
>
> {(pg),(pg),...,(pg)}
> to
> {(pg,pg,...,pg)}
>
> For an arbitrary number of "pg" tuples.
>
> SB
>
> -----Original Message-----
> From: Steve Bernstein [mailto:steve.bernst...@deem.com]
> Sent: Wednesday, August 29, 2012 4:28 PM
> To: user@pig.apache.org
> Subject: group by clickstream
>
> Hi all,
> I have a bag, clickstreams: {clickStream: {pageName: chararray}}, for
> which each row represents a sequence of pages and events in a single
> session on a website. The interior bag, clickstream, represents this as a
> sequence of one or more single-element tuples, e.g.,
>
> {(homepage),(pg1),(pg2),...,(pgN)}
>
> I'd like to group by the sequences so I can get counts and ultimately sort
> to find the most common clickstreams. A bag can't be a key for grouping,
> I've discovered, but it seems like it ought to be easy to flatten the
> clickstream bag into some other form such that the sequences can be used as
> keys for grouping. But I can't figure it out.
>
> Any ideas?
>
> Thanks!
> Steve
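
For readers following the thread, the transformation being asked about (collapsing an ordered bag of single-element tuples into one flat tuple, then using that tuple as a grouping key) can be illustrated outside Pig. What follows is only a minimal Python sketch of the idea, not Pig code and not anything posted in the thread. The sample sessions and the names `sequence_key` and `most_common_clickstreams` are made up for illustration.

```python
from collections import Counter


def sequence_key(click_stream):
    """Collapse a bag-like list of 1-tuples, e.g. [("homepage",), ("pg1",)],
    into one flat tuple ("homepage", "pg1") that can serve as a grouping key."""
    return tuple(page for (page,) in click_stream)


def most_common_clickstreams(sessions):
    """Count identical page sequences and return (sequence, count) pairs,
    most frequent first."""
    counts = Counter(sequence_key(s) for s in sessions)
    return counts.most_common()


# Hypothetical sample data: one inner list per session.
sessions = [
    [("homepage",), ("pg1",), ("pg2",)],
    [("homepage",), ("pg1",)],
    [("homepage",), ("pg1",), ("pg2",)],
]

for seq, n in most_common_clickstreams(sessions):
    print(n, seq)
```

The sketch works because a tuple, unlike a list, is hashable and preserves order, so two sessions map to the same key only when they visit the same pages in the same sequence. Whatever form is used on the Pig side would need the same two properties.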