Nope, I tried that. FLATTEN breaks the bag back out into one tuple per record, which is not what I want.
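
One possible workaround is to collapse each session's bag into a single delimited string and group on that string, since a bag cannot be used as a grouping key. Below is a minimal sketch of that logic in plain Python, not Pig Latin. The function names, the "|" separator, and the sample sessions are illustrative assumptions; nothing here is a Pig built-in. It also assumes the tuples arrive in click order, which Pig does not guarantee for bags unless they are explicitly ordered.

```python
from collections import Counter


def clickstream_key(bag, sep="|"):
    """Collapse a bag of one-field tuples, e.g. [("homepage",), ("pg1",)],
    into one delimited string usable as a grouping key.

    Hypothetical helper: assumes `sep` never occurs inside a page name.
    """
    # Iteration order is kept, so the key reflects the page sequence.
    return sep.join(str(t[0]) for t in bag)


def count_clickstreams(sessions, sep="|"):
    """Count identical page sequences, most common first."""
    counts = Counter(clickstream_key(bag, sep) for bag in sessions)
    return counts.most_common()


# Made-up sample data: one inner list per session.
sessions = [
    [("homepage",), ("pg1",), ("pg2",)],
    [("homepage",), ("pg1",)],
    [("homepage",), ("pg1",), ("pg2",)],
]
result = count_clickstreams(sessions)
```

The same idea could be carried into Pig by wrapping something like clickstream_key in a UDF that returns a chararray, then grouping and counting on that field. That wiring is not shown here and would need to be checked against the Pig version in use.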

-----Original Message-----
From: Віталій Тимчишин [mailto:tiv...@gmail.com]
Sent: Friday, August 31, 2012 1:49 PM
To: user@pig.apache.org
Subject: Re: group by clickstream

Hello.

Does not FLATTEN do exactly this?

Best regards, Vitalii Tymchyshyn

2012/8/30 Steve Bernstein <steve.bernst...@deem.com>

> Some clarification on the below. Ignore the outer bag, I'd removed
> some data elements for clarity and simplicity. Basically, I'm trying
> to find a way to go from:
>
> {(pg),(pg),...,(pg)}
> to
> {(pg,pg,...,pg)}
>
> For an arbitrary number of "pg" tuples.
>
> SB
>
> -----Original Message-----
> From: Steve Bernstein [mailto:steve.bernst...@deem.com]
> Sent: Wednesday, August 29, 2012 4:28 PM
> To: user@pig.apache.org
> Subject: group by clickstream
>
> Hi all,
> I have a bag, clickstreams: {clickStream: {pageName: chararray}}, for
> which each row represents a sequence of pages and events in a single
> session on a website. The interior bag, clickstream, represents this
> as a sequence of one or more single element tuples, e.g.,
>
> {(homepage),(pg1),(pg2),...,(pgN)}
>
> I'd like to group by the sequences so I can get counts and ultimately
> sort to find the most common clickstreams. A bag can't be a key for
> grouping, I've discovered, but it seems like it ought to be easy to
> flatten the clickstream bag into some other form such that the
> sequences can be used as keys for grouping. But I can't figure it out.
>
> Any ideas?
>
> Thanks!
> Steve
>

--
Best regards,
Vitalii Tymchyshyn