Hello.

Doesn't FLATTEN do exactly this?

Best regards, Vitalii Tymchyshyn

2012/8/30 Steve Bernstein <steve.bernst...@deem.com>

> Some clarification on the below.  Ignore the outer bag, I'd removed some
> data elements for clarity and simplicity.  Basically, I'm trying to find a
> way to go from:
>
> {(pg),(pg),...,(pg)}
> to
> {(pg,pg,...,pg)}
>
> For an arbitrary number of "pg" tuples.
>
> SB
>
> -----Original Message-----
> From: Steve Bernstein [mailto:steve.bernst...@deem.com]
> Sent: Wednesday, August 29, 2012 4:28 PM
> To: user@pig.apache.org
> Subject: group by clickstream
>
> Hi all,
> I have a bag, clickstreams: {clickStream: {pageName: chararray}}, in
> which each row represents a sequence of pages and events in a single
> session on a website.  The interior bag, clickStream, represents this as a
> sequence of one or more single-element tuples, e.g.,
>
> {(homepage),(pg1),(pg2),...,(pgN)}
>
> I'd like to group by the sequences so I can get counts and ultimately sort
> to find the most common clickstreams.  A bag can't be a key for grouping,
> I've discovered, but it seems like it ought to be easy to flatten the
> clickstream bag into some other form such that the sequences can be used as
> keys for grouping.  But I can't figure it out.
>
> Any ideas?
>
> Thanks!
> Steve
>
>
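
For reference, here is a minimal sketch of the transformation described above, written in Python rather than Pig because it only illustrates the logic. It collapses a bag of single-field tuples into one flat tuple, which can then serve as a grouping key. The names bag_to_key, most_common_clickstreams and the sample sessions are hypothetical. The sketch assumes the order of tuples in each bag reflects the page sequence, which Pig does not guarantee for bags in general. Note that FLATTEN applied to a bag un-nests it into one row per tuple, so it may not give the single-tuple form by itself.

```python
from collections import Counter


def bag_to_key(bag):
    """Collapse a bag of single-field tuples, e.g. [(pg,), (pg,)],
    into one flat tuple (pg, pg), which is hashable and usable as a key."""
    return tuple(page for (page,) in bag)


def most_common_clickstreams(sessions):
    """Count identical page sequences and return them, most frequent first."""
    counts = Counter(bag_to_key(bag) for bag in sessions)
    return counts.most_common()


# Hypothetical sample data: one inner bag per session.
sessions = [
    [("homepage",), ("pg1",), ("pg2",)],
    [("homepage",), ("pg1",)],
    [("homepage",), ("pg1",), ("pg2",)],
]

ranked = most_common_clickstreams(sessions)
for path, n in ranked:
    print(n, " > ".join(path))
```

In Pig itself the same step would need something that turns the inner bag into a tuple or a delimited string before GROUP. Later Pig releases document builtins named BagToTuple and BagToString for this, but whether they are available depends on your version, so treat that as an assumption to check.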


-- 
Best regards,
 Vitalii Tymchyshyn
