Jamal,

You're going to want to use FLATTEN followed by another GROUP BY. Consider:
-- one (id, token) row per word occurrence
flattened = foreach processed generate id, flatten(tokens) as token;
-- count how many times each (id, token) pair occurs
frequency = foreach (group flattened by (id, token)) generate flatten(group) as (id, token), COUNT(flattened) as freq;

Of course, this will spawn another map-reduce job. However, since COUNT is algebraic, Pig will use combiners to compute partial counts on the map side, which drastically reduces the amount of data sent to the reducers.

--jacob
@thedatachef

On Nov 19, 2013, at 5:45 PM, jamal sasha <jamalsha...@gmail.com> wrote:

> Hi,
>
> I have data already processed in following form:
>
> ( id ,{ bag of words})
> So for example:
>
> (foobar, {(foo), (foo),(foobar),(bar)})
> (foo,{(bar),(bar)})
>
> and so on..
> describe processed gives me:
> processed: {id: chararray,tokens: {tuple_of_tokens: (token: chararray)}}
>
> Now what I want is.. also count the number of times a word appears in this
> data and output it as
> foobar, foo, 2
> foobar,foobar,1
> foobar,bar,1
> foo,bar,2
>
> and so on...
>
> How do I do this in pig?
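Putting the FLATTEN and GROUP steps together, here is a minimal end-to-end sketch that also writes the counts as comma-separated lines, as asked. Several parts are assumptions and not part of the original answer. The input path 'processed_input' and the output path 'token_counts' are hypothetical. The sketch also assumes `processed` comes from tab-delimited text loaded with PigStorage. In a real script `processed` already exists, so only the last three statements matter.

```pig
-- ASSUMPTION: hypothetical input path; each line is assumed to look like
--   foobar<TAB>{(foo),(foo),(foobar),(bar)}
processed = LOAD 'processed_input' USING PigStorage('\t')
    AS (id:chararray, tokens:bag{tuple_of_tokens:tuple(token:chararray)});

-- one row per (id, token) occurrence
flattened = FOREACH processed GENERATE id, FLATTEN(tokens) AS token;

-- count each (id, token) pair; COUNT is algebraic, so the combiner can be used
frequency = FOREACH (GROUP flattened BY (id, token)) GENERATE
    FLATTEN(group) AS (id, token),
    COUNT(flattened) AS freq;

-- ASSUMPTION: hypothetical output path; PigStorage(',') writes id,token,freq lines
STORE frequency INTO 'token_counts' USING PigStorage(',');
```

Here the comma-separated layout comes from the STORE step's PigStorage(',') rather than from the counting logic. Rows such as foobar,foo,2 and foo,bar,2 would appear in no particular order.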