Hi, I have data that is already processed into the following form:
(id, {bag of words})
For example:
(foobar, {(foo), (foo), (foobar), (bar)})
(foo, {(bar), (bar)})
and so on.
Running describe processed gives me:
processed: {id: chararray,tokens: {tuple_of_tokens: (token: chararray)}}
Now I also want to count how many times each word appears for each id, and
output the result as (id, word, count):
foobar,foo,2
foobar,foobar,1
foobar,bar,1
foo,bar,2
and so on.
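To make the requested result unambiguous, here is a plain-Python sketch of the same computation (this is not Pig; the in-memory `processed` list is a hypothetical copy of the two example rows above). It counts each (id, token) pair across all rows, so an id that appears in several rows would have its counts combined.

```python
from collections import Counter

# Hypothetical in-memory copy of the example relation:
# each row is (id, bag), where the bag holds single-field tuples (token,)
processed = [
    ("foobar", [("foo",), ("foo",), ("foobar",), ("bar",)]),
    ("foo", [("bar",), ("bar",)]),
]


def count_tokens_per_id(rows):
    """Return (id, token, count) triples, one per distinct (id, token) pair."""
    counts = Counter()
    for row_id, bag in rows:
        for (token,) in bag:  # unpack the one-field tuple
            counts[(row_id, token)] += 1
    # Counter keeps first-seen order, so output follows the input order
    return [(row_id, token, n) for (row_id, token), n in counts.items()]


for row_id, token, n in count_tokens_per_id(processed):
    print(f"{row_id},{token},{n}")
```

This prints the four lines listed above, in that order. In relational terms it is a group-and-count on the (id, token) pair after each bag has been flattened into one row per token.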
How do I do this in Pig?
