I bet it would be more efficient to just write a UDF that converts a tuple to a bag. This is not an uncommon request, though, and probably something we should build into Pig.
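For what it's worth, here is a minimal sketch of such a UDF, written as a Jython UDF only because it is short; a Java EvalFunc would likely be faster. The function name tuple_to_bag, the file name tuple_to_bag.py, the myudfs alias, and the output schema are all assumptions for illustration, and I have not run this against 0.9.1. It relies on Pig converting a Python list of tuples into a bag.

```python
# Sketch of a Jython UDF for Pig: turn a tuple into a bag of one-field tuples.
# Pig's Jython engine normally provides the outputSchema decorator; the
# fallback below exists only so the file can be imported and tested outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


# The schema string is an assumption; adjust field names and types as needed.
@outputSchema("hostings:bag{t:tuple(hosting:chararray)}")
def tuple_to_bag(fields):
    """Wrap each field of the input tuple in its own one-field tuple."""
    if fields is None:
        return None
    # A Python list of tuples is handed back to Pig as a bag of tuples.
    return [(field,) for field in fields]


# Hypothetical usage from the grunt shell (names are illustrative):
#   register 'tuple_to_bag.py' using jython as myudfs;
#   foo = foreach searches generate
#         myudfs.tuple_to_bag(STRSPLIT(hostinglist, ',')) as hostings, user;
```

Unlike the FLATTEN + TOBAG(*) approach, this keeps user out of the bag, since only the STRSPLIT result is passed in.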
2012/2/23 Norbert Burger <[email protected]>

> Try adding a FLATTEN before applying TOBAG:
>
> foo = foreach searches GENERATE FLATTEN(STRSPLIT(hostinglist, ',')) as hostings, user;
> bar = foreach foo GENERATE TOBAG(*);
> dump bar;
>
> Norbert
>
> On Thu, Feb 23, 2012 at 11:32 AM, Flo Leibert <[email protected]> wrote:
>
> > I was expecting similar behavior as TOKENIZE from STRSPLIT. I.e. all items
> > ending up in a bag.
> > Is there a way to further split these out such that they're elements of a
> > bag? The TOBAG function just places the entire tuple in a bag...
> >
> > Thanks!
> >
> > On Wed, Feb 22, 2012 at 7:59 PM, Norbert Burger <[email protected]> wrote:
> >
> > > Hi Flo - in your example data, it seems like the STRSPLIT() is working as
> > > expected -- the function returns back a tuple which is being serialized in
> > > the shell as "(t1,t2,t3,t4)".
> > >
> > > When you mention "hostinglist isn't split properly", which part are you
> > > referring to?
> > >
> > > Norbert
> > >
> > > On Wed, Feb 22, 2012 at 9:13 PM, Flo Leibert <[email protected]> wrote:
> > >
> > > > Running pig 0.9.1 in local mode, STRSPLIT doesn't seem to split on ','. I
> > > > have the following data
> > > >
> > > > user2 hosting9
> > > > user1 hosting1,hosting2,hosting3,hosting4
> > > > user1 hosting2,hosting4,hosting5
> > > >
> > > >
> > > > searches = load '/data/sample/searches' using PigStorage('\t') as (user: chararray, hostinglist: chararray);
> > > > grunt> describe searches
> > > > searches: {user: chararray,hostinglist: chararray}
> > > > foo = foreach searches GENERATE STRSPLIT(hostinglist, ',') as hostings, user;
> > > > dump foo
> > > > ((hosting9),user2)
> > > > ((hosting1,hosting2,hosting3,hosting4),user1)
> > > > ((hosting2,hosting4,hosting5),user1)
> > > >
> > > >
> > > > hostinglist isn't split properly - i tried to use the unicode character as
> > > > well but still no luck. Is this a known bug?
> > > >
> > > > Thanks,
> > > > Flo
