I bet it would be more efficient to just write a UDF that converts a tuple
into a bag. This is not an uncommon request, though, and probably something
we should build into Pig.
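
A rough, untested sketch of what such a UDF could look like, written as a
Jython UDF so it stays short. The function name tuple_to_bag, the file name
udfs.py, and the output schema are made up for illustration. It assumes Pig
passes the tuple in as a Python tuple and converts a returned list of tuples
back into a bag.

```python
# Pig's Jython engine is expected to provide the outputSchema decorator.
# Outside Pig it is not defined, so this no-op fallback exists only to let the
# file be imported and tried in a plain Python interpreter.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


# Hypothetical schema: a bag of single-field chararray tuples.
@outputSchema("hostings:bag{t:tuple(hosting:chararray)}")
def tuple_to_bag(fields):
    """Turn one tuple (f1, f2, ...) into a bag {(f1), (f2), ...}."""
    if fields is None:
        return None
    # Each field becomes its own one-element tuple inside the bag.
    return [(field,) for field in fields]
```

Assuming the file is saved as udfs.py, it would be wired in roughly like this:

    register 'udfs.py' using jython as myudfs;
    foo = foreach searches generate
        myudfs.tuple_to_bag(STRSPLIT(hostinglist, ',')) as hostings, user;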

2012/2/23 Norbert Burger <[email protected]>

> Try adding a FLATTEN before applying TOBAG:
>
> foo = foreach searches GENERATE FLATTEN(STRSPLIT(hostinglist, ',')) as hostings, user;
> bar = foreach foo GENERATE TOBAG(*);
> dump bar;
>
> Norbert
>
> On Thu, Feb 23, 2012 at 11:32 AM, Flo Leibert <[email protected]> wrote:
>
> > I was expecting similar behavior as TOKENIZE from STRSPLIT. I.e. all items
> > ending up in a bag.
> > Is there a way to further split these out such that they're elements of a
> > bag? The TOBAG function just places the entire tuple in a bag...
> >
> > Thanks!
> >
> > On Wed, Feb 22, 2012 at 7:59 PM, Norbert Burger <[email protected]> wrote:
> >
> > > Hi Flo - in your example data, it seems like the STRSPLIT() is working as
> > > expected -- the function returns back a tuple which is being serialized in
> > > the shell as "(t1,t2,t3,t4)".
> > >
> > > When you mention "hostinglist isn't split properly", which part are you
> > > referring to?
> > >
> > > Norbert
> > >
> > > On Wed, Feb 22, 2012 at 9:13 PM, Flo Leibert <[email protected]> wrote:
> > >
> > > > Running pig 0.9.1 in local mode, STRSPLIT doesn't seem to split on ','. I
> > > > have the following data
> > > >
> > > > user2 hosting9
> > > > user1 hosting1,hosting2,hosting3,hosting4
> > > > user1 hosting2,hosting4,hosting5
> > > >
> > > >
> > > > searches = load '/data/sample/searches' using PigStorage('\t')
> > > >     as (user: chararray, hostinglist: chararray);
> > > > grunt> describe searches
> > > > searches: {user: chararray,hostinglist: chararray}
> > > > foo = foreach searches GENERATE STRSPLIT(hostinglist, ',') as hostings, user;
> > > > dump foo
> > > > ((hosting9),user2)
> > > > ((hosting1,hosting2,hosting3,hosting4),user1)
> > > > ((hosting2,hosting4,hosting5),user1)
> > > >
> > > >
> > > > hostinglist isn't split properly - I tried to use the unicode character as
> > > > well but still no luck. Is this a known bug?
> > > >
> > > > Thanks,
> > > > Flo
> > > >
> > >
> >
>
