In that case, why not use TOKENIZE? People usually want a tuple from STRSPLIT because they need to fetch specific parts of the string, for example splitting some text on tabs and treating the results as columns. That is not possible with TOKENIZE, which returns a bag.
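
A rough sketch of the TOKENIZE route, assuming the same input file and schema as in the original message. The alias names `bagged` and `by_host` are made up for illustration. As far as I recall, TOKENIZE with a single argument splits on commas but also on spaces, double quotes, parentheses and asterisks, which should be harmless for values like these.

```pig
-- Same LOAD as in the original message.
searches = LOAD '/data/sample/searches' USING PigStorage('\t')
           AS (user: chararray, hostinglist: chararray);

-- TOKENIZE returns a bag holding one single-field tuple per token.
bagged = FOREACH searches GENERATE user, TOKENIZE(hostinglist) AS hostings;

-- FLATTEN un-nests the bag, giving one output row per (user, hosting) pair.
by_host = FOREACH searches GENERATE user, FLATTEN(TOKENIZE(hostinglist)) AS hosting;

DUMP by_host;
```

Keep `bagged` if the goal is a bag per input row, or use `by_host` if each hosting should become its own row for a later GROUP or JOIN.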

D

On Thu, Feb 23, 2012 at 8:32 AM, Flo Leibert <[email protected]> wrote:
> I was expecting similar behavior as TOKENIZE from STRSPLIT. I.e. all items
> ending up in a bag.
> Is there a way to further split these out such that they're elements of a
> bag? The TOBAG function just places the entire tuple in a bag...
>
> Thanks!
>
> On Wed, Feb 22, 2012 at 7:59 PM, Norbert Burger
> <[email protected]> wrote:
>
>> Hi Flo - in your example data, it seems like the STRSPLIT() is working as
>> expected -- the function returns back a tuple which is being serialized in
>> the shell as "(t1,t2,t3,t4)".
>>
>> When you mention "hostinglist isn't split properly", which part are you
>> referring to?
>>
>> Norbert
>>
>> On Wed, Feb 22, 2012 at 9:13 PM, Flo Leibert <[email protected]> wrote:
>>
>> > Running pig 0.9.1 in local mode, STRSPLIT doesn't seem to split on ','. I
>> > have the following data
>> >
>> > user2 hosting9
>> > user1 hosting1,hosting2,hosting3,hosting4
>> > user1 hosting2,hosting4,hosting5
>> >
>> >
>> > searches = load '/data/sample/searches' using PigStorage('\t') as (user:
>> > chararray, hostinglist: chararray);
>> > grunt> describe searches
>> > searches: {user: chararray,hostinglist: chararray}
>> > foo = foreach searches GENERATE STRSPLIT(hostinglist, ',') as hostings,
>> > user;
>> > dump foo
>> > ((hosting9),user2)
>> > ((hosting1,hosting2,hosting3,hosting4),user1)
>> > ((hosting2,hosting4,hosting5),user1)
>> >
>> >
>> > hostinglist isn't split properly - i tried to use the unicode character as
>> > well but still no luck. Is this a known bug?
>> >
>> > Thanks,
>> > Flo
>> >
>>
