Re: Working with an unknown number of values

2011-05-10 Thread Alan Gates
TOKENIZE takes a string and returns a bag. Its issue right now is that it only allows you to split on whitespace. It would make sense to generalize this to take a delimiter.

Alan.

On May 7, 2011, at 7:55 PM, Jacob Perkins wrote: Dmitriy, I see your point. It would definitely be nice to have a builtin for returning a bag though.
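To make Alan's suggestion concrete, here is a plain-Python model (not Pig code) of a delimiter-aware TOKENIZE. It assumes a Pig bag can be modeled as a list of one-field tuples, one tuple per token. The function `tokenize` and its `delimiter` parameter are hypothetical illustrations, not the builtin's actual signature.

```python
from typing import List, Optional, Tuple

# A Pig bag is modeled here as a list of tuples; TOKENIZE-style output is
# one single-field tuple per token.
Bag = List[Tuple[str]]


def tokenize(text: str, delimiter: Optional[str] = None) -> Bag:
    """Hypothetical delimiter-aware TOKENIZE, modeled in plain Python.

    With no delimiter, split on runs of whitespace (the behaviour Alan
    describes); otherwise split on the literal delimiter string.
    Empty tokens are dropped so that leading or repeated delimiters
    do not produce empty fields.
    """
    tokens = text.split() if delimiter is None else text.split(delimiter)
    return [(tok,) for tok in tokens if tok]


if __name__ == "__main__":
    # Splits a hash-tag field like the one discussed later in the thread.
    print(tokenize("#hash1#hash2#hash3", "#"))
```

Dropping empty tokens is a design choice in this sketch: a field such as `#hash1#hash2` starts with the delimiter, so a naive split would yield a leading empty string.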

Re: Working with an unknown number of values

2011-05-07 Thread Jacob Perkins
Dmitriy, I see your point. It would definitely be nice to have a builtin for returning a bag though. I'd actually be happy if TOBAG(FLATTEN(STRSPLIT(X,','))) worked.

--jacob
@thedatachef

On Sat, 2011-05-07 at 18:41 -0700, Dmitriy Ryaboy wrote:
> FWIW -- the reason STRSPLIT returns a Tuple is
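The behaviour Jacob is after (split a string, turn the pieces into a bag, then flatten so each piece becomes its own row) can be modeled in plain Python as below. This is a sketch, not Pig: `strsplit` only approximates STRSPLIT, and `tuple_to_bag` and `flatten_bag` are hypothetical helpers standing in for the tuple-to-bag conversion and FLATTEN.

```python
import re
from typing import Iterator, List, Tuple


def strsplit(text: str, regex: str) -> Tuple[str, ...]:
    """Rough model of STRSPLIT: split on a regex and return one tuple.

    Edge cases (e.g. trailing empty fields) may differ from the Java
    implementation Pig uses.
    """
    return tuple(re.split(regex, text))


def tuple_to_bag(fields: Tuple[str, ...]) -> List[Tuple[str]]:
    """Hypothetical helper: an n-field tuple becomes a bag of n one-field tuples."""
    return [(field,) for field in fields]


def flatten_bag(key: str, bag: List[Tuple[str]]) -> Iterator[Tuple[str, str]]:
    """Model of GENERATE key, FLATTEN(bag): one output row per bag element."""
    for (value,) in bag:
        yield (key, value)


if __name__ == "__main__":
    for row in flatten_bag("msg-1", tuple_to_bag(strsplit("a,b,c", ","))):
        print(row)
```

The key difference from flattening a tuple is the row count: flattening a tuple widens one row into more columns, while flattening a bag produces one new row per element, which is what an unknown number of values calls for.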

Re: Working with an unknown number of values

2011-05-07 Thread Dmitriy Ryaboy
FWIW -- the reason STRSPLIT returns a Tuple is that the more common case is thought to be splitting a string of a known format and trying to get some part of it. So: "foreach address_book generate STRSPLIT(phone_number, '-') as (area_code, top_3, bottom_4);" RegexExtractAll (whatever it's called
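Dmitriy's point is that a tuple fits a string whose shape is known in advance: the number of parts is fixed, so each part can be given a name. Here is a small plain-Python model of that pattern. The `Phone` type and `split_phone` function are hypothetical, and the sample number is made up.

```python
import re
from typing import NamedTuple


class Phone(NamedTuple):
    area_code: str
    top_3: str
    bottom_4: str


def split_phone(phone_number: str) -> Phone:
    """Split a fixed-format number the way STRSPLIT(phone_number, '-') would,
    then bind each part to a name, as the AS clause does in the quoted script."""
    parts = re.split("-", phone_number)
    # A fixed-arity tuple only makes sense if the input really has that shape.
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}: {phone_number!r}")
    return Phone(*parts)


if __name__ == "__main__":
    print(split_phone("555-123-4567").area_code)
```

The explicit length check shows the tradeoff. A fixed tuple is convenient when every record has the same shape and is the wrong container when the count varies, as with the hash tags discussed below.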

Re: Working with an unknown number of values

2011-05-06 Thread jacob
On Fri, 2011-05-06 at 16:06 -0600, Christian wrote:
> Thank you for taking the time to explain this to me Jacob!
>
> Am I stuck with hard-coding for my other question?
>
> Instead of:
> 2011-05-01 DIRECTIVE1 32423 DIRECTIVE2 3433 DIRECTIVE3 1983
> --
> 2011-05-01 32423 343...

Re: Working with an unknown number of values

2011-05-06 Thread Christian
Thank you for taking the time to explain this to me, Jacob! Am I stuck with hard-coding for my other question? Instead of:

2011-05-01 DIRECTIVE1 32423 DIRECTIVE2 3433 DIRECTIVE3 1983

the following would also do, as long as I could count on the column order:

2011-05-01 32423 3433 1983
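The second layout Christian describes is a pivot: per-date counts arranged into columns in a fixed directive order. Here is a plain-Python sketch of that step, assuming the counts for one date are already available as a mapping. The function `pivot_row` and the `DIRECTIVES` list are hypothetical. The list is exactly the hard-coded column order the question is about.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical fixed column order; this is the "hard-coding" in question.
DIRECTIVES: Sequence[str] = ("DIRECTIVE1", "DIRECTIVE2", "DIRECTIVE3")


def pivot_row(date: str, counts: Dict[str, int],
              directives: Sequence[str] = DIRECTIVES) -> Tuple:
    """Return (date, count_1, ..., count_n) in the order given by `directives`,
    using 0 for a directive that has no count on that date."""
    return (date, *(counts.get(d, 0) for d in directives))


if __name__ == "__main__":
    print(pivot_row("2011-05-01",
                    {"DIRECTIVE1": 32423, "DIRECTIVE2": 3433, "DIRECTIVE3": 1983}))
```

Filling missing directives with 0 is what keeps the column order dependable. Passing a sorted list of all directive names found in the data, instead of a literal, would avoid hard-coding at the cost of an extra pass over the data.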

Re: Working with an unknown number of values

2011-05-06 Thread jacob
On Fri, 2011-05-06 at 15:38 -0600, Christian wrote:
> #1) Let's say you are tracking messages and extracting the hash tags from the message and storing them as one field (#hash1#hash2#hash3). This means you might have a line that looks something like the following:

Re: Working with an unknown number of values

2011-05-06 Thread Christian
> #1) Let's say you are tracking messages and extracting the hash tags from the message and storing them as one field (#hash1#hash2#hash3). This means you might have a line that looks something like the following:
> 2343 2011-05-06T03:04:00.000Z username some+message

Re: Working with an unknown number of values

2011-05-06 Thread jacob
Christian, I've answered inline:

On Fri, 2011-05-06 at 15:14 -0600, Christian wrote:
> I am sorry if this has been asked in the past. I can't seem to find information on it.
>
> I have two questions, but they are somewhat related.
>
> #1) Let's say you are tracking messages and extracting the

Re: Working with an unknown number of values

2011-05-06 Thread Xiaomeng Wan
You can group on a group, like this:

A = LOAD '/some/dir' USING PigStorage() AS (date, directive);
B = GROUP A BY (date, directive);
C = FOREACH B GENERATE FLATTEN(group) AS (date, directive), COUNT(A) AS cnt;
D = GROUP C BY date;
E = FOREACH D GENERATE group AS date, C.(directive, cnt) AS cnts;

Shaw
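Here is the same two-level grouping expressed in plain Python, as a way to check what each Pig statement in the script above produces. This is a sketch under stated assumptions: rows are `(date, directive)` pairs, a bag is a Python list, and `group_on_group` is a hypothetical name. Pig does not guarantee bag order, so only the contents of each nested bag are meaningful.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def group_on_group(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Model of the script above on (date, directive) rows."""
    # B and C: GROUP A BY (date, directive), then COUNT(A) per group.
    pair_counts = Counter(rows)
    # D and E: GROUP C BY date, keeping (directive, cnt) pairs as a nested bag.
    by_date: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for (date, directive), cnt in pair_counts.items():
        by_date[date].append((directive, cnt))
    return dict(by_date)


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    sample = [("2011-05-01", "DIRECTIVE1"), ("2011-05-01", "DIRECTIVE2"),
              ("2011-05-01", "DIRECTIVE1"), ("2011-05-02", "DIRECTIVE1")]
    for date, cnts in group_on_group(sample).items():
        print(date, sorted(cnts))
```

The result has one entry per date with a variable-length bag of `(directive, cnt)` pairs, so no directive names need to be hard-coded.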