Re: trying to count all tuples

2011-06-08 Thread Dmitriy Ryaboy
Thanks for following through, William! D On Wed, Jun 8, 2011 at 1:56 PM, William Oberman wrote: > Just in case this ends up as someone else's answer someday, here is the > working query on real data: > rows = LOAD 'cassandra://civicscience/observations' USING > CassandraStorage(); > filter_rows =

Re: trying to count all tuples

2011-06-08 Thread William Oberman
Just in case this ends up as someone else's answer someday, here is the working query on real data:
rows = LOAD 'cassandra://civicscience/observations' USING CassandraStorage();
filter_rows = FILTER rows BY $1 is not null;
counts = FOREACH filter_rows GENERATE COUNT($1);
counts_in_bag = GROUP count

Re: trying to count all tuples

2011-06-07 Thread William Oberman
I think FILTER will do the trick? E.g.
rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
filter_rows = FILTER rows BY columns is not null;
counts = FOREACH filter_rows GENERATE COUNT(columns);
counts_in_bag = GROUP counts
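A plain-Python model of the filter-then-count idea (not Pig, and it does not talk to Cassandra) may help show why dropping null bags first avoids the failure. The row layout and sample data are hypothetical: each row is a (key, columns) pair, with columns set to None for a row that has no columns. The final summing step is an assumption, since the script preview is cut off after the GROUP.

```python
# Plain-Python model of the filter-then-count pipeline (not Pig, no Cassandra).
# Each row is a (key, columns) pair; columns is a list of (name, value) tuples,
# or None for a row with no columns (the hypothesized null bag).
from typing import List, Optional, Tuple

Row = Tuple[str, Optional[List[Tuple[str, str]]]]


def count_cells(rows: List[Row]) -> int:
    # Mirrors: filter_rows = FILTER rows BY columns is not null;
    non_null = [cols for _, cols in rows if cols is not None]
    # Mirrors: counts = FOREACH filter_rows GENERATE COUNT(columns);
    per_row = [len(cols) for cols in non_null]
    # Assumed final step: collect all per-row counts and add them up.
    return sum(per_row)


# Hypothetical sample data: the middle row has no columns at all.
rows: List[Row] = [
    ("id1", [("col1", "a"), ("col2", "b")]),
    ("id3", None),
    ("id2", [("col1", "c"), ("col2", "d"), ("col3", "e")]),
]
print(count_cells(rows))  # → 5
```

One difference from Pig: `len()` counts every tuple, while Pig's COUNT skips tuples whose first field is null (COUNT_STAR counts them). For column names that should not matter.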

Re: trying to count all tuples

2011-06-07 Thread William Oberman
I tried this same script on closer to production data, and I'm getting errors. I'm 50% sure it's this: https://issues.apache.org/jira/browse/PIG-1283 One of my rows in Cassandra has no columns (maybe?), which might cause a null bag, which in turn causes COUNT to blow up (at least, that's my theory). As

Re: trying to count all tuples

2011-06-03 Thread William Oberman
That is exactly what I wanted, thanks for the confirm! On Fri, Jun 3, 2011 at 4:06 PM, Dmitriy Ryaboy wrote: > I am not sure what you mean by "count all columns". The code you have > counts all *cells*. > So: > id1: col1, col2 > id2: col1, col2, col3 > > has 3 columns in a conventional sense, bu

Re: trying to count all tuples

2011-06-03 Thread Dmitriy Ryaboy
I am not sure what you mean by "count all columns". The code you have counts all *cells*. So:
id1: col1, col2
id2: col1, col2, col3
has 3 columns in a conventional sense, but your code will return 5. Is that what you want? If so, your code seems correct. D On Fri, Jun 3, 2011 at 12:53 PM, Willia
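To make the cells-versus-columns distinction concrete, here is a small plain-Python model (not Pig) of the id1/id2 example; the dict layout is an illustrative assumption. Summing per-row sizes gives the cell count, while collecting column names into a set gives the distinct-column count.

```python
# Plain-Python model of the id1/id2 example (not Pig).
rows = {
    "id1": ["col1", "col2"],
    "id2": ["col1", "col2", "col3"],
}

# Counting every (row, column) cell, as summing per-row counts does.
cell_count = sum(len(cols) for cols in rows.values())

# Counting distinct column names, the "conventional" sense of columns.
column_count = len({name for cols in rows.values() for name in cols})

print(cell_count, column_count)  # → 5 3
```

In Pig terms, the second number would roughly correspond to flattening the column bags, projecting the names, and applying DISTINCT before counting; that is a suggestion, not something proposed in the thread.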

trying to count all tuples

2011-06-03 Thread William Oberman
Howdy, I'm coming from Cassandra, and I'm actually trying to count all columns in a column family. I believe that is similar to counting the number of tuples in a bag, in the lingo of the Pig manual. It was harder than I expected, but I think this works: rows = LOAD 'cassandra://MyKeyspace/MyColumnF