Thanks for following through, William!
D
On Wed, Jun 8, 2011 at 1:56 PM, William Oberman
wrote:
Just in case this ends up as someone else's answer someday, here is the
working query on real data:
rows = LOAD 'cassandra://civicscience/observations' USING
CassandraStorage();
filter_rows = FILTER rows BY $1 is not null;
counts = FOREACH filter_rows GENERATE COUNT($1);
counts_in_bag = GROUP count
I think FILTER will do the trick? E.g.
rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage()
AS (key, columns: bag {T: tuple(name, value)});
filter_rows = FILTER rows BY columns is not null;
counts = FOREACH filter_rows GENERATE COUNT(columns);
counts_in_bag = GROUP counts
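As a rough illustration of why the filter matters, the same idea can be modelled in plain Python, with `None` standing in for a null bag. This is only a sketch of the logic, not of how Pig evaluates the script; the names `rows` and `total_cells` and the sample data are made up for the example.

```python
# Hypothetical rows as (key, columns) pairs; columns is a list of
# (name, value) tuples, or None to stand in for a null bag.
rows = [
    ("id1", [("col1", "a"), ("col2", "b")]),
    ("id2", None),  # a row with no columns
    ("id3", [("col1", "c")]),
]

def total_cells(rows):
    """Sum per-row cell counts, skipping rows whose bag is null."""
    filtered = [(key, columns) for key, columns in rows if columns is not None]
    return sum(len(columns) for _, columns in filtered)

print(total_cells(rows))  # 3
```

Without the `is not None` check, `len(None)` raises a `TypeError`, which is the Python analogue of counting a null bag.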
I tried this same script on closer to production data, and I'm getting
errors. I'm 50% sure it's this:
https://issues.apache.org/jira/browse/PIG-1283
One of my rows in cassandra has no columns (maybe?), which maybe causes a
null bag, which causes COUNT to blow up (at least, that's my theory). As
That is exactly what I wanted, thanks for the confirm!
On Fri, Jun 3, 2011 at 4:06 PM, Dmitriy Ryaboy wrote:
I am not sure what you mean by "count all columns". The code you have
counts all *cells*.
So:
id1: col1, col2
id2: col1, col2, col3
has 3 columns in a conventional sense, but your code will return 5. Is
that what you want? If so, your code seems correct.
D
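To make the cells-versus-columns distinction concrete, here is a small plain-Python model of the two-row example above. It is only a sketch: the dictionary layout and the names `rows`, `count_cells`, and `count_distinct_columns` are invented for illustration and are not part of Pig or CassandraStorage.

```python
# Hypothetical in-memory stand-in for the two rows in the example:
# each row key maps to the list of column names present in that row.
rows = {
    "id1": ["col1", "col2"],
    "id2": ["col1", "col2", "col3"],
}

def count_cells(rows):
    """Count every (row, column) pair, i.e. the sum of the per-row counts."""
    return sum(len(columns) for columns in rows.values())

def count_distinct_columns(rows):
    """Count each column name once, however many rows contain it."""
    return len({name for columns in rows.values() for name in columns})

print(count_cells(rows))             # 5
print(count_distinct_columns(rows))  # 3
```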
On Fri, Jun 3, 2011 at 12:53 PM, Willia
Howdy,
I'm coming from cassandra, and I'm actually trying to count all columns in a
column family. I believe that is similar to counting the number of tuples in a
bag in the lingo in the pig manual. It was harder than I expected, but I
think this works:
rows = LOAD 'cassandra://MyKeyspace/MyColumnF