Re: Pig & Cassandra integration

Jeremy Hanna Wed, 28 Sep 2011 10:04:52 -0700

It's been mentioned in this thread, but if you're using tabular (static column 
names) data, you might consider using Pygmalion.  It will extract the values 
from Cassandra to simplify grouping by values and other operations.
https://github.com/jeromatron/pygmalion
What you'll want to look at is the FromCassandraBag UDF, which has an example 
here:
https://github.com/jeromatron/pygmalion/blob/master/scripts/from_to_cassandra_bag_example.pig
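
On the error itself: columns.value is a bag of one-field tuples, so list.$1 
asks for a second field inside each tuple, not for the second tuple in the 
bag, and that field does not exist. If you'd rather not add a dependency, 
something along these lines in plain Pig (no Pygmalion) should also work. 
Treat it as an untested sketch: it reuses the LOAD statement from your 
message, the aliases class_cols, class_values, by_class and counts are names 
I made up, and it assumes the column name compares cleanly against the 
string 'class'.

```pig
-- Same LOAD as in the original question.
rows = LOAD 'cassandra://TestKeySpace/TestPig' USING CassandraStorage()
       AS (key, columns:bag{column:tuple(name, value)});

-- For each row, keep only the column named 'class' and flatten its value
-- into a top-level field. Rows without a 'class' column are dropped here.
class_values = FOREACH rows {
    class_cols = FILTER columns BY name == 'class';
    GENERATE key, FLATTEN(class_cols.value) AS class_value;
};

-- Group on the extracted value and count the rows in each group.
by_class = GROUP class_values BY class_value;
counts = FOREACH by_class GENERATE group AS class_value, COUNT(class_values) AS n;
DUMP counts;
```

The nested FILTER picks a column by name rather than by position, which 
matters because a bag has no guaranteed order.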


Hope that helps - we use pygmalion 1.0.0 for all our scripts in production.

On Sep 28, 2011, at 11:18 AM, Tamil Selvan wrote:

> Hi,
> I'm trying to integrate pig with cassandra. 
> My columnfamily in cassandra is
> name -> xxx
> Age -> yyy
> class -> zzz
> This is how I load data
> rows =LOAD 'cassandra://TestKeySpace/TestPig' USING CassandraStorage()
> as (key,columns:bag{column:tuple(name,value)});
> 
> Now I wish to perform group by based on value of class. I tried
> 
> col_values = FOREACH rows GENERATE (columns.value) as list:bag{};
> 
> This gave me the result in the following schema: bag(:tuple(chararray))
> Ex: on dump col_values I got {(xxx),(yyy),(zzz)} 
> 
> Now if I try to access
> 
> list = FOREACH col_values GENERATE (list.$0, list.$1);
> 
> I'm getting an undefined index access error, like:
> list.$1 doesn't exist :bag[:tuple(chararray)] has only one column [But
> there are 3]
> 
> How can I access tuple-wise data in such cases?
> I couldn't perform a group by on one column because of this.
> 
> I tried TOTUPLE, but the problem is that it converts the entire bag into a
> tuple and applies the group by to that.
> 
> Help me out
> 
> Regards,
> Tamil
>
