Re: prep for cassandra storage from pig

2011-06-15 Thread Jeremy Hanna
Yeah, for completely dynamic column names, FromCassandraBag/ToCassandraBag doesn't handle that. It does handle prefixed names, though: link* will get a bag of all the columns that start with link. But it sounds like you are doing what I would have to do if I got into a nested data conundr
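To illustrate the prefix behaviour described above, here is a minimal sketch in plain Python. It is not pygmalion's code: the function name columns_with_prefix and the list-of-pairs model of a row are assumptions made for the example. The only thing taken from the message is that a pattern like link* collects every column whose name starts with link.

```python
def columns_with_prefix(columns, pattern):
    """Return the (name, value) pairs selected by pattern.

    Hypothetical helper: a trailing '*' is treated as a prefix match,
    anything else as an exact column name.
    """
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [(name, value) for name, value in columns if name.startswith(prefix)]
    return [(name, value) for name, value in columns if name == pattern]


# One row's columns, modeled as a list of (name, value) pairs.
row_columns = [
    ("link1", "http://a.example"),
    ("link2", "http://b.example"),
    ("title", "hello"),
]

links = columns_with_prefix(row_columns, "link*")
titles = columns_with_prefix(row_columns, "title")
```

The pattern still has to be written into the script, which is why fully dynamic names such as dates do not fit this approach.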

Re: prep for cassandra storage from pig

2011-06-15 Thread William Oberman
I'll do a reply all, to keep this more consistent (sorry!). Rather than staying stuck, I wrote a custom function: TupleToBagOfTuple. I'm curious if I could have avoided it with proper pig scripting though. On Wed, Jun 15, 2011 at 3:08 PM, William Oberman wrote: > My problem is the column names a
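The body of TupleToBagOfTuple is not shown in the thread, so the following is only a guess at its intent, sketched in plain Python rather than as a Pig UDF. It assumes the function wraps a single (name, value) tuple in a one-element bag so that the row has the (key, {tuples}) shape mentioned in the original post. The bag is modeled as a list, and to_cassandra_row is a made-up helper name.

```python
def tuple_to_bag_of_tuple(column):
    """Wrap one (name, value) pair in a single-element bag (a list here)."""
    if len(column) != 2:
        raise ValueError("expected a (name, value) pair")
    return [tuple(column)]


def to_cassandra_row(key, column):
    """Build a (key, {tuples}) shaped row from a key and one column."""
    return (key, tuple_to_bag_of_tuple(column))


# The column name is only known at run time (a date), as in the thread.
row = to_cassandra_row("row1", ("2011-06-15", 42))
```

Because the column name is just data inside the tuple, nothing about it has to be fixed in the script.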

Re: prep for cassandra storage from pig

2011-06-15 Thread William Oberman
My problem is the column names are dynamic (a date), and pygmalion seems to want the column names to be fixed at "compile time" (the script). On Wed, Jun 15, 2011 at 3:04 PM, Jeremy Hanna wrote: > Hi Will, > > That's partly why I like to use FromCassandraBag and ToCassandraBag from > pygmalion -

Re: prep for cassandra storage from pig

2011-06-15 Thread William Oberman
Rather than staying stuck, I wrote a custom function: TupleToBagOfTuple. I'm curious if I could have avoided this though. On Wed, Jun 15, 2011 at 2:17 PM, William Oberman wrote: > I think I'm stuck on typing issues trying to store data in cassandra. To > verify, cassandra wants (key, {tuples}) >

Re: prep for cassandra storage from pig

2011-06-15 Thread Jeremy Hanna
Hi Will, That's partly why I like to use FromCassandraBag and ToCassandraBag from pygmalion - it does the work for you to get it back into a form that cassandra understands. Others may know better how to massage the data into that form using just pig, but if all else fails, you could write a u

prep for cassandra storage from pig

2011-06-15 Thread William Oberman
I think I'm stuck on typing issues trying to store data in cassandra. To verify, cassandra wants (key, {tuples}). My pig script is fairly brief: raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS (key:chararray, columns:bag {column:tuple (name, value)}); --columns == timeUUID -> J