Re: Help with Pig Script

Aaron Griffith Thu, 17 Nov 2011 11:45:04 -0800

Jeremy Hanna <jeremy.hanna1234 <at> gmail.com> writes:

> 
> If you are only interested in loading one row, why do you need to use Pig? Is
> it an extremely wide row?
> 
> Unless you are using an ordered partitioner, you currently can't limit the rows
> you mapreduce over - you have to mapreduce over the whole column family. That
> will probably change in 1.1. However, again, if you're only after 1 row, why
> don't you just use a regular Cassandra client, get that row, and operate on it
> that way?
> 
> I suppose you *could* use Pig and filter by the ID or something. If you *do*
> have an ordered partitioner in your cluster, it's just a matter of specifying
> the key range.
> 
> On Nov 17, 2011, at 11:16 AM, Aaron Griffith wrote:
> 
> > I am trying to do the following with a Pig script and am having trouble
> > finding the correct syntax.
> > 
> > - I want to use the LOAD function to load a single key/value "row" into a
> >   Pig object.
> > - The contents of that row are then flattened into a list of keys.
> > - I then want to use that list of keys in another load function to select
> >   the key/value pairs from another column family.
> > 
> > The only way I can get this to work is to use a generic load function,
> > apply filters to get at the data I want, and then join the two Pig objects
> > together to filter the second column family.
> > 
> > I want to avoid having to pull the entire column families into Pig; it is
> > way too much data.
> > 
> > Any suggestions?
> > 
> > Thanks!
> > 
> 
>
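To make the partitioner point above concrete, here is a toy Python model. It is not Cassandra code: the function names are hypothetical, and `md5_token` only mimics the idea behind RandomPartitioner (placing rows by a hash of the key). With ordered placement a key range is one contiguous slice of the ring, so only the matching rows are visited; with hashed placement neighbouring keys are scattered, so honouring a key range means visiting every row and filtering.

```python
import hashlib
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def md5_token(key: str) -> int:
    # Hash-based placement, in the spirit of RandomPartitioner (MD5 of the key).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def ordered_range(sorted_keys: List[str], start: str, end: str) -> Tuple[List[str], int]:
    """Ordered placement: rows sit on the ring in key order, so [start, end]
    is a single contiguous slice. Returns (matching keys, rows visited)."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_right(sorted_keys, end)
    return sorted_keys[lo:hi], hi - lo


def hashed_range(keys: List[str], start: str, end: str) -> Tuple[List[str], int]:
    """Hashed placement: ring order follows the token, not the key, so every
    row must be visited and filtered. Returns (matching keys, rows visited)."""
    ring = sorted(keys, key=md5_token)
    hits = sorted(k for k in ring if start <= k <= end)
    return hits, len(ring)


if __name__ == "__main__":
    keys = [f"user{i:03d}" for i in range(100)]
    print(ordered_range(sorted(keys), "user010", "user019")[1])  # rows visited
    print(hashed_range(keys, "user010", "user019")[1])           # rows visited
```

Both functions return the same ten keys for `user010`..`user019`, but the ordered version touches 10 rows while the hashed version touches all 100, which is why a key range only helps with an ordered partitioner.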



It is a very wide row, with nested keys to another column family. Pig makes it
easy to convert it into a list of keys.

It also makes it easy to write out the results into Hadoop.

I then want to take that list of keys and fetch the rows from whichever column
family they belong to.
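The two-step lookup described here can be sketched in Python with plain dicts standing in for column families. This is an illustration only: the dict layout, the function names, and the assumption that each column *value* in the wide row holds a row key of the other column family are all hypothetical. `fetch_rows` touches only the requested rows (roughly what fetching rows by key through a regular Cassandra client would do), while `scan_and_join` reproduces the workaround from the original question by iterating over the whole target column family.

```python
from typing import Dict, List

# Hypothetical stand-in for a column family: row key -> {column name -> value}.
ColumnFamily = Dict[str, Dict[str, str]]


def keys_from_wide_row(cf: ColumnFamily, row_key: str) -> List[str]:
    """Flatten one wide row into the list of keys it points at.
    Assumes each column value is a row key in the other column family."""
    return list(cf.get(row_key, {}).values())


def fetch_rows(cf: ColumnFamily, keys: List[str]) -> ColumnFamily:
    """Targeted lookup: read only the requested rows, skipping missing keys."""
    return {k: cf[k] for k in keys if k in cf}


def scan_and_join(index_cf: ColumnFamily, row_key: str, target_cf: ColumnFamily) -> ColumnFamily:
    """Full-scan workaround: walk every row of target_cf and keep the ones
    referenced by the wide row (a filter followed by a join)."""
    wanted = set(keys_from_wide_row(index_cf, row_key))
    return {k: v for k, v in target_cf.items() if k in wanted}


if __name__ == "__main__":
    index = {"wide": {"c1": "k1", "c2": "k3"}}
    target = {"k1": {"x": "1"}, "k2": {"x": "2"}, "k3": {"x": "3"}}
    print(fetch_rows(target, keys_from_wide_row(index, "wide")))
    print(scan_and_join(index, "wide", target))
```

Both paths return the same rows; the difference is that `scan_and_join` reads every row of the target column family, which is the cost the original question is trying to avoid.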

Thanks for your response.
