Jeremy Hanna <jeremy.hanna1234 <at> gmail.com> writes:

> If you are only interested in loading one row, why do you need to use Pig?
> Is it an extremely wide row?
>
> Unless you are using an ordered partitioner, you can't limit the rows you
> mapreduce over currently - you have to mapreduce over the whole column
> family. That will change probably in 1.1. However, again, if you're only
> after 1 row, why don't you just use a regular cassandra client and get that
> row and operate on it that way?
>
> I suppose you *could* use pig and filter by the ID or something. If you *do*
> have an ordered partitioner in your cluster, it's just a matter of specifying
> the key range.
>
> On Nov 17, 2011, at 11:16 AM, Aaron Griffith wrote:
>
> > I am trying to do the following with a PIG script and am having trouble
> > finding the correct syntax.
> >
> > - I want to use the LOAD function to load a single key/value "row" into a
> > pig object.
> > - The contents of that row is then flattened into a list of keys.
> > - I then want to use that list of keys for another load function to select
> > the key/value pairs from another column family.
> >
> > The only way I can get this to work is by using a generic load function
> > then applying filters to get at the data I want. Then joining the two pig
> > objects together to filter the second column family.
> >
> > I want to avoid having to pull the entire column familys into pig, it is
> > way too much data.
> >
> > Any suggestions?
> >
> > Thanks!
It is a very wide row, with nested keys pointing to rows in another column family. Pig makes it easy to convert it into a list of keys, and it also makes it easy to write the results out to Hadoop. I then want to take that list of keys and fetch the matching rows from whichever column family they belong to.

Thanks for your response.
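For comparison, here is a minimal sketch of the alternative Jeremy suggests: read the one wide row, flatten it into keys, then fetch only the rows those keys reference, instead of loading both column families and joining them. Plain Python dicts stand in for the two column families; the function names, the sample data, and the assumption that the referenced row keys are stored as the wide row's column names are all hypothetical. With a real Cassandra client, the dict lookups would become a single-row read followed by a multi-key read.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a column family:
# row key -> {column name -> value}.
ColumnFamily = Dict[str, Dict[str, str]]


def keys_from_wide_row(index_cf: ColumnFamily, row_key: str) -> List[str]:
    """Flatten one wide row into the list of row keys it points to.

    Assumes (hypothetically) that each column *name* in the wide row is a
    row key in the target column family. Returns [] if the row is absent.
    """
    return list(index_cf.get(row_key, {}).keys())


def fetch_referenced_rows(index_cf: ColumnFamily, target_cf: ColumnFamily,
                          row_key: str) -> ColumnFamily:
    """Read a single row, then look up only the rows it references,
    rather than scanning the whole target column family and joining."""
    wanted = keys_from_wide_row(index_cf, row_key)
    # Point lookups by key; referenced keys missing from the target are skipped.
    return {k: target_cf[k] for k in wanted if k in target_cf}


if __name__ == "__main__":
    index_cf = {"user:42": {"order:1": "", "order:7": ""},
                "user:99": {"order:3": ""}}
    orders_cf = {"order:1": {"total": "10"},
                 "order:3": {"total": "5"},
                 "order:7": {"total": "22"}}
    print(fetch_referenced_rows(index_cf, orders_cf, "user:42"))
    # prints {'order:1': {'total': '10'}, 'order:7': {'total': '22'}}
```

The cost of this approach scales with the width of the one row rather than with the size of either column family, which is the property the original question is after.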