Re: please help with multiget

Aaron Morton Mon, 17 Jan 2011 17:14:03 -0800

If you can provide some more information on a specific use case we may be able to help with the modelling.

The general approach is to denormalise the data to the point where each request/activity/feature in your application results in a call to get data from one or more rows in one CF. It's not always possible; it's just the goal I use when modelling. I also lean towards making fewer calls that return more data than strictly needed, rather than more calls that each return exactly the data needed. IMHO doing the additional filtering and ordering on the client side will reduce server load at scale.

You may be able to use a multiget for a super_column across multiple rows, which will return the super columns along with their (potentially different) lists of columns. Or, if the rows have only a few standard columns, pull back all the columns for those rows.

Hope that helps.

Aaron

On 18 Jan, 2011, at 01:53 PM, Shu Zhang <szh...@mediosystems.com> wrote:

Here's the method declaration for quick reference:
map<string,list<ColumnOrSuperColumn>> multiget_slice(string keyspace, list<string> keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level)

It looks like you must have the same SlicePredicate for every key in your batch retrieval, so what are you supposed to do when you need to retrieve different columns for different keys? I mean, it seems like to take full advantage of Cassandra's data structure, you often want to put dynamic data in column names, and different rows may have totally different column names. That's pretty standard practice, right? Then it seems like you should be able to batch get requests that map different SlicePredicates to different keys in an efficient way.

The only way I can think of to retrieve different columns for different keys (besides breaking them into individual requests) is to set the SlicePredicate so that you retrieve entire rows and then parse it on the client side... but that seems a little inefficient and a bit of a pain. Is that what people do? I can see this not being TOO much more inefficient since a single row is always kept together physically.
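To make the "retrieve entire rows and filter on the client" idea concrete, here is a minimal sketch. It does not call Cassandra at all: `rows` is a plain dict standing in for whatever a whole-row multiget returned, and `filter_rows` and `wanted` are hypothetical names used only for illustration.

```python
def filter_rows(rows, wanted):
    """Keep only the columns requested for each key.

    rows:   dict of row key -> dict of column name -> value
            (a stand-in for whole rows already fetched in one batch call)
    wanted: dict of row key -> list of column names wanted for that key
    """
    result = {}
    for key, names in wanted.items():
        row = rows.get(key, {})
        # Columns that are missing from the row are silently skipped.
        result[key] = {name: row[name] for name in names if name in row}
    return result


# Hypothetical data: two rows with completely different column names.
rows = {
    "key1": {"a": 1, "b": 2, "c": 3},
    "key2": {"x": 10, "y": 20},
}
wanted = {"key1": ["a", "c"], "key2": ["y", "z"]}
filtered = filter_rows(rows, wanted)
```

The cost is that every column of every row crosses the wire, so this only seems reasonable when rows are small.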

I haven't found a lot of other complaints about this, so maybe I am missing something. But a get request takes a key and a column path, so it seems like a batch get should allow you to specify any combination of key-columnPath or key-slicePredicate pairs. I mean, intuitive design-wise, for any batch operation, it makes sense to allow batching together any number of the corresponding non-batch operations. I.e., if I can make a non-batch get request for (key1, colName1), and I can make a non-batch get request for (key2, colName2), then I should be able to make a batch request for (key1, colName1) and (key2, colName2).
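A possible middle ground with the existing signature, sketched here under assumptions: group the keys that want the same set of column names, then issue one batch call per group instead of one call per key. The `multiget` argument is a hypothetical callable taking `(keys, column_names)`; it is not the Thrift client API, and `fake_multiget` plus the in-memory `store` only exist so the sketch runs on its own.

```python
from collections import defaultdict


def group_keys_by_columns(requests):
    """Map each distinct set of column names to the keys that want it."""
    groups = defaultdict(list)
    for key, columns in requests.items():
        groups[frozenset(columns)].append(key)
    return dict(groups)


def batched_get(requests, multiget):
    """Issue one multiget per distinct column set and merge the results."""
    result = {}
    for columns, keys in group_keys_by_columns(requests).items():
        result.update(multiget(keys, sorted(columns)))
    return result


# In-memory stand-in for the cluster (hypothetical data).
store = {
    "key1": {"colName1": "v1", "other": "x"},
    "key2": {"colName2": "v2"},
    "key3": {"colName1": "v3"},
}
calls = []  # records each batch call so the number of round trips is visible


def fake_multiget(keys, column_names):
    calls.append((list(keys), list(column_names)))
    return {
        k: {c: store[k][c] for c in column_names if c in store.get(k, {})}
        for k in keys
    }


requests = {"key1": ["colName1"], "key2": ["colName2"], "key3": ["colName1"]}
result = batched_get(requests, fake_multiget)
```

Here three per-key requests collapse into two batch calls. When every key wants a different column set, this degenerates to one call per key, which is exactly the case the question is about.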

Furthermore, a batch-get method signature like

map<string,list<ColumnOrSuperColumn>> multiget_slice(string keyspace, map<string,list<SlicePredicate>> mutation_map, ConsistencyLevel consistency_level)

looks a lot more symmetrical to the batch_mutate method:
void batch_mutate(string keyspace, map<string,map<string,list<Mutation>>> mutation_map, ConsistencyLevel consistency_level)

Thoughts?

Thanks,
Shu
