Re: Query advice to prevent node overload

André Cruz Mon, 17 Sep 2012 02:07:01 -0700

On Sep 17, 2012, at 3:04 AM, aaron morton <aa...@thelastpickle.com> wrote:


>> I have a schema that represents a filesystem and one example of a Super CF 
>> is:
> This may help with some ideas
> http://www.datastax.com/dev/blog/cassandra-file-system-design
> 
> In general we advise avoiding Super Columns if possible. They are often 
> slower, and the sub columns are not indexed, meaning all the sub columns 
> have to be read into memory. 
> 
> 
>> So if I set column_count = 10000, as I have now, but fetch 1000 dirs (rows) 
>> and each one happens to have 10000 files (columns) the dataset is 1000x10000.
> This is the way the query works internally. Multiget is simply a collection 
> of independent gets. 
> 
>  
>> The multiget() is more efficient, but I'm having trouble trying to limit the 
>> size of the data returned in order to not crash the cassandra node.
> Often less is more. I would only ask for a few tens of rows at a time, or try 
> to limit the size of the returned data to a few MB. Otherwise a lot of 
> data gets dragged through Cassandra, the network and finally Python. 
> 
> You may want to consider a CF like the inode CF in the article above, where 
> the parent dir is a column with a secondary index. 
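
For reference, the "few tens of rows at a time" advice could be sketched roughly as below. This is only an illustration, not tested against a real cluster: multiget_in_batches, chunked, rows_per_call and the fetch callable are hypothetical names, and fetch is assumed to take a list of row keys plus a column limit and return a dict of row key to columns (a thin wrapper around the client library's multiget call would fit that shape).

```python
from itertools import islice


def chunked(keys, size):
    """Yield successive lists of at most `size` keys."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def multiget_in_batches(fetch, keys, rows_per_call=20, column_count=100):
    """Issue one small multiget per batch instead of one huge request.

    `fetch` is a hypothetical callable: fetch(list_of_keys, column_count)
    must return a dict mapping each row key to its columns.
    """
    for batch in chunked(keys, rows_per_call):
        # Each call asks for at most rows_per_call * column_count columns.
        result = fetch(batch, column_count)
        for key, columns in result.items():
            yield key, columns
```

With rows_per_call=20 and column_count=100, a single request is bounded at 2000 columns rather than 1000x10000. Rows with more columns than column_count would still need a follow-up paged read.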

Thanks Aaron! I will take your points into consideration.

Best regards,
André
