On Sep 17, 2012, at 3:04 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> I have a schema that represents a filesystem and one example of a Super CF
>> is:
>
> This may help with some ideas:
> http://www.datastax.com/dev/blog/cassandra-file-system-design
>
> In general we advise avoiding Super Columns if possible. They are often
> slower, and the sub columns are not indexed, meaning all the sub columns
> have to be read into memory.
>
>> So if I set column_count = 10000, as I have now, but fetch 1000 dirs (rows)
>> and each one happens to have 10000 files (columns), the dataset is 1000x10000.
>
> This is the way the query works internally. Multiget is simply a collection
> of independent gets.
>
>> The multiget() is more efficient, but I'm having trouble trying to limit the
>> size of the data returned in order to not crash the cassandra node.
>
> Often less is more. I would only ask for a few tens of rows at a time, or try
> to limit the size of the returned query to a few MBs. Otherwise a lot of
> data gets dragged through Cassandra, the network and finally Python.
>
> You may want to consider a CF like the inode CF in the article above, where
> the parent dir is a column with a secondary index.

Thanks Aaron! I will take your points into consideration.

Best regards,
André
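For reference, here is a minimal sketch of the quoted advice to ask for only a few tens of rows at a time instead of one large multiget. It assumes `fetch` is a hypothetical callable (for example, a thin wrapper you write around your client's multiget) that takes a list of row keys and returns a dict of `{row_key: columns}`; the default batch size of 20 is likewise an assumption standing in for "a few tens". It also shows the worst-case arithmetic from the thread: since a multiget is a collection of independent gets, 1000 rows with column_count = 10000 can return up to 1000 x 10000 columns.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def chunked(keys: Iterable[K], size: int) -> Iterator[List[K]]:
    """Yield successive lists of at most `size` keys."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def batched_multiget(
    keys: Iterable[K],
    fetch: Callable[[List[K]], Dict[K, V]],
    batch_size: int = 20,  # assumption: "a few tens of rows at a time"
) -> Iterator[Tuple[K, V]]:
    """Issue several small multigets instead of one huge one.

    `fetch` is a hypothetical stand-in for whatever client call performs a
    multiget for a list of row keys. Rows are yielded one at a time, so the
    caller can process each batch before the next one is requested.
    """
    for batch in chunked(keys, batch_size):
        for key, columns in fetch(batch).items():
            yield key, columns


def worst_case_columns(rows: int, column_count: int) -> int:
    """Upper bound on columns a single multiget may return."""
    return rows * column_count


if __name__ == "__main__":
    # The numbers from the thread: 1000 dirs fetched with column_count = 10000.
    print(worst_case_columns(1000, 10000))  # 10000000

    # A fake fetch that records batch sizes instead of talking to Cassandra.
    batch_sizes = []

    def fake_fetch(batch):
        batch_sizes.append(len(batch))
        return {key: {"name": key} for key in batch}

    dirs = ["dir%d" % i for i in range(45)]
    rows = dict(batched_multiget(dirs, fake_fetch, batch_size=20))
    print(batch_sizes)  # [20, 20, 5]
```

This sketch bounds each request by row count only. Keeping each response to a few MBs would additionally mean keeping column_count per row small and paging through wide rows, since a single directory with 10000 files can still be large on its own.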