I mean iterate over each column -- more precisely, over *bunches of columns* fetched with slices -- and write the new columns to the inverted index, so that a whole wide row never has to be moved to the client at once.

Tamar's data model is designed for real-time analysis. It may be over-designed for a daily ranking. I agree with Samal: you should split your data across the token space. Only the feeding of the Ranking CF would be affected, not the "top N" queries.
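
To make the slice-based iteration concrete, here is a minimal sketch in Python. It is not tied to any particular client library: `fetch_slice` and `write_batch` are hypothetical callables standing in for a column slice read and a batch write, and `InMemoryStore` is only a stand-in for the events and ranking CFs so that the example runs on its own. The `(count, name)` layout of the inverted columns is likewise an assumption, not necessarily your schema.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Column = Tuple[str, int]  # (column name, value), e.g. (item id, event count)

# Hypothetical read signature: fetch_slice(row_key, start, count) returns up to
# `count` columns of the row, sorted by name, starting at `start` inclusive
# (start=None means "from the beginning of the row").
FetchSlice = Callable[[str, Optional[str], int], List[Column]]


def iter_column_slices(fetch_slice: FetchSlice, row_key: str,
                       page_size: int = 1000) -> Iterator[List[Column]]:
    """Yield the columns of one wide row in pages of at most `page_size`."""
    start = None
    while True:
        # When resuming, the slice starts at the last column already seen,
        # so request one extra column and drop that duplicate.
        count = page_size if start is None else page_size + 1
        batch = fetch_slice(row_key, start, count)
        if start is not None:
            batch = batch[1:]
        if not batch:
            return
        yield batch
        if len(batch) < page_size:
            return
        start = batch[-1][0]


def build_ranking(fetch_slice: FetchSlice, write_batch, row_key: str,
                  page_size: int = 1000) -> int:
    """Invert each page of columns and write it out; return columns written."""
    written = 0
    for batch in iter_column_slices(fetch_slice, row_key, page_size):
        # Assumed inverted layout: the count becomes part of the column name,
        # so the ranking row is sorted by count; the value is left empty.
        inverted = [((count, name), "") for name, count in batch]
        write_batch(row_key, inverted)
        written += len(inverted)
    return written


class InMemoryStore:
    """Stand-in for the events CF and the ranking CF (illustration only)."""

    def __init__(self):
        self.events = {}   # row_key -> {name: count}
        self.ranking = {}  # row_key -> {(count, name): ""}

    def fetch_slice(self, row_key, start, count):
        cols = sorted(self.events.get(row_key, {}).items())
        if start is not None:
            cols = [c for c in cols if c[0] >= start]
        return cols[:count]

    def write_batch(self, row_key, columns):
        self.ranking.setdefault(row_key, {}).update(columns)
```

With a real client, only `fetch_slice` and `write_batch` would change; the client never holds more than one page of columns in memory, and each page can be written back as a single batch mutation.
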
Filippo Diotalevi <fili...@ntoklo.com> wrote on 21/05/2012 19:05:28:

> Hi Romain,
> thanks for your suggestion.
>
> When you say "build every day a ranking in a dedicated CF by
> iterating over events:" do you mean
> - load all the columns for the specified row key
> - iterate over each column, and write a new column in the inversed index
> ?
>
> That's my current approach, but since I have many of these wide rows
> (1 per day), the process is extremely slow as it involves moving an
> entire row from Cassandra to client, inverting every column, and
> sending the data back to create the inversed index.