When updating, use a table with one row per word and increment the count?

--
Colin
+1 320 221 9531
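Colin's suggestion maps onto Cassandra's counter type: each time a piece of text is stored, increment the row for every word it contains. Below is a minimal Python sketch. It assumes a `word_count` table whose `count` column is declared as a CQL `counter`, and it uses a simple lowercase, letters-only tokenizer. Both the tokenizer and the helper name `word_increments` are hypothetical. It collapses repeated words within one text so that each distinct word gets a single `count = count + N` update. The statements are only built here, not sent to a cluster.

```python
import re
from collections import Counter

# Assumed schema: a counter table. Cassandra requires every non-key column
# of a counter table to be a counter.
#
#   CREATE TABLE IF NOT EXISTS word_count (
#       word text PRIMARY KEY,
#       count counter
#   );

# Counters can only be changed by adding to them in an UPDATE; the row for
# a new word is created implicitly on its first increment.
INCREMENT_CQL = "UPDATE word_count SET count = count + %s WHERE word = %s"

# Hypothetical tokenizer: lowercase runs of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def word_increments(text):
    """Return (delta, word) parameter tuples, one per distinct word in text."""
    counts = Counter(WORD_RE.findall(text.lower()))
    # Sorted by word so the statement order is deterministic.
    return [(n, word) for word, n in sorted(counts.items())]


if __name__ == "__main__":
    for params in word_increments("The cat saw the other cat"):
        # With the DataStax Python driver this would be
        # session.execute(INCREMENT_CQL, params); here it is only printed.
        print(INCREMENT_CQL, params)
```

One design caveat: counter increments are not idempotent. A write that times out and is retried can over-count, which may be tolerable for a frequency ranking but not for exact totals.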
> On Jan 20, 2014, at 6:58 AM, David Tinker <david.tin...@gmail.com> wrote:
>
> I haven't actually tried to use that schema yet; it was just my first idea.
> If we use that solution, our app would have to read the whole table once a day
> or so to find the top 5000'ish words.
>
>
>> On Fri, Jan 17, 2014 at 2:49 PM, Jonathan Lacefield
>> <jlacefi...@datastax.com> wrote:
>> Hi David,
>>
>> How do you know that you are receiving a seek for each row? Are you
>> querying for a specific word at a time, or do the queries span multiple
>> words? In other words, what's the query pattern? Also, what is your goal for read
>> latency? Most customers can achieve microsecond partition-key-based query
>> reads with Cassandra. This can be done through tuning, data modeling, and/or
>> scaling. Please post the cfhistograms output for this table and provide some
>> details on the specific queries you are running.
>>
>> Thanks,
>>
>> Jonathan
>>
>> Jonathan Lacefield
>> Solutions Architect, DataStax
>> (404) 822 3487
>>
>>> On Fri, Jan 17, 2014 at 1:41 AM, David Tinker <david.tin...@gmail.com>
>>> wrote:
>>> I have an app that stores lots of bits of text in Cassandra. One of
>>> the things I need to do is keep a global word frequency table,
>>> something like this:
>>>
>>> CREATE TABLE IF NOT EXISTS word_count (
>>>     word text,
>>>     count counter,
>>>     PRIMARY KEY (word)
>>> );
>>>
>>> This is slow to read because the rows (hundreds of thousands of them) each
>>> need a seek. Is there a better way to model this in Cassandra? I suppose I could
>>> periodically snapshot the rows into a fat row in another table.
>>>
>>> Or should I use Redis or something instead? I would prefer to keep it
>>> all in Cassandra if possible.
>
> --
> http://qdb.io/ Persistent Message Queues With Replay and #RabbitMQ Integration
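David's fallback of reading the whole table once a day to find the top 5000 or so words only needs a bounded heap. That way, memory stays proportional to the number of words kept rather than to the size of the table. The following is a minimal Python sketch. It assumes the scan yields `(word, count)` pairs, for example from paging through `SELECT word, count FROM word_count`. The function name `top_words` and the 5000 default are illustrative, not from the thread.

```python
import heapq
from typing import Iterable, List, Tuple


def top_words(rows: Iterable[Tuple[str, int]], n: int = 5000) -> List[Tuple[str, int]]:
    """Return the n (word, count) pairs with the highest counts, largest first.

    heapq.nlargest holds at most n rows at a time, so `rows` can be a lazy
    iterator over a paged full-table scan. Rows with equal counts keep their
    scan order.
    """
    return heapq.nlargest(n, rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Illustrative data standing in for the rows of the word_count table.
    scanned = iter([("the", 50), ("cat", 7), ("saw", 3), ("a", 42)])
    print(top_words(scanned, n=2))  # [('the', 50), ('a', 42)]
```

The resulting list is small enough to store as the single "fat row" snapshot David mentions. Readers of the ranking would then fetch one partition instead of one row per word.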