When updating, could you use a table with one row per word and just increment the count?
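A rough sketch of that idea, assuming a Cassandra counter column (table and column names are illustrative, not from the thread):

```sql
-- One row per word; the counter is incremented as text is ingested.
CREATE TABLE IF NOT EXISTS word_count (
  word text PRIMARY KEY,
  count counter
);

-- On each occurrence of a word:
UPDATE word_count SET count = count + 1 WHERE word = 'cassandra';
```

Two caveats: a counter table may contain no regular (non-counter, non-key) columns, and counter increments are not idempotent, so a retried update can double-count.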

--
Colin 
+1 320 221 9531

 

> On Jan 20, 2014, at 6:58 AM, David Tinker <david.tin...@gmail.com> wrote:
> 
> I haven't actually tried to use that schema yet, it was just my first idea. 
> If we use that solution our app would have to read the whole table once a day 
> or so to find the top 5000'ish words.
> 
> 
>> On Fri, Jan 17, 2014 at 2:49 PM, Jonathan Lacefield 
>> <jlacefi...@datastax.com> wrote:
>> Hi David,
>> 
>>   How do you know that you are receiving a seek for each row?  Are you 
>> querying for a specific word at a time, or do the queries span multiple 
>> words, i.e. what's the query pattern?  Also, what is your goal for read 
>> latency?  Most customers can achieve sub-millisecond partition-key-based 
>> reads with Cassandra.  This can be done through tuning, data modeling, and/or 
>> scaling.  Please post cfhistograms output for this table as well as some 
>> details on the specific queries you are running.
>> 
>> Thanks,
>> 
>> Jonathan
>> 
>> Jonathan Lacefield
>> Solutions Architect, DataStax
>> (404) 822 3487
>> 
>>> On Fri, Jan 17, 2014 at 1:41 AM, David Tinker <david.tin...@gmail.com> 
>>> wrote:
>>> I have an app that stores lots of bits of text in Cassandra. One of
>>> the things I need to do is keep a global word frequency table.
>>> Something like this:
>>> 
>>> CREATE TABLE IF NOT EXISTS word_count (
>>>   word text,
>>>   count counter,
>>>   PRIMARY KEY (word)
>>> );
>>> 
>>> This is slow to read as the rows (hundreds of thousands of them) each
>>> need a seek. Is there a better way to model this in Cassandra? I could
>>> periodically snapshot the rows into a fat row in another table I
>>> suppose.
>>> 
>>> Or should I use Redis or something instead? I would prefer to keep it
>>> all Cassandra if possible.
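The periodic-snapshot idea above could look something like this (table name and layout are an assumption, not from the thread): a daily job scans word_count once and writes each (count, word) pair into a wide partition clustered by descending count, so reading the top N words becomes a single sequential read.

```sql
-- One partition per snapshot, clustered by descending count, so the
-- head of the partition holds the most frequent words.
CREATE TABLE IF NOT EXISTS top_words (
  snapshot text,
  count bigint,
  word text,
  PRIMARY KEY (snapshot, count, word)
) WITH CLUSTERING ORDER BY (count DESC, word ASC);

-- Read the top 5000 words for a given snapshot:
SELECT word, count FROM top_words WHERE snapshot = '2014-01-17' LIMIT 5000;
```

Old snapshot partitions can be deleted once superseded, or kept for history.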
> 
> 
> 
> -- 
> http://qdb.io/ Persistent Message Queues With Replay and #RabbitMQ Integration
