I haven't benchmarked this, so it's purely theoretical.
If there's no caching, I'm pretty sure just writing would yield better
performance.
If you do cache rows/keys, it really depends on your hit ratio. Naturally, if
you have a small data set, use row caching and get a high cache hit ratio, I'm
pretty sure it's better to read first.
Although writes are an order of magnitude faster than reads, if you have a high
write rate Cassandra may still throttle you at various bottlenecks,
depending on your hardware and data. Disk is often the bottleneck (and you can
tweak storage-conf to improve that), sometimes memory is under pressure, and
I have also seen CPU pressure, although that's less common.
You also need to keep in mind that even if you write the same value, a write
with a newer timestamp is stored as a new version that compactions later have
to merge, and compaction is usually where disk/memory become the bottleneck.
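
To make the trade-off a bit more concrete, here is a rough back-of-envelope model, not a benchmark. It assumes (hypothetically) that every read missing the row cache goes to disk, that rewriting an already-present id creates one redundant version for compaction, and that `present` is the fraction of ids already stored (around 0.8 in Daniel's case). The names `ExpectedOps` and `expected_ops` are illustrative only, and replication (RF=2) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ExpectedOps:
    """Expected work per incoming id under a deliberately crude model."""
    reads: float           # reads sent to Cassandra
    uncached_reads: float  # reads that miss the row cache (assumed to go to disk)
    writes: float          # writes sent to Cassandra (replication factor not modeled)
    extra_versions: float  # redundant column versions compaction may have to merge


def expected_ops(present: float, cache_hit_ratio: float, read_first: bool) -> ExpectedOps:
    """Hypothetical back-of-envelope model, not a benchmark.

    present         -- fraction of ids that already exist (0..1)
    cache_hit_ratio -- fraction of reads answered by the row cache (0..1)
    read_first      -- True for "read, then write if missing"; False for "always write"
    """
    for name, value in (("present", present), ("cache_hit_ratio", cache_hit_ratio)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if read_first:
        # Every id costs one read; only the missing ids are written,
        # so no redundant versions are created.
        return ExpectedOps(reads=1.0, uncached_reads=1.0 - cache_hit_ratio,
                           writes=1.0 - present, extra_versions=0.0)
    # Always write: no reads at all, but each already-present id is rewritten.
    # This is an upper bound: versions superseded while still in the memtable
    # are merged in memory and never reach disk.
    return ExpectedOps(reads=0.0, uncached_reads=0.0,
                       writes=1.0, extra_versions=present)
```

For example, with `present=0.8` and an assumed 90% hit ratio, read-first costs one read per id (about 0.1 of them uncached) plus about 0.2 writes, while always-write costs one write per id and up to about 0.8 redundant versions. Which side wins depends on how expensive an uncached read is on your hardware compared with a write plus its share of compaction, which is what a benchmark would have to tell you.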

Bottom line: if you can cache (you have enough memory) and there's a good hit
ratio, cache entire rows and read first. If not, just always write, and make
sure compactions aren't killing you; if they are, tweak storage-conf so that
fewer compactions run.
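
The two strategies look roughly like this. `FakeStore` is a hypothetical in-memory stand-in for one column family, laid out as in Daniel's question (day as row key, SomeId as column, empty value); it only counts operations and is not a real Cassandra client, which would also involve timestamps, consistency levels and error handling.

```python
from collections import defaultdict


class FakeStore:
    """Hypothetical in-memory stand-in for one column family.

    Layout: day (row key) -> SomeId (column) -> empty value.
    It only counts operations; it is not a real Cassandra client.
    """

    def __init__(self):
        self.rows = defaultdict(dict)
        self.reads = 0
        self.writes = 0

    def has_column(self, row_key, column):
        self.reads += 1
        # .get() avoids creating an empty row as a side effect of the read.
        return column in self.rows.get(row_key, {})

    def insert(self, row_key, column, value=b""):
        self.writes += 1
        # Re-inserting an existing column simply replaces it, like a newer version.
        self.rows[row_key][column] = value


def always_write(store, day, some_id):
    """Blind write: no read, even if the column already exists."""
    store.insert(day, some_id)


def read_then_write(store, day, some_id):
    """Read first and write only when the column is missing."""
    if not store.has_column(day, some_id):
        store.insert(day, some_id)
```

Both end with the same data. With 8 of 10 ids already present, `always_write` issues 10 writes and no reads, while `read_then_write` issues 10 reads and 2 writes. Because the value is empty, a race where two clients both see an id as missing and both write it is harmless: the outcome is the same as a blind write.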


On Fri, Aug 27, 2010 at 5:44 PM, Chen Xinli <chen.d...@gmail.com> wrote:

> I think just writing all the time is much better, as most of the replacements
> will be done in the memtable.
>
> Also, you should set a large memtable size compared with the average row
> size.
>
>
> 2010/8/27 Daniel Doubleday <daniel.double...@gmx.net>
>
>> Hi people
>>
>> I was wondering if anyone has already benchmarked a situation like this:
>>
>> I have:
>>
>> day of year (row key) -> SomeId (column key) -> byte[0]
>>
>> I need to make sure that I write SomeId, but in around 80% of the cases it
>> will already be present (so I would essentially replace it with itself). RF
>> will be 2.
>>
>> So should I rather just write all the time (given that Cassandra is so
>> fast at writing), or should I read and write only if it's not present?
>>
>> Cheers,
>> Daniel
>
>
>
>
> --
> Best Regards,
> Chen Xinli
>
