On Fri, Aug 27, 2010 at 1:26 PM, Ran Tavory <ran...@gmail.com> wrote:
> I haven't benchmarked so it's purely theoretical.
> If there's no caching then I'm pretty sure just writing would yield better
> performance.
> If you do cache rows/keys it really depends on your hit ratio. Naturally if
> you have a small data set and high cache ratio and use row caching I'm
> pretty sure it's better to read first.
> Although writes are order of magnitude faster than reads, if you have high
> write rate then cassandra might throttle you at different bottlenecks,
> depending on your hardware and data so for example disk is many times a
> bottleneck (and you can teak storage-conf to improve that), sometimes memory
> is pressing and I have seen also CPU pressure although it's less common.
> You need to also keep in mind that even if you write the same value but with
> a newer timestamp then cassandra will have to run compactions and that's
> where disk/mem is usually bottlenecking.
> Bottom line - if you can cache (have enough mem) and there's good hit ratio,
> cache entire rows and read first. If not, always write first and make sure
> compactions aren't killing you, if they are, tweak storage-conf to do less
> compactions.
>
> On Fri, Aug 27, 2010 at 5:44 PM, Chen Xinli <chen.d...@gmail.com> wrote:
>>
>> I think Just writing all the time is much better, as most of replacements
>> will be done in memtable.
>>
>> also you should set a large memtable size, in compared with the average
>> row size.
>>
>>
>> 2010/8/27 Daniel Doubleday <daniel.double...@gmx.net>
>>>
>>> Hi people
>>>
>>> I was wondering if anyone already benchmarked such a situation:
>>>
>>> I have:
>>>
>>> day of year (row key) -> SomeId (column key) -> byte[0]
>>>
>>> I need to make sure that I write SomeId, but in around 80% of the cases
>>> it will be already present (so I would essentially replace it with itself).
>>> RF will be 2.
>>>
>>> So should I rather just write all the time (given that cassandra is so
>>> fast on write) or should I read and write only if not present?
>>>
>>> Cheers,
>>> Daniel
>>
>>
>> --
>> Best Regards,
>> Chen Xinli
>
>
Read before write is usually a bad idea in Cassandra.

We have a multi-node cluster with ~100 GB per node. We have a fairly
substantial 800,000-item row cache, which sees about a 70% hit rate. Our
application measures writes at QUORUM at 1 ms and reads at ONE at 7-10 ms;
reads were about 3-6 ms when the data was around 70 GB per node.

Given that a write takes 1 ms and a read takes 7 ms, and that reads are
more resource-intensive, I would almost never advocate reading before
writing.

Edward
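
To make the arithmetic concrete, here is a rough sketch that plugs the figures from this thread (1 ms write, 7 ms read, about 80% of IDs already present) into a simple expected-latency model. The function names and the model itself are my own illustration, not anything measured; it counts client-side latency only and ignores the compaction and cache effects mentioned earlier in the thread.

```python
# Back-of-the-envelope model, NOT a benchmark. All names here are
# hypothetical; the numbers come from the figures quoted in this thread.

def read_first_latency_ms(read_ms, write_ms, present_fraction):
    """Expected latency per request when we always read first and
    write only if the column is absent."""
    return read_ms + (1.0 - present_fraction) * write_ms


def write_always_latency_ms(write_ms):
    """Latency per request when we blindly write every time."""
    return write_ms


def break_even_read_ms(write_ms, present_fraction):
    """Read-first only wins if a read is faster than this.
    Derived from: read + (1 - p) * write < write  =>  read < p * write."""
    return present_fraction * write_ms


READ_MS = 7.0    # reads at ONE, low end of the 7-10 ms range
WRITE_MS = 1.0   # writes at QUORUM
PRESENT = 0.8    # fraction of IDs already present (from the original question)

read_first = read_first_latency_ms(READ_MS, WRITE_MS, PRESENT)
write_always = write_always_latency_ms(WRITE_MS)
threshold = break_even_read_ms(WRITE_MS, PRESENT)

print(f"read first:   {read_first:.1f} ms per request")
print(f"write always: {write_always:.1f} ms per request")
print(f"read-first breaks even only if a read takes under {threshold:.1f} ms")
```

Under these assumptions a read would have to take less than 0.8 ms before reading first pays off on latency alone, which is well below what we measure.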