Bill de hÓra already answered; I'd like to add the following.

To achieve ~4 ms reads (from the client's standpoint):

1. You can't use multi-slice, since different keys may live on different nodes, which requires internode communication. Design your data and reads to use one key/row.

2. Use ConsistencyLevel.ONE to avoid waiting for other nodes.

3. Use a smart client that selects endpoints by token (key) so each request goes to the appropriate node: Astyanax (Java), or write such a client yourself.

4. Turn off the dynamic snitch. Even when the coordinator node could read locally, the dynamic snitch may redirect the read to another replica.

5. Use SSDs to avoid the re-cache issue when sstables are compacted.

6. If you do writes, the remaining issue is GC. Unless you're on the Azul Zing JVM (which I can't confirm to be better than Oracle HotSpot or JRockit, both of which have GC issues), you can't tune the JVM to keep Young Gen GC pauses as low as you need. You will be trading pause frequency against pause duration. So if you can afford Zing, also check Aerospike (ex-CitrusLeaf) as an alternative to Cassandra; it is written in C and has no GC issues.
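To illustrate point 3, here is a minimal sketch of how a client could pick the endpoint for a key. It is not Astyanax code. It assumes the cluster uses RandomPartitioner (token = absolute value of the MD5 digest read as a signed integer) and that the client already knows each node's token; the class name, node addresses, and tokens below are made up for the example. A cluster on Murmur3Partitioner would need a different hash function.

```python
import bisect
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Token as RandomPartitioner is assumed to derive it: the MD5 digest
    read as a signed big-endian integer, then made non-negative."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


class TokenRing:
    """Hypothetical client-side view of the ring: node address -> token."""

    def __init__(self, node_tokens: dict):
        pairs = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in pairs]
        self._nodes = [node for _, node in pairs]

    def primary_replica(self, key: bytes) -> str:
        """Return the node with the first token >= the key's token,
        wrapping around to the lowest token at the end of the ring."""
        token = random_partitioner_token(key)
        index = bisect.bisect_left(self._tokens, token)
        return self._nodes[index % len(self._nodes)]


if __name__ == "__main__":
    # Four made-up nodes with evenly spaced tokens in the 0..2**127 range.
    ring = TokenRing({
        "10.0.0.1": 0,
        "10.0.0.2": 2**127 // 4,
        "10.0.0.3": 2**127 // 2,
        "10.0.0.4": 3 * (2**127 // 4),
    })
    print(ring.primary_replica(b"row-42"))
```

This only finds the first replica. With a replication factor above 1 any replica can serve a ConsistencyLevel.ONE read locally, so a real client would also track the replica list per token range and fall back when a node is down.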
> From: Bill de hÓra [mailto:b...@dehora.net]
> Sent: Thursday, February 21, 2013 22:07
> To: user@cassandra.apache.org
> Subject: Re: Using Cassandra for read operations
>
> In a nutshell -
>
> - Start with defaults and tune based on small discrete adjustments and leave time to see the effect of each change. No-one will know your workload better than you and the questions you are asking are workload sensitive.
>
> - Allow time for tuning and spending time understanding the memory model and JVM GC.
>
> - Be very careful with caches. Leave enough room in the OS for its own disk cache.
>
> - Get an SSD
>
> Bill
>
> On 21 Feb 2013, at 19:03, amulya rattan <talk2amu...@gmail.com> wrote:
>
> > Dear All,
> >
> > We are currently evaluating Cassandra for an application involving strict SLAs (Service level agreements). We just need one column family with a long key and approximately 70-80 bytes row. We are not concerned about write performance but are primarily concerned about read. For our SLAs, a read of max 15-20 rows at once (using multi slice), should not take more than 4 ms. Till now, on a single node setup, using Cassandra's stress tool, the numbers are promising. But I am guessing that's because there is no network latency involved there and since we set memtable around 2gb (4 gb heap), we never had to get to Disk I/O.
> >
> > Assuming our nodes having >32GB RAM, a couple of questions regarding read:
> >
> > * To avoid disk I/Os, the best option we thought is to have data in memory. Is it a good idea to have memtable setup around 1/2 or 3/4 of heap size? Obviously flushing will take a lot of time but would that hurt that node's performance big time?
> >
> > * Cassandra stress tool only gives out average read latency. Is there a way to figure out max read-latency for a bunch of read operations?
> >
> > * How big a row cache can one have? Given that cassandra provides off-heap row caching, in a machine >32 gb RAM, would it be wise to have a >10 gb row cache with 8 gb java heap? And how big should the corresponding key cache be then?
> >
> > Any response is appreciated.
> >
> > ~Amulya

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: viktor.jevdoki...@adform.com