RE: Using Cassandra for read operations

Viktor Jevdokimov Thu, 21 Feb 2013 23:03:08 -0800

Bill de hÓra already answered, I'd like to add:

To achieve ~4ms reads (from the client's standpoint):
1. You can't use multi-slice, since different keys may live on different nodes, 
which requires internode communication. Design your data and reads to use one 
key/row.
2. Use ConsistencyLevel.ONE to avoid waiting for other nodes.
3. Use a smart client that selects endpoints by token (key) and sends each 
request to the appropriate node: Astyanax (Java), or write such a client yourself.
4. Turn off the dynamic snitch. Although the coordinator node could read 
locally, the dynamic snitch may redirect the read to another replica.
5. Use SSDs to avoid the re-cache issue when sstables are compacted.
6. If you do writes, the remaining issue is GC. Unless you're on the Azul Zing 
JVM (which I can't confirm to be better than Oracle HotSpot or JRockit; both 
have GC issues), you can't tune the JVM to make Young Gen GC pauses as short as 
you need. You will be trading pause frequency against pause time.
So if you can afford Zing, also check Aerospike (ex-CitrusLeaf) as an 
alternative to Cassandra; it is written in C and has no GC issues.
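
To illustrate points 1 and 3, here is a minimal sketch of token-based endpoint 
selection. It is not Astyanax's implementation and not Cassandra's exact 
partitioner: the names key_token, TokenAwareRouter and group_by_node are 
hypothetical, the MD5-based token is only a stand-in for a real partitioner, 
and replication is ignored (only the primary owner of a token is returned). 
It assumes each node owns the token range ending at its own token, inclusive.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # size of the token space assumed for this sketch


def key_token(key: bytes) -> int:
    """Illustrative token: MD5 digest of the key reduced into the token space."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING_SIZE


class TokenAwareRouter:
    """Maps a token to the node that owns it on a sorted token ring."""

    def __init__(self, node_tokens):
        # node_tokens: dict of node address -> token assigned to that node
        pairs = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in pairs]
        self._nodes = [node for _, node in pairs]

    def node_for_token(self, token: int) -> str:
        # First node whose token is >= the requested token; wrap past the end.
        i = bisect.bisect_left(self._tokens, token)
        return self._nodes[i % len(self._nodes)]

    def node_for_key(self, key: bytes) -> str:
        return self.node_for_token(key_token(key))

    def group_by_node(self, keys):
        # Shows why a multi-key read fans out: keys land on different owners.
        groups = {}
        for key in keys:
            groups.setdefault(self.node_for_key(key), []).append(key)
        return groups


router = TokenAwareRouter({"10.0.0.1": 0, "10.0.0.2": 100, "10.0.0.3": 200})
owner = router.node_for_token(50)  # falls in the range owned by 10.0.0.2
```

With one key per read, the client can send the request straight to the owner. 
With 15-20 keys, group_by_node will usually return several nodes, which is the 
internode hop that point 1 warns about.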



> From: Bill de hÓra [mailto:b...@dehora.net]
> Sent: Thursday, February 21, 2013 22:07
> To: user@cassandra.apache.org
> Subject: Re: Using Cassandra for read operations
>
> In a nutshell -
>
> - Start with defaults and tune based on small discrete adjustments and leave
> time to see the effect of each change. No-one will know your workload
> better than you and the questions you are asking are workload sensitive.
>
> - Allow time for tuning and for understanding the memory model
> and JVM GC.
>
> - Be very careful with caches. Leave enough room in the OS for its own disk
> cache.
>
> - Get an SSD
>
>
> Bill
>
>
> On 21 Feb 2013, at 19:03, amulya rattan <talk2amu...@gmail.com> wrote:
>
> > Dear All,
> >
> > We are currently evaluating Cassandra for an application involving strict
> SLAs (service level agreements). We just need one column family with a long
> key and approximately 70-80 bytes per row. We are not concerned about write
> performance but are primarily concerned about reads. For our SLAs, a read of
> max 15-20 rows at once (using multi-slice) should not take more than 4 ms.
> Till now, on a single-node setup, using Cassandra's stress tool, the numbers
> are promising. But I am guessing that's because there is no network latency
> involved there and, since we set the memtable to around 2 GB (4 GB heap), we
> never had to get to disk I/O.
> >
> > Assuming our nodes having >32GB RAM, a couple of questions regarding
> read:
> >
> > * To avoid disk I/Os, the best option we thought is to have data in memory.
> Is it a good idea to have memtable setup around 1/2 or 3/4 of heap size?
> Obviously flushing will take a lot of time but would that hurt that node's
> performance big time?
> >
> > * Cassandra stress tool only gives out average read latency. Is there a way
> to figure out max read-latency for a bunch of read operations?
> >
> > * How big a row cache can one have? Given that cassandra provides off-
> heap row caching, in a machine >32 gb RAM, would it be wise to have a >10
> gb row cache with 8 gb java heap? And how big should the corresponding key
> cache be then?
> >
> > Any response is appreciated.
> >
> > ~Amulya
> >
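
On the quoted question about maximum read latency: one option is to time each 
read on the client side and look at the tail rather than the average. The 
sketch below is not tied to any Cassandra client; read_fn is a hypothetical 
callable that performs one read, and the function names and the nearest-rank 
percentile are choices made for this example.

```python
import math
import time


def measure_latencies(read_fn, keys):
    """Call read_fn(key) for each key and return per-call latencies in ms."""
    latencies_ms = []
    for key in keys:
        start = time.perf_counter()
        read_fn(key)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Return mean, median, 99th percentile and max of the given latencies."""
    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile: smallest value with at least p% at or below it.
        rank = max(1, math.ceil(p * n / 100))
        return ordered[rank - 1]

    return {
        "mean": sum(ordered) / n,
        "p50": percentile(50),
        "p99": percentile(99),
        "max": ordered[-1],
    }


# Example with a stand-in read function that does no I/O.
stats = summarize(measure_latencies(lambda key: None, range(1000)))
```

For a 4 ms SLA, the p99 and max figures are the ones to watch; a GC pause will 
show up there long before it moves the mean.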


Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania



