RE: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Dr. Martin Grabmüller
> The other problem is: if I keep mixed write and read (e.g., 8 write threads plus 7 read threads) against the 2-node cluster continuously, the read latency will go up gradually (along with the size of the Cassandra data file), and at the end it will become ~40ms (up from ~20ms) even wit

Question about Token selection for order-preserving partitioner

2010-02-16 Thread Nguyễn Minh Kha
Hi, I read the wiki topic Operations and I don't understand how to use Token selection for the order-preserving partitioner (application-dependent). I want to create blog comments using TimeUUIDType and order-preserving for range queries; this cluster runs on 3 nodes

Re: Question about Token selection for order-preserving partitioner

2010-02-16 Thread Wojciech Kaczmarek
Hi! 2010/2/16 Nguyễn Minh Kha > > I read the wiki topic > Operations and I don't understand > how to use Token selection for the order-preserving > partitioner (application-dependent). > I want to create blog comments using TimeUUIDType and order-preserving fo

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller < martin.grabmuel...@eleven.de> wrote: > In my tests I have observed that good read latency depends on keeping > the number of data files low. In my current test setup, I have stored > 1.9 TB of data on a single node, which is in 21 data file

cassandra freezes

2010-02-16 Thread Boris Shulman
Hello, I'm running some benchmarks on 2 Cassandra nodes, each running on an 8-core machine with 16G RAM, 10G for Java heap. I've noticed that during benchmarks with numerous writes Cassandra just freezes for several minutes (in those benchmarks I'm writing batches of 10 columns with 1K data each for ev

Re: Question about Token selection for order-preserving partitioner

2010-02-16 Thread Nguyễn Minh Kha
Hi! I think my question was not clear. I don't know how to configure these nodes for this cluster. I want to use TimeUUIDType and the order-preserving partitioner for range queries. How do I configure InitialToken and seeds for node01, node02, node03? I see the wiki topic Operations explains InitialToken for ord

Re: Question about Token selection for order-preserving partitioner

2010-02-16 Thread Wojciech Kaczmarek
Hi! My comment was supposed to mean that you don't use TimeUUIDType as a token because it's a possible column type, not a key type (at least that is what I know from the wiki and my short experience; I didn't check the source). You've probably mistaken sorting of different columns within a row (which dep
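
A minimal sketch of what application-dependent token selection could look like, following the distinction drawn in this reply (tokens come from the row-key space, while TimeUUIDType only orders columns within a row). It assumes row keys are plain strings compared with String.compareTo, that a representative sample of keys is available, and that each node owns the key range ending at its token. TokenPicker and pickTokens are hypothetical names, not part of Cassandra.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a Cassandra class: picks one candidate
// InitialToken per node from a sample of row keys.
class TokenPicker {

    // Returns `nodes` tokens: the sample keys at the 1/n, 2/n, ..., n/n
    // positions of the sorted sample, so each node would own roughly an
    // equal share of the sampled keys.
    static List<String> pickTokens(List<String> sampleKeys, int nodes) {
        if (sampleKeys.isEmpty() || nodes <= 0) {
            throw new IllegalArgumentException("need a non-empty sample and at least one node");
        }
        List<String> sorted = new ArrayList<>(sampleKeys);
        Collections.sort(sorted); // assumption: keys order like plain Java strings
        List<String> tokens = new ArrayList<>();
        for (int i = 1; i <= nodes; i++) {
            int idx = (int) ((long) i * sorted.size() / nodes) - 1;
            tokens.add(sorted.get(Math.max(idx, 0)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up blog post IDs used as row keys, for illustration only.
        List<String> sample = Arrays.asList("post-f", "post-a", "post-c", "post-e", "post-b", "post-d");
        System.out.println(pickTokens(sample, 3));
    }
}
```

Each returned string would be a candidate InitialToken for one of node01, node02, and node03. How well this balances load depends entirely on how representative the key sample is.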

Re: Nodeprobe Not Working Properly

2010-02-16 Thread Shahan Khan
I can ping the other server using db1a instead of the host name. 192.168.1.13 db1a ::1 localhost ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters ff02::3 ip6-allhosts # Auto-generated hostname. Please do not remove this comment. 127.0.0

Re: Nodeprobe Not Working Properly

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 11:08 AM, Shahan Khan wrote: > I can ping the other server using db1a instead of the host name. > > By 'host name' I assume you mean IP address. > 192.168.1.13 db1a > ::1 localhost ip6-localhost ip6-loopback > fe00::0 ip6-localnet > ff00::0 ip6-mcastprefix > ff

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Dumped 50mil records into my 2-node cluster overnight, made sure that there aren't many data files (around 30 only) per Martin's suggestion. The size of the data directory is 63GB. Now when I read records from the cluster the read latency is still ~44ms, and there's no write happening during the read.

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
One more thought about Martin's suggestion: is it possible to put the data files into multiple directories that are located on different physical disks? This should help to improve the i/o bottleneck issue. Has anybody tested the row-caching feature in trunk (shoot for 0.6?)? -Weijun On Tue, Fe

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 11:50 AM, Weijun Li wrote: > Dumped 50mil records into my 2-node cluster overnight, made sure that > there aren't many data files (around 30 only) per Martin's suggestion. The > size of the data directory is 63GB. Now when I read records from the cluster > the read latency

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li wrote: > One more thought about Martin's suggestion: is it possible to put the data > files into multiple directories that are located on different physical > disks? This should help to improve the i/o bottleneck issue. > > Yes, you can already do this

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Thanks for the DataFileDirectory trick; I'll give it a try. Just noticed the impact of the number of data files: node A has 13 data files with read latency of 20ms and node B has 27 files with read latency of 60ms. After I ran "nodeprobe compact" on node B its read latency went up to 150ms. The read l

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 12:16 PM, Weijun Li wrote: > Thanks for the DataFileDirectory trick; I'll give it a try. > > Just noticed the impact of the number of data files: node A has 13 data files > with read latency of 20ms and node B has 27 files with read latency of 60ms. > After I ran "nodeprobe co

Re: cassandra freezes

2010-02-16 Thread Tatu Saloranta
On Tue, Feb 16, 2010 at 6:25 AM, Boris Shulman wrote: > Hello, I'm running some benchmarks on 2 Cassandra nodes, each running > on an 8-core machine with 16G RAM, 10G for Java heap. I've noticed that > during benchmarks with numerous writes Cassandra just freezes for > several minutes (in those benchm

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Stu Hood
> After I ran "nodeprobe compact" on node B its read latency went up to 150ms. The compaction process can take a while to finish... in 0.5 you need to watch the logs to figure out when it has actually finished, and then you should start seeing the improvement in read latency. > Is there any way

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Still have high read latency with 50mil records in the 2-node cluster (replica 2). I restarted both nodes but read latency is still above 60ms and disk i/o saturation is high. Tried compact and repair but they don't help much. When I reduced the client threads from 15 to 5 it looks a lot better but th

Re: Question about Token selection for order-preserving partitioner

2010-02-16 Thread Nguyễn Minh Kha
Hi, thanks Wojciech. I use TimeUUIDType for the Comments CF; I'm not using it for the initial token. On Tue, Feb 16, 2010 at 11:45 PM, Wojciech Kaczmarek wrote: > Hi! > > My comment was supposed to mean that you don't use TimeUUIDType as a token > because it's a possible column type, not a key type (at least

Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Weijun Li
Just started to play with the row cache feature in trunk: it seems to be working fine so far except that for RowsCached parameter you need to specify number of rows rather than a percentage (e.g., "20%" doesn't work). Thanks for this great feature that improves read latency dramatically so that dis

Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Jonathan Ellis
On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li wrote: > Just started to play with the row cache feature in trunk: it seems to be > working fine so far except that for RowsCached parameter you need to specify > number of rows rather than a percentage (e.g., "20%" doesn't work). 20% works, but it's 20%

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Jonathan Ellis
Have you tried increasing KeysCachedFraction? On Tue, Feb 16, 2010 at 6:15 PM, Weijun Li wrote: > Still have high read latency with 50mil records in the 2-node cluster > (replica 2). I restarted both nodes but read latency is still above 60ms and > disk i/o saturation is high. Tried compact and r

Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Jonathan Ellis
On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis wrote: > On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li wrote: >> Just started to play with the row cache feature in trunk: it seems to be >> working fine so far except that for RowsCached parameter you need to specify >> number of rows rather than a pe

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Yes, my KeysCachedFraction is already 0.3 but it doesn't relieve the disk i/o. I compacted the data into a single 60GB file (took quite a while to finish and it increased latency as expected) but that doesn't help much either. If I set KCF to 1 (meaning to cache all sstable index), how much memory will it take

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Jonathan Ellis
On Tue, Feb 16, 2010 at 7:27 PM, Weijun Li wrote: > Yes, my KeysCachedFraction is already 0.3 but it doesn't relieve the disk i/o. > I compacted the data into a single 60GB file (took quite a while to finish and it > increased latency as expected) but that doesn't help much either. > > If I set KCF to 1 (meanin

Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Weijun Li
Yes, it would be nice if you could add a parameter in storage-conf.xml to enable write-through to the row cache. There are many cases that require the new keys to be immediately available for read. In my case I'm thinking of caching 30-50% of all records in memory to reduce read latency. Thanks, -Weijun

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
I didn't know you use the actual key instead of its md5 (for the random partitioner) in KCF. It's a good point; I'll watch the hit ratio of KCF to determine whether it needs to be increased. Thanks, -Weijun On Tue, Feb 16, 2010 at 5:34 PM, Jonathan Ellis wrote: > On Tue, Feb 16, 2010 at 7:27 PM, Weijun Li

Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Weijun Li
Just tried to make quick change to enable it but it didn't work out :-( ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key()); // What I modified if( cachedRow == null ) { cfs.cacheRow(mutation.key()); c
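
For readers unfamiliar with the term, here is a minimal, self-contained sketch of what "write should put record in cache" (write-through) means. It is not Cassandra's row cache and does not complete the truncated snippet above. WriteThroughRowCache and its methods are hypothetical names, an in-memory map stands in for on-disk data, and rows are plain strings.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: an LRU cache in front of a backing store,
// where every write also populates the cache (write-through).
class WriteThroughRowCache {
    // Stands in for on-disk data files.
    private final Map<String, String> store = new HashMap<>();
    // Access-ordered LinkedHashMap that evicts the least recently used row.
    private final LinkedHashMap<String, String> cache;

    WriteThroughRowCache(final int capacity) {
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    // Write-through: the row is cached at write time, so a read that
    // immediately follows does not have to go to the backing store.
    void write(String key, String row) {
        store.put(key, row);
        cache.put(key, row);
    }

    // Read-through on a miss: load from the store and cache the result.
    String read(String key) {
        String row = cache.get(key);
        if (row == null) {
            row = store.get(key);
            if (row != null) {
                cache.put(key, row);
            }
        }
        return row;
    }

    boolean isCached(String key) {
        return cache.containsKey(key);
    }
}
```

Without the cache.put call in write, a newly written key would only enter the cache after its first read, which is the behaviour this thread asks to make configurable.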

Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Jonathan Ellis
https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but this is pretty low priority for me. On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li wrote: > Just tried to make quick change to enable it but it didn't work out :-( > >    ColumnFamily cachedRow = cfs.getRawCachedRow(mutat

Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Jonathan Ellis
... tell you what, if you write the option-processing part in DatabaseDescriptor I will do the actual cache part. :) On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis wrote: > https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but > this is pretty low priority for me. > > On Tue, Feb