Re: How to force GC in Cassandra?

2010-03-13 Thread Weijun Li
http://wiki.apache.org/cassandra/NodeProbe On Fri, Mar 12, 2010 at 12:40 PM, Weijun Li weiju...@gmail.com wrote: Suppose I insert a lot of new items but also delete a lot of items daily; it would be ideal if I could force GC to happen at midnight (when traffic is low). Is there any way

How to force GC in Cassandra?

2010-03-12 Thread Weijun Li
Suppose I insert a lot of new items but also delete a lot of items daily; it would be ideal if I could force GC to happen at midnight (when traffic is low). Is there any way to manually force GC to be executed? That way I could add a cron job to trigger GC at midnight. I tried nodetool and
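
For reference, a minimal sketch of what such a cron job could invoke: the standard JVM memory MBean exposes a gc() operation over JMX, and Cassandra 0.5 serves JMX on port 8080 by default (as noted elsewhere in this list). Keep in mind that space held by deleted rows is only reclaimed when compaction runs, so triggering a compaction via nodeprobe is usually the more useful nightly job. The class below is illustrative, not part of Cassandra:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ForceGc {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "localhost";
            // Cassandra 0.5 exposes JMX on port 8080 by default.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // java.lang:type=Memory is the standard JVM MBean; its gc()
                // operation requests a full collection on the remote node.
                mbs.invoke(new ObjectName("java.lang:type=Memory"),
                           "gc", new Object[0], new String[0]);
            } finally {
                jmxc.close();
            }
        }
    }

A cron entry would then simply run "java ForceGc <node>" at the desired hour.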

Re: Strategy to delete/expire keys in cassandra

2010-03-10 Thread Weijun Li
-Original Message- From: Sylvain Lebresne [mailto:sylv...@yakaz.com] Sent: Thursday, February 25, 2010 2:23 AM To: Weijun Li Cc: cassandra-user@incubator.apache.org Subject: Re: Strategy to delete/expire keys in cassandra Hi, Should I just run a command (in the Cassandra 0.5 source folder

Re: Strategy to delete/expire keys in cassandra

2010-03-10 Thread Weijun Li
Never mind. Figured out I forgot to compile Thrift :) Thanks, -Weijun On Wed, Mar 10, 2010 at 1:43 PM, Weijun Li weiju...@gmail.com wrote: Hi Sylvain, I applied your patch to 0.5 but it seems that it's not compilable: 1) column.getTtl() is not defined in RowMutation.java public static

RE: Strategy to delete/expire keys in cassandra

2010-02-26 Thread Weijun Li
: Sylvain Lebresne [mailto:sylv...@yakaz.com] Sent: Thursday, February 25, 2010 2:23 AM To: Weijun Li Cc: cassandra-user@incubator.apache.org Subject: Re: Strategy to delete/expire keys in cassandra Hi, Should I just run a command (in the Cassandra 0.5 source folder?) like: patch -p1 -i 0001-Add-new

RE: Strategy to delete/expire keys in cassandra

2010-02-24 Thread Weijun Li
in your ticket? Also what's your opinion on extending ExpiringColumn to expire a key completely? Otherwise it will be difficult to track which rows are expired or old in Cassandra. Thanks, -Weijun From: Weijun Li [mailto:weiju...@gmail.com] Sent: Tuesday, February 23, 2010 6:18 PM

Strategy to delete/expire keys in cassandra

2010-02-23 Thread Weijun Li
It seems that we are mostly talking about writing and reading keys into/from a Cassandra cluster. I'm wondering how you successfully deal with deleting/expiring keys in Cassandra. A typical example: you want to delete keys that haven't been modified in a certain time period (i.e., old keys).
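
For context, the ExpiringColumn patch discussed in this thread attaches a TTL to each column; conceptually the expiry test it needs is just the following (an illustrative sketch, not the patch's actual code):

    // Illustrative only -- not the patch's actual code. A column written at
    // createdAtMillis with a TTL of ttlSeconds counts as deleted once its
    // deadline passes; compaction can then drop it for good.
    static boolean isExpired(long createdAtMillis, int ttlSeconds) {
        return System.currentTimeMillis() >= createdAtMillis + ttlSeconds * 1000L;
    }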

Re: Strategy to delete/expire keys in cassandra

2010-02-23 Thread Weijun Li
Thanks for the answer. A dumb question: how did you apply the patch file to the 0.5 source? The link you gave doesn't mention that the patch is for 0.5? Also, this ExpiringColumn feature doesn't seem to expire the key/row, meaning the number of keys will keep growing (even if you drop columns for them)

Re: Testing row cache feature in trunk: write should put record in cache

2010-02-19 Thread Weijun Li
cached, we would let the OS block cache handle that without adding an extra layer. (0.6 uses mmap'd I/O by default on 64-bit JVMs so this is very efficient.) On Fri, Feb 19, 2010 at 3:29 AM, Weijun Li weiju...@gmail.com wrote: The memory overhead issue is not directly related to GC because when

Unbalanced read latency among nodes in a cluster

2010-02-19 Thread Weijun Li
I set up two Cassandra clusters with 2 nodes each. Both use the random partitioner. It's strange that in each cluster, one node has much shorter read latency than the other. This is the info for one of the clusters: Node A: read count 77302, data file 41GB, read latency 58180, I/O saturation 100%

Re: Testing row cache feature in trunk: write should put record in cache

2010-02-17 Thread Weijun Li
at 8:37 PM, Weijun Li weiju...@gmail.com wrote: Just tried to make a quick change to enable it but it didn't work out :-( ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key()); // What I modified if( cachedRow == null
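
The change being attempted above amounts to a write-through cache update. A rough sketch of the idea follows; only getRawCachedRow and mutation.key() are taken verbatim from the excerpt, while the other names approximate the 0.6-era internals and should be treated as assumptions:

    // Sketch of a write-through row-cache update. ColumnFamilyStore,
    // RowMutation and ColumnFamily are the classes named in the excerpt;
    // getColumnFamilies() and addAll() approximate the real merge calls.
    static void updateRowCache(ColumnFamilyStore cfs, RowMutation mutation) {
        ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());
        if (cachedRow == null)
            return; // not cached: let the next read populate the cache
        for (ColumnFamily cf : mutation.getColumnFamilies())
            cachedRow.addAll(cf); // merge freshly written columns in place
    }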

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Dumped 50mil records into my 2-node cluster overnight, and made sure that there aren't many data files (only around 30) per Martin's suggestion. The size of the data directory is 63GB. Now when I read records from the cluster the read latency is still ~44ms -- there's no write happening during the

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
, Feb 16, 2010 at 9:50 AM, Weijun Li weiju...@gmail.com wrote: Dumped 50mil records into my 2-node cluster overnight, and made sure that there aren't many data files (only around 30) per Martin's suggestion. The size of the data directory is 63GB. Now when I read records from the cluster the read

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
the read latency? -Weijun On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams dri...@gmail.com wrote: On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com wrote: One more thought about Martin's suggestion: is it possible to put the data files into multiple directories that are located
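
Regarding spreading data files across disks: Cassandra's storage-conf.xml already supports listing several data directories, and the node will distribute SSTables across them. A minimal sketch (paths are illustrative):

    <DataFileDirectories>
        <DataFileDirectory>/disk1/cassandra/data</DataFileDirectory>
        <DataFileDirectory>/disk2/cassandra/data</DataFileDirectory>
    </DataFileDirectories>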

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
cache rows. You don't want to use up all of the memory on your box for those caches though: you'll want to leave at least 50% for your OS's disk cache, which will store the full row content. -Original Message- From: Weijun Li weiju...@gmail.com Sent: Tuesday, February 16, 2010 12:16pm

Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Weijun Li
Just started to play with the row cache feature in trunk: it seems to be working fine so far, except that for the RowsCached parameter you need to specify a number of rows rather than a percentage (e.g., 20% doesn't work). Thanks for this great feature that improves read latency dramatically so that disk
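
For anyone trying this: RowsCached is set per column family in storage-conf.xml, and per the report above, trunk at this point wants an absolute row count. An illustrative snippet (attribute set trimmed to the essentials):

    <ColumnFamily Name="Standard1"
                  CompareWith="BytesType"
                  RowsCached="200000"/>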

Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
:15 PM, Weijun Li weiju...@gmail.com wrote: Still have high read latency with 50mil records in the 2-node cluster (replication factor 2). I restarted both nodes but read latency is still above 60ms and disk I/O saturation is high. Tried compact and repair but they don't help much. When I reduced

Re: Testing row cache feature in trunk: write should put record in cache

2010-02-16 Thread Weijun Li
at 5:20 PM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com wrote: Just started to play with the row cache feature in trunk: it seems to be working fine so far except

RE: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-15 Thread Weijun Li
does iostat tell you? http://spyced.blogspot.com/2010/01/linux-performance-basics.html do you have a lot of pending compactions? (tpstats will tell you) have you increased KeysCachedFraction? On Sun, Feb 14, 2010 at 8:18 PM, Weijun Li weiju...@gmail.com wrote: Hello, I saw some Cassandra

Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-14 Thread Weijun Li
Hello, I saw some Cassandra benchmark reports mentioning read latency of less than 50ms or even 30ms, but my benchmark with 0.5 doesn't seem to support that. Here are my settings: Nodes: 2 machines, 2x2.5GHz Xeon Quad Core (thus 8 cores), 8GB RAM; ReplicationFactor=2

RackAwareStrategy - add the third datacenter to live cluster with replication factor 3

2010-02-11 Thread Weijun Li
Hello, I have a testing cluster with: A (dc1), B (dc1), C (dc2), D (dc2). The replication factor is 2, so I assume each DC will have a complete copy of the data. Also, I'm using PropertyFileEndPointSnitch with rack.properties for the DC and rack settings. So, what are the steps to add another
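
For reference, the property-file snitch maps each node's address to a datacenter and rack with entries of roughly this shape. The exact key syntax differs between the 0.5-era contrib snitch and the cassandra-topology.properties file of later releases, so treat this as illustrative:

    # node address = datacenter : rack
    10.0.1.1=DC1:RAC1
    10.0.1.2=DC1:RAC1
    10.0.2.1=DC2:RAC1
    10.0.2.2=DC2:RAC1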

nodeprobe flush not implemented in 0.5?

2010-02-11 Thread Weijun Li
Hello, I tried to run nodeprobe flush but it displays the usage info without doing anything. What is the list of supported commands for nodeprobe? Thanks, -Weijun

Rebalance after adding new nodes

2010-02-11 Thread Weijun Li
When you add a new node, Cassandra will pick the node that has the most data and then split its token range. In this case the data distribution among all nodes becomes uneven. What are the right strategy/steps to rebalance the node load after adding new nodes? Here's one example: I have a cluster of nodes A, B,
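
For an N-node cluster under RandomPartitioner, evenly spaced tokens are token_i = i * 2^127 / N. A small sketch for computing them (you would then assign each node its token, e.g. via InitialToken in storage-conf.xml):

    import java.math.BigInteger;

    public class BalancedTokens {
        public static void main(String[] args) {
            int n = Integer.parseInt(args[0]);        // number of nodes
            BigInteger range = BigInteger.ONE.shiftLeft(127); // 2^127
            for (int i = 0; i < n; i++)
                // i-th evenly spaced token in [0, 2^127)
                System.out.println(range.multiply(BigInteger.valueOf(i))
                                        .divide(BigInteger.valueOf(n)));
        }
    }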

nodeprobe freezes when connecting to remote cassandra node

2010-02-09 Thread Weijun Li
Hello, I got one more issue when trying to run nodeprobe to connect to a remote Cassandra node: it froze for a while and then showed the following error. The jmxremote port 8080 is open, and I tried to change the port but it didn't help. This command works properly if I run it in the same
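
A frequent cause of exactly this symptom (not specific to Cassandra): after the initial handshake on the JMX port, RMI redirects the client to a second, randomly chosen port and to whatever hostname the server resolves for itself, so a firewall or a misresolved hostname makes the client hang. The standard JVM flags involved look like this (illustrative values; note that on older JVMs the second RMI port cannot be pinned, so the firewall may simply need to allow the node's ephemeral ports):

    # JVM options for the Cassandra node, e.g. in cassandra.in.sh
    -Dcom.sun.management.jmxremote.port=8080
    -Dcom.sun.management.jmxremote.authenticate=false
    -Dcom.sun.management.jmxremote.ssl=false
    -Djava.rmi.server.hostname=<public-ip-of-node>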