Strategy to delete/expire keys in cassandra

2010-02-23 Thread Weijun Li
It seems that we mostly talk about writing keys to and reading keys from a Cassandra cluster. I’m wondering how you have successfully dealt with deleting/expiring keys in Cassandra. A typical example is when you want to delete keys that haven’t been modified within a certain time period (i.e., old keys).

Re: problem about bootstrapping when used in huge node

2010-02-23 Thread Jonathan Ellis
On Tue, Feb 23, 2010 at 12:33 AM, Michael Lee mail.list.steel.men...@gmail.com wrote: (1) A cluster cannot be enlarged (by adding more nodes to it) if it has already used more than half its capacity: if every node holds more data than half its capacity, the admin may not bootstrap a new node into

Hector - a Java Cassandra client

2010-02-23 Thread Ran Tavory
I've written a Java library for Cassandra that I've been using internally. I would love to get your feedback and hope you find it useful. Blog post: http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/ Source: http://github.com/rantav/hector High level features: o A high-level object

Re: Hector - a Java Cassandra client

2010-02-23 Thread Richard Grossman
Hi Ran, Does it support operations on super columns? Thanks On Tue, Feb 23, 2010 at 4:13 PM, Ran Tavory ran...@gmail.com wrote: I've written a Java library for Cassandra that I've been using internally. I would love to get your feedback and hope you find it useful. Blog post:

Re: Hector - a Java Cassandra client

2010-02-23 Thread Ran Tavory
It supports supercolumns, yes, although I personally have only used regular columns so far (you can see the unit tests here: http://github.com/rantav/hector/blob/master/src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java, search for super) On Tue, Feb 23, 2010 at 4:25 PM, Richard

Re: problem about bootstrapping when used in huge node

2010-02-23 Thread Brandon Williams
On Tue, Feb 23, 2010 at 7:31 AM, Jonathan Ellis jbel...@gmail.com wrote: (2) How to use a node that has 12 1TB disks? You should use a better filesystem than ext3. :) We use xfs at Rackspace. Also, don't use RAID5. Let Cassandra's replication handle disk failure scenarios instead, and supply

Re: problem about bootstrapping when used in huge node

2010-02-23 Thread Jonathan Ellis
For Michael, I think raid5 is not a bad choice: - if you're going to have multiple TB of data, using JBOD gives a pretty harsh limit to your compaction/anticompaction scenarios - he is concerned about his data set size, so raid 1 or 10 wastes space - raid0 is potentially painful since you'd

reads are slow

2010-02-23 Thread kevin
It takes over a second to load 100 columns of a super column. How can I improve this performance? I am using Cassandra version 0.5. Output of nodeprobe info: 29814395632524962303611017038378268216 Load : 9.18 GB Generation No: 1266945238 Uptime (seconds) : 638131 Heap Memory (MB) :

Re: reads are slow

2010-02-23 Thread kevin
I don't think so. /dev/sdc1 is the commitlog drive, and /dev/sdd1 is the data directory. http://pastie.org/838943 On Tue, Feb 23, 2010 at 9:35 AM, Jonathan Ellis jbel...@gmail.com wrote: are you i/o bound? http://spyced.blogspot.com/2010/01/linux-performance-basics.html On Tue, Feb 23, 2010

Re: reads are slow

2010-02-23 Thread Brandon Williams
On Tue, Feb 23, 2010 at 11:33 AM, kevin kevincastigli...@gmail.com wrote: I have given 10GB RAM in cassandra.in.sh. -Xmx10G \ I have increased KeysCachedFraction to 0.04. I have two different drives for commitlog and data directory. I have about 3 million rows. What can I do to

Re: Cassandra paging, gathering stats

2010-02-23 Thread Sonny Heer
Columns can easily be paginated via the 'start' and 'finish' parameters.  You can't jump to a random page, but you can provide next/previous behavior. Do you have an example of this? From a client, they can pass in the last key, which can then be used as the start with some predefined count.

Re: reads are slow

2010-02-23 Thread kevin
On Tue, Feb 23, 2010 at 9:51 AM, Brandon Williams dri...@gmail.com wrote: On Tue, Feb 23, 2010 at 11:33 AM, kevin kevincastigli...@gmail.com wrote: I have given 10GB RAM in cassandra.in.sh. -Xmx10G \ I have increased KeysCachedFraction to 0.04. I have two different drives for

Re: reads are slow

2010-02-23 Thread kevin
On Tue, Feb 23, 2010 at 10:06 AM, Jonathan Ellis jbel...@gmail.com wrote: the standard workaround is to change your data model to use non-super columns instead. Is there any limit on the number of standard column families that I can set up on Cassandra?

Re: reads are slow

2010-02-23 Thread Jonathan Ellis
On Tue, Feb 23, 2010 at 12:12 PM, kevin kevincastigli...@gmail.com wrote: On Tue, Feb 23, 2010 at 10:07 AM, Jonathan Ellis jbel...@gmail.com wrote: you enable row caching by upgrading to 0.6. :) where can i get 0.6 from? svn trunk? svn branches/cassandra-0.6. Like I said, we're voting on a

Re: Cassandra paging, gathering stats

2010-02-23 Thread Brandon Williams
On Tue, Feb 23, 2010 at 11:54 AM, Sonny Heer sonnyh...@gmail.com wrote: Columns can easily be paginated via the 'start' and 'finish' parameters. You can't jump to a random page, but you can provide next/previous behavior. Do you have an example of this? From a client, they can pass in

Re: Cassandra paging, gathering stats

2010-02-23 Thread Jonathan Ellis
you'd actually use first column as start, empty finish, count=pagesize, and reversed=True, unless I'm misunderstanding something. On Tue, Feb 23, 2010 at 1:57 PM, Brandon Williams dri...@gmail.com wrote: On Tue, Feb 23, 2010 at 11:54 AM, Sonny Heer sonnyh...@gmail.com wrote: Columns can

Re: Cassandra paging, gathering stats

2010-02-23 Thread Brandon Williams
On Tue, Feb 23, 2010 at 2:28 PM, Jonathan Ellis jbel...@gmail.com wrote: you'd actually use first column as start, empty finish, count=pagesize, and reversed=True, unless I'm misunderstanding something. Oops, Jonathan is correct. -Brandon
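
A rough sketch of the next/previous paging described in this thread, for illustration only. It does not call Cassandra: get_slice is a hypothetical in-memory stand-in that takes the same four ideas (start, finish, count, reversed) over a sorted list of column names, and it assumes that an empty bound means "unbounded" and that the start bound is inclusive. Because of that inclusive-start assumption, the helpers next_page and previous_page (also hypothetical names) ask for one extra column and drop the boundary column they already showed.

```python
from bisect import bisect_left, bisect_right


def get_slice(columns, start="", finish="", count=100, is_reversed=False):
    """Toy model of a column slice over sorted column names.

    Assumptions (not taken from any real client API): an empty start or
    finish means "unbounded", and both bounds are inclusive.
    is_reversed plays the role of the reversed flag from the thread.
    """
    names = sorted(columns)
    if not is_reversed:
        lo = bisect_left(names, start) if start else 0
        hi = bisect_right(names, finish) if finish else len(names)
        return names[lo:hi][:count]
    # Reversed: walk downwards from start towards finish.
    hi = bisect_right(names, start) if start else len(names)
    lo = bisect_left(names, finish) if finish else 0
    return names[lo:hi][::-1][:count]


def next_page(columns, last_seen, page_size):
    """Columns after last_seen: last column as start, empty finish."""
    if not last_seen:
        return get_slice(columns, count=page_size)
    page = get_slice(columns, start=last_seen, count=page_size + 1)
    # Drop the boundary column, which the caller has already displayed.
    return page[1:] if page and page[0] == last_seen else page[:page_size]


def previous_page(columns, first_seen, page_size):
    """Columns before first_seen: first column as start, empty finish,
    reversed slice, then flipped back into ascending order."""
    page = get_slice(columns, start=first_seen, count=page_size + 1,
                     is_reversed=True)
    page = page[1:] if page and page[0] == first_seen else page[:page_size]
    return page[::-1]


if __name__ == "__main__":
    cols = ["c%02d" % i for i in range(10)]
    first = next_page(cols, "", 3)
    second = next_page(cols, first[-1], 3)
    back = previous_page(cols, second[0], 3)
    print(first, second, back)
```

As in the thread, this only gives next/previous behavior: each call needs a boundary column from the page currently on screen, so jumping to an arbitrary page number is not possible.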

Help for choice

2010-02-23 Thread Cemal
Hi all, My question is about choosing an appropriate NoSQL solution rather than about Cassandra specifically. In our case: - We have more than 4 million rows of *denormalized* data, and by the end of this year we expect 5-6 million rows - Every minute, possibly more than 1000 rows can be

Re: reads are slow

2010-02-23 Thread kevin
On Tue, Feb 23, 2010 at 11:49 AM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Feb 23, 2010 at 12:12 PM, kevin kevincastigli...@gmail.com wrote: On Tue, Feb 23, 2010 at 10:07 AM, Jonathan Ellis jbel...@gmail.com wrote: you enable row caching by upgrading to 0.6. :) where can i get

Re: Help for choice

2010-02-23 Thread Tatu Saloranta
On Tue, Feb 23, 2010 at 3:54 PM, Chris Goffinet goffi...@digg.com wrote: MySQL Very funny! I assume this is related to MySQL's somewhat spotty record of actually conforming to the SQL standard, right? ;-D (the NoSQL solution part) -+ Tatu +-

Re: Strategy to delete/expire keys in cassandra

2010-02-23 Thread Weijun Li
Thanks for the answer. A dumb question: how did you apply the patch file to the 0.5 source? The link you gave doesn't mention that the patch is for 0.5. Also, this ExpiringColumn feature doesn't seem to expire keys/rows, meaning the number of keys will keep growing (even if you drop columns for them)

Re: Help for choice

2010-02-23 Thread Cemal
I was not really expecting such an answer. :) Any other ideas? On Wed, Feb 24, 2010 at 2:51 AM, Tatu Saloranta tsalora...@gmail.com wrote: Very funny! I assume this is related to MySQL's somewhat spotty record of actually conforming to the SQL standard, right? ;-D (the NoSQL solution part)