High read latency

2010-06-04 Thread Ma Xiao
we have a SuperCF which may have up to 1000 super columns and 5 columns for each super column, and the read latency may go up to 50ms (even higher). I think that's a long time to respond; how can I tune the storage config to optimize the performance? I read the wiki, ColumnIndexSizeInKB may help to

Re: High read latency

2010-06-04 Thread Sylvain Lebresne
As written in the third point of http://wiki.apache.org/cassandra/CassandraLimitations, right now super columns are not indexed and are deserialized in full when you access them. Another way to put it is that you'll want to use super columns with only a relatively small number of columns in them. Because
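One common workaround for the limitation described above, assumed here rather than stated in the thread, is to flatten each super column into composite column names in a standard column family, so a read can slice only the columns it needs instead of deserializing a whole super column. The sketch models a single row in memory; the `super:sub` naming scheme and the `:` separator are hypothetical choices, and no Cassandra client API is used.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

SEP = ":"  # hypothetical separator between super column and sub-column names

Column = Tuple[str, object]

def flatten(row: Dict[str, Dict[str, object]]) -> List[Column]:
    """Turn {super: {sub: value}} into (name, value) pairs sorted by name,
    mimicking how a standard CF keeps its columns ordered by the comparator."""
    return sorted(
        (f"{sup}{SEP}{sub}", value)
        for sup, subs in row.items()
        for sub, value in subs.items()
    )

def slice_super(columns: List[Column], super_name: str) -> Dict[str, object]:
    """Return the sub-columns of one former super column via a prefix slice.

    Names sharing a prefix are contiguous in sorted order, so we binary-search
    to the first match and stop at the first name without the prefix."""
    prefix = super_name + SEP
    start = bisect_left(columns, (prefix,))  # (prefix,) sorts before (prefix..., v)
    out = {}
    for name, value in columns[start:]:
        if not name.startswith(prefix):
            break
        out[name[len(prefix):]] = value
    return out
```

The trade-off is that the client, not Cassandra, now owns the naming convention, so sub-column names must not contain the separator.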

Re: High CPU Usage since 0.6.2

2010-06-04 Thread Lu Ming
I notice that there are more than 100 CLOSE_WAIT incoming connections on storage port 7000. On my two Cassandra nodes: 126 of 146 storage connections are CLOSE_WAIT, and 196 of 217 storage connections are CLOSE_WAIT. Is it normal? -- From: Chris
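Counts like the ones above can be gathered by tallying connection states on the storage port. A minimal sketch, assuming Linux `netstat -tan` output (columns Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name is hypothetical.

```python
from collections import Counter

def storage_port_states(netstat_output: str, port: int = 7000) -> Counter:
    """Count TCP states of connections whose *local* port is `port`,
    i.e. incoming connections to the storage port.

    Assumes Linux `netstat -tan` style lines:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip banner, header and non-TCP lines
        local_addr, state = fields[3], fields[5]
        if local_addr.rsplit(":", 1)[-1] == str(port):
            counts[state] += 1
    return counts
```

On a live node the input could come from `subprocess.run(["netstat", "-tan"], capture_output=True, text=True).stdout`.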

Fatal exception in with compaction

2010-06-04 Thread casablinca126.com
hi, I get a fatal exception with my cassandra cluster: java.lang.NoClassDefFoundError: org/apache/cassandra/db/CompactionManager$4 at org.apache.cassandra.db.CompactionManager.submitMajor(CompactionManager.java:156) at

Re: High CPU Usage since 0.6.2

2010-06-04 Thread Lu Ming
I took a thread dump on each Cassandra node and counted the threads with the call stack at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66) in thread-xxx, then I find an

Re: Fatal exception in with compaction

2010-06-04 Thread casablinca126.com
hi, I have not used nodetool repair or nodetool compact. So how is MajorCompaction triggered? -- casablinca126.com 2010-06-04 - From: casablinca126.com Sent: 2010-06-04 18:05:11

Re: Cassandra training Jun 18 in SF

2010-06-04 Thread S Ahmed
Nice! Would it be possible to give more than 2 weeks' notice for the following events? Preferably a month; it's not that easy to get off work etc. On Fri, Jun 4, 2010 at 4:22 AM, Oleg Anastasjev olega...@gmail.com wrote: Jonathan Ellis jbellis at gmail.com writes: This will be Riptano's

Re: High CPU Usage since 0.6.2

2010-06-04 Thread Gary Dusbabek
Chris, Can you get me a stack dump of one of the busy nodes (kill -3)? Gary On Thu, Jun 3, 2010 at 22:50, Chris Goffinet goffi...@digg.com wrote: We're seeing this as well. We were testing with a 40+ node cluster on the latest 0.6 branch from few days ago. -Chris On Jun 3, 2010, at 9:55

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Jonathan Ellis
get_slice reads a single row. do you mean there are 23,000 columns, or are you running get_slice in a loop 23000 times? On Fri, Jun 4, 2010 at 4:59 AM, Per Olesen p...@trifork.com wrote: Are 6..8 seconds to read 23.000 small rows - as it should be? I have a quick question on what I think is
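If the 23,000 rows are being read with one get_slice call per row, one alternative (an assumption here, not something the thread confirms the poster tried) is to group keys into batches and issue one multi-row read per batch, such as a wrapper around the Thrift `multiget_slice` call. The sketch keeps the client abstract: `fetch_batch` is a hypothetical callable, and the batch size of 100 is an arbitrary illustration.

```python
from typing import Callable, Dict, Iterator, List

def chunks(keys: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `keys` with at most `size` elements."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

def read_rows(keys: List[str],
              fetch_batch: Callable[[List[str]], Dict[str, dict]],
              batch_size: int = 100) -> Dict[str, dict]:
    """Read many rows with one round trip per batch instead of per key.

    `fetch_batch` is a hypothetical stand-in for a multi-row read: it takes
    a list of keys and returns {key: row}."""
    rows: Dict[str, dict] = {}
    for batch in chunks(keys, batch_size):
        rows.update(fetch_batch(batch))
    return rows
```

With 23,000 keys and batches of 100 this makes 230 round trips instead of 23,000; the right batch size depends on row size and is worth measuring.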

http://voltdb.com/ ?

2010-06-04 Thread Denis Haskin
Anybody looked at VoltDB? I haven't dug into it, but curious about it. dwh

Re: question about class SlicePredicate

2010-06-04 Thread David Boxenhorn
It works for Random Partitioner only if you want to get all keys. 2010/6/4 Shuai Yuan yuansh...@supertool.net.cn It's documented that get_range_slice() supports all partitioners in 0.6 Kevin -- Original message -- From: Olivier Mallassi omalla...@octo.com To: user@cassandra.apache.org
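Getting all keys under the RandomPartitioner means paging through rows in token order rather than key order. A minimal sketch of the usual paging loop, assuming the start key of a range request is inclusive: each new page starts at the last key of the previous one, and that duplicate row is dropped. The in-memory `make_fake_range` stands in for the get_range_slice() call mentioned above and is hypothetical; MD5 ordering only imitates token order.

```python
import hashlib
from typing import Callable, Iterator, List

def token(key: str) -> int:
    # Illustrative stand-in for RandomPartitioner ordering (MD5 of the key).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def make_fake_range(keys: List[str]) -> Callable[[str, int], List[str]]:
    """Hypothetical in-memory range query: up to `count` keys in token
    order, starting at `start` (inclusive); "" means the beginning."""
    ordered = sorted(keys, key=token)
    def get_range(start: str, count: int) -> List[str]:
        i = 0 if start == "" else ordered.index(start)
        return ordered[i:i + count]
    return get_range

def iterate_all_keys(get_range: Callable[[str, int], List[str]],
                     page_size: int = 100) -> Iterator[str]:
    """Yield every key exactly once by paging through the ring."""
    if page_size < 2:
        raise ValueError("page_size must be >= 2 or paging cannot advance")
    start, first_page = "", True
    while True:
        page = get_range(start, page_size)
        yield from (page if first_page else page[1:])  # drop repeated start key
        if len(page) < page_size:
            return  # short page: end of the ring
        start, first_page = page[-1], False
```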

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Ben Browning
How many subcolumns are in each supercolumn and how large are the values? Your example shows 8 subcolumns, but I didn't know if that was the actual number. I've been able to read columns out of Cassandra at an order of magnitude higher than what you're seeing here but there are too many variables

Embedded usage

2010-06-04 Thread Sten Roger Sandvik
Hi. I have looked at cassandra before and now I'm revisiting the project :-) At the project I am working on we need a fast storage for blobs and lucene indexes that is available on each node in the cluster. Cassandra seems to fit very well for the blob storage and cassandra/lucandra for the

Re: Handling disk-full scenarios

2010-06-04 Thread Ian Soboroff
Story continued, in hopes this experience is useful to someone... I shut down the node, removed the huge file, restarted the node, and told everybody to repair. Two days later, AE stages are still running. Ian On Thu, Jun 3, 2010 at 2:21 AM, Jonathan Ellis jbel...@gmail.com wrote: this is

Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Philip Stanhope
Here's the scenario: would like R = N where N is the number of nodes. Let's say 8. 1. Create first node, modify storage-conf.xml and change the <Seed/> to be the IP of the node. Change replication factor to 8 for CF of interest. Start the puppy up. 2. Create 2nd node, modify storage-conf.xml

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 10:36 AM, Philip Stanhope pstanh...@wimba.com wrote: Here's the scenario: would like R = N where N is the number of nodes. Let's say 8. 1. Create first node, modify storage-conf.xml and change the <Seed/> to be the IP of the node. Change replication factor to 8 for CF

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Philip Stanhope
Thanks on the correction about Keyspace versus ColumnFamily ... I knew that just mis-typed. I guess it should be stated (to be obvious) ... that when you are auto bootstrapping a node ... the seed better be alive. The scenario I'm dealing with is that it might not be (reasons for that are

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 11:04 AM, Philip Stanhope pstanh...@wimba.com wrote: I am contemplating a situation where there may be 2N servers ... but only N online at any one time. But, for operational purposes, N+n (where n is 1 or 2), N may be occasionally greater than R. Then Cassandra is
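When only some of the servers are online, whether reads and writes succeed depends on how many of a row's replicas respond at the chosen consistency level. A small arithmetic sketch: QUORUM needs floor(RF/2) + 1 replicas, so with the replication factor of 8 from this scenario a QUORUM operation needs 5 replicas and survives 3 being down. The sketch ignores hinted handoff and assumes the down nodes are replicas of the row in question.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1

def tolerable_down(replication_factor: int, level: str) -> int:
    """How many of a row's replicas may be down while `level` still succeeds."""
    needed = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }[level]
    return replication_factor - needed
```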

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Philip Stanhope
I guess I'm thick ... What would be the right choice? Our data demands have already been proven to scale beyond what RDB can handle for our purposes. We are quite pleased with Cassandra read/write/scale out. Just trying to understand the operational considerations. On Jun 4, 2010, at 2:11

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Per Olesen
On Jun 4, 2010, at 5:19 PM, Ben Browning wrote: How many subcolumns are in each supercolumn and how large are the values? Your example shows 8 subcolumns, but I didn't know if that was the actual number. I've been able to read columns out of Cassandra at an order of magnitude higher than

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 11:14 AM, Philip Stanhope pstanh...@wimba.com wrote: I guess I'm thick ... What would be the right choice? Our data demands have already been proven to scale beyond what RDB can handle for our purposes. We are quite pleased with Cassandra read/write/scale out. Just

Re: Embedded usage

2010-06-04 Thread Sten Roger Sandvik
2010/6/4 Ran Tavory ran...@gmail.com Cassandra expects a config file and does not expose an alternative API, for this file, that's correct. I think it's not hard to add such API but so far the demand for it didn't exist. I see that making a config api is not that hard. Will probably take a

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Torsten Curdt
Yes, I know. And I might end up doing this in the end. I do though have pretty hard upper limits of how many rows I will end up with for each key, but anyways it might be a good idea none the less. Thanks for the advice on that one. You set count to Integer.MAX. Did you try with say 3?

Re: Expected wait while bootstrapping?

2010-06-04 Thread Aaron Lav
On Fri, Jun 04, 2010 at 12:35:51PM -0700, Gary Dusbabek wrote: Most of the streaming messages are DEBUG, so you'll have to amp up logging. I've upped logging on the bootstrapping node, and I realize that it's trying to assume load from two nodes. The other node (ie the one not mentioned in the

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Mike Malone
Yes, I know. And I might end up doing this in the end. I do though have pretty hard upper limits of how many rows I will end up with for each key, but anyways it might be a good idea none the less. Thanks for the advice on that one. You set count to Integer.MAX. Did you try with say 3?

Re: Column or SuperColumn

2010-06-04 Thread Jonathan Ellis
if you have a relatively small, static set of subcolumns that you read as a group, then using supercolumns is reasonable On Tue, Jun 1, 2010 at 7:33 PM, Peter Hsu pe...@motivecast.com wrote: I have a pretty simple data modeling question.  I don't know whether or not to use a CF or SCF in one

Row Time range

2010-06-04 Thread Nicholas Sun
Is there a mechanism to select a time range within a row range query? Is this planned? For example, return to me the last 10 posts starting at 7:00pm yesterday? Nick

Conditional get

2010-06-04 Thread Lev Stesin
Hi, I am not sure how to implement multiget or slice_range based on a conditional predicate. For example, what if I want to get only keys containing certain columns? Thanks. -- Lev
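The thread does not show an answer here, so the following is only one assumed client-side approach: ask for just the columns of interest (for example with a SlicePredicate that names them), then keep the keys whose returned row contains all of them. The `rows` dictionary shape and the function name are hypothetical.

```python
from typing import Dict, Iterable, List

Row = Dict[str, bytes]  # column name -> value, as returned for one key

def keys_with_columns(rows: Dict[str, Row], required: Iterable[str]) -> List[str]:
    """Return the keys whose row contains every column in `required`.

    `rows` is whatever a multi-row read returned; keys lacking any of the
    required columns come back with missing entries and are filtered out."""
    needed = set(required)
    return [key for key, row in rows.items() if needed.issubset(row)]
```

Filtering this way still transfers every candidate row, so it suits small key sets rather than whole-keyspace scans.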

Strange Read Performance 1xN column slice or N column slice

2010-06-04 Thread Arya Goudarzi
Hi Fellows, I have the following design for a system which holds basically key-value pairs (aka Columns) for each user (SuperColumn Key) in different namespaces (SuperColumnFamily row key). Like this: Namespace-user-column_name = column_value; keyspaces: - name: NKVP

Re: Expected wait while bootstrapping?

2010-06-04 Thread Aaron Lav
On Fri, Jun 04, 2010 at 12:35:51PM -0700, Gary Dusbabek wrote: Most of the streaming messages are DEBUG, so you'll have to amp up logging. I upped the logging to DEBUG on the bootstrapping node and the nodes being bootstrapped from, and the bootstrap completed fine, so I'm not sure what was

Re: Row Time range

2010-06-04 Thread Benjamin Black
That's entirely up to you. If you make row keys that are time ordered and include the time as a prefix in the key, you just use get_range() as usual, start now, end 7pm yesterday, count of 10. On Fri, Jun 4, 2010 at 2:23 PM, Nicholas Sun nick@raytheon.com wrote: Is there a mechanism to
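The approach above can be sketched with keys whose prefix is a zero-padded, inverted timestamp, so ascending key order is newest first and a plain forward range from "now" to "7pm yesterday" returns the latest posts. This assumes an order-preserving partitioner (under the RandomPartitioner, key ranges do not come back in key order); the `MAX_TS` bound, key layout, and function names are hypothetical, and a sorted list stands in for the column family.

```python
from bisect import bisect_left
from typing import List

MAX_TS = 10**13 - 1  # hypothetical bound: 13-digit millisecond timestamps

def row_key(ts_ms: int, post_id: str) -> str:
    """Inverted, zero-padded time prefix: newer posts sort first."""
    return f"{MAX_TS - ts_ms:013d}:{post_id}"

def latest_since(sorted_keys: List[str], now_ms: int, since_ms: int,
                 count: int = 10) -> List[str]:
    """Forward scan from `now` back to `since`, returning at most `count` keys."""
    start = f"{MAX_TS - now_ms:013d}"   # newest bound (smallest key)
    end = f"{MAX_TS - since_ms:013d};"  # ';' sorts just after ':'
    out: List[str] = []
    i = bisect_left(sorted_keys, start)
    while i < len(sorted_keys) and sorted_keys[i] < end and len(out) < count:
        out.append(sorted_keys[i])
        i += 1
    return out
```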

Performance Characteristics of CASSANDRA-16 (Memory Efficient Compactions)

2010-06-04 Thread Jeremy Davis
https://issues.apache.org/jira/browse/CASSANDRA-16 Can someone (Jonathan?) help me understand the performance characteristics of this patch? Specifically: If I have an open ended CF, and I keep inserting with ever increasing column names (for example current Time), will things generally work out

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Jonathan Shook
If I may ask, why the need for frequent topology changes? On Fri, Jun 4, 2010 at 1:21 PM, Benjamin Black b...@b3k.us wrote: On Fri, Jun 4, 2010 at 11:14 AM, Philip Stanhope pstanh...@wimba.com wrote: I guess I'm thick ... What would be the right choice? Our data demands have already been

Re: Is there any way to detect when a node is down so I can failover more effectively?

2010-06-04 Thread Patricio Echagüe
Thanks Johathan On Wed, Jun 2, 2010 at 11:17 PM, Jonathan Ellis jbel...@gmail.com wrote: you're overcomplicating things. just connect to *a* node, and if it happens to be down, try a different one. nodes being down should be a rare event, not a normal condition. no need to optimize for