Re: SSTable Index and Metadata - are they cached in RAM?

2012-08-17 Thread Maciej Miklas
Great articles, I did not find those before! SSTable Index - yes, I mean the column index. I would like to understand how many disk seeks might be required to find a column in a single SSTable. I am assuming a positive bloom filter on the row key. Now Cassandra needs to find out whether the given SSTable cont

Re: nodetool repair uses insane amount of disk space

2012-08-17 Thread aaron morton
I would take a look at the replication: what's the RF per DC and what does nodetool ring say? It's hard (as in not recommended) to get NTS with rack allocation working correctly. Without knowing much more I would try to understand what the topology is and if it can be simplified. >> Additionally, t

Re: Cassandra 1.0 row deletion

2012-08-17 Thread aaron morton
> If you use the remove function to delete an entire row, is that an atomic > operation? Yes. Row level deletes are atomic. cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 3:39 PM, Derek Williams wrote: > On Thu, Aug 16,

Re: SSTable Index and Metadata - are they cached in RAM?

2012-08-17 Thread aaron morton
> 2) Read from disk all row keys, in order to find one (binary search) No. At startup Cassandra samples the -index.db component every index_interval keys. At worst index_interval keys must be read from disk. > As I understand, in the worst case, we can have three disk seeks (2, 4, 6) > per SSTa
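The sampling described in this reply can be illustrated with a minimal Python sketch. It is not Cassandra's code: a plain sorted list stands in for the on-disk -index.db component, keys are compared as strings rather than by token, and the names build_sample and lookup are hypothetical. It only shows why at most index_interval index entries need to be read once the in-memory sample has been searched.

```python
import bisect


def build_sample(sorted_keys, index_interval):
    """Keep every index_interval-th key and its position (stand-in for the in-memory sample)."""
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), index_interval)]


def lookup(sorted_keys, sample, index_interval, key):
    """Return (position or None, number of index entries scanned).

    sorted_keys is an assumption standing in for the on-disk index;
    each element visited in the loop models one entry read from disk.
    """
    sampled_keys = [k for k, _ in sample]
    # Binary search the in-memory sample for the last sampled key <= key.
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None, 0  # key sorts before the first sampled key
    start = sample[slot][1]
    scanned = 0
    # Scan forward from the sampled position, at most index_interval entries.
    for pos in range(start, min(start + index_interval, len(sorted_keys))):
        scanned += 1
        if sorted_keys[pos] == key:
            return pos, scanned
        if sorted_keys[pos] > key:
            break  # passed the slot where the key would be
    return None, scanned
```

With 1000 keys and an interval of 128, any lookup scans no more than 128 entries, whatever the size of the full index.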

Re: Omitting empty columns from CQL SELECT

2012-08-17 Thread aaron morton
If you specify the columns by name in the select clause the query returns them because they should be projected in the result set. Can you use a column slice instead? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 11:09 A

Understanding UnavailableException

2012-08-17 Thread Mohit Agarwal
Hi guys, I am trying to understand what happens when an UnavailableException is thrown. a) Suppose we are doing a ConsistencyLevel.ALL write on a 3 node cluster. My understanding is that if one of the nodes is down and the coordinator node is aware of that (through gossip), then it will respond to

Re: Omitting empty columns from CQL SELECT

2012-08-17 Thread Mat Brown
Hi Aaron, Thanks for the answer. That makes sense and I can see it as a formal reason for returning empty columns, but as a practical matter, is there a situation in which that behavior would be useful? Unfortunately a column slice won't do the trick -- the columns we're looking for at any given

What is the ideal server-side technology stack to use with Cassandra?

2012-08-17 Thread Andy Ballingall TF
Hi, I've been running a number of tests with Cassandra using a couple of PHP drivers (namely PHPCassa (https://github.com/thobbs/phpcassa/) and PDO-cassandra (http://code.google.com/a/apache-extras.org/p/cassandra-pdo/)), and the experience hasn't been great, mainly because I can't try out the CQL3

Re: What is the ideal server-side technology stack to use with Cassandra?

2012-08-17 Thread Tim Wintle
On Fri, 2012-08-17 at 11:09 +0100, Andy Ballingall TF wrote: > So my question is - if you were to build a new scalable project from > scratch tomorrow sitting on top of Cassandra, which technologies would > you select to serve HTTP requests to ensure you get: > > a) The best support from the cassa

Re: Understanding UnavailableException

2012-08-17 Thread Maciej Miklas
UnavailableException is a bit tricky. It means that not all replicas required by the CL received the update. You do not actually know whether the update was stored or not, or what went wrong. This is why writing with CL.ALL might become problematic. It is enough that only one replica is off

Re: indexing question related to playOrm on github

2012-08-17 Thread Hiller, Dean
I am not sure what you mean by play with the timestamp. I think this works without playing with the timestamp (thanks for your help as it got me here). 1. On a scan I hit 2. I end up looking up the pk 3. I compare the value in the row with the indexed value "mike" but I see the row with th

Re: What is the ideal server-side technology stack to use with Cassandra?

2012-08-17 Thread Edward Capriolo
The best stack is the THC stack. :) Tomcat Hadoop Cassandra :) On Fri, Aug 17, 2012 at 6:09 AM, Andy Ballingall TF wrote: > Hi, > > I've been running a number of tests with Cassandra using a couple of > PHP drivers (namely PHPCassa (https://github.com/thobbs/phpcassa/) and > PDO-cassandra (http:

Re: Understanding UnavailableException

2012-08-17 Thread Mohit Agarwal
Does this mean that the coordinator sends requests to all nodes, even when it knows that sufficient number of nodes are not available, via gossip? On Fri, Aug 17, 2012 at 4:49 PM, Maciej Miklas wrote: > UnavailableException is bit tricky. It means, that not all replicas > required by CL receive

Re: Opscenter 2.1 vs 1.3

2012-08-17 Thread Nick Bailey
Robin, Are you talking about total writes to the cluster, writes to a specific column family, or something else? There have been some changes to OpsCenter's metric collection/storage system but nothing that should cause something like that. Also, it's possible the number of writes to the OpsCenter k

Re: Understanding UnavailableException

2012-08-17 Thread Nick Bailey
This blog post should help: http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure But to answer your question: >> UnavailableException is bit tricky. It means, that not all replicas >> required by CL received update. Actually you do not know, whenever update >> was stored or

Re: wild card on query

2012-08-17 Thread Swathi Vikas
Thank you very much Aaron. Information you provided is very helpful. Have a great Weekend!!! swat.vikas From: aaron morton To: user@cassandra.apache.org Sent: Thursday, August 16, 2012 6:29 PM Subject: Re: wild card on query > I want to retrieve all the pho

Re: C++ Bulk loader and Result set streaming.

2012-08-17 Thread Swathi Vikas
1) For now I am using sstableloader. I think some time later I will write some code using RPC. 2) Yes, I looked into many blogs and found information that I need to use the last index to retrieve the next 100 rows. I was trying to save some time if someone has already done this kind of streaming. I

Re: nodetool repair uses insane amount of disk space

2012-08-17 Thread Jim Cistaro
We see similar issues with some of the repairs at Netflix. Regarding the growth in payload… we see similar symptoms where nodes can double or triple size. Part of this may be because the repair may deal in large chunks for comparisons. This means that even if there is one byte of entropy, you

Re: What is the ideal server-side technology stack to use with Cassandra?

2012-08-17 Thread Aaron Turner
My stack: Java + JRuby + Rails + Torquebox I'm using the Hector client (arguably the most mature out there) and JRuby+RoR+Torquebox gives me a great development platform which really scales (full native thread support for example) and is extremely powerful. Honestly I expect, all my future RoR a

Re: Understanding UnavailableException

2012-08-17 Thread Mohit Agarwal
Thanks Nick for your answers. The blog post is very well written and was much needed, I guess. On Fri, Aug 17, 2012 at 8:30 PM, Nick Bailey wrote: > This blog post should help: > > http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure > > But to answer your question: > > >> Un

Re: Understanding UnavailableException

2012-08-17 Thread Russell Haering
On Fri, Aug 17, 2012 at 8:00 AM, Nick Bailey wrote: > This is actually incorrect. If you get an UnavailableException, the > write was rejected by the coordinator and was not written anywhere. Last time I checked, this was not true for batch writes. The row mutations were started sequentially (ie,
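The behaviour quoted from Nick Bailey above (the coordinator rejects the write before anything is sent) can be sketched in a few lines of Python. This is a simplified model, not Cassandra's code: it assumes a single data center, only the ONE, QUORUM and ALL levels, and a hypothetical send callback. The batch-write caveat raised in this message is not modeled.

```python
class UnavailableError(Exception):
    """Stand-in for UnavailableException in this sketch."""


def required_replicas(consistency_level, replication_factor):
    """Number of replicas that must be alive for the given level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: %s" % consistency_level)


def coordinate_write(consistency_level, replicas_alive, send):
    """replicas_alive maps replica name -> bool (the coordinator's gossip view).

    send is a hypothetical callback that delivers the mutation to one replica.
    """
    needed = required_replicas(consistency_level, len(replicas_alive))
    live = [name for name, up in replicas_alive.items() if up]
    # The check happens before any mutation is sent, so a rejected
    # write reaches no replica at all.
    if len(live) < needed:
        raise UnavailableError("need %d replicas, %d alive" % (needed, len(live)))
    for name in live:
        send(name)
    return len(live)
```

For the scenario from the original question (3 replicas, one known to be down), a write at ALL raises the error without calling send, while a write at QUORUM goes to the two live replicas.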

Re: Why the StageManager thread pools have 60 seconds keepalive time?

2012-08-17 Thread Guillermo Winkler
Aaron, thanks for your answer. I'm actually tracking a problem where mutations get dropped and cfstats show no activity whatsoever. I have 100 threads for the mutation pool, no running or pending tasks, but some mutations get dropped nonetheless. I'm thinking about some scheduling problems but

Re: composite table with cassandra without using cql3?

2012-08-17 Thread Ben Frank
Hi Dean, I'm interested in this too, but I get a 404 with the link below, looks like I can't see your nosqlORM project. -Ben On Thu, Aug 2, 2012 at 9:04 AM, Hiller, Dean wrote: > For how to do it with astyanax, you can see here... > > Lines 310 and 335 > > https://github.com/deanhiller/nosql

Re: nodetool repair uses insane amount of disk space

2012-08-17 Thread Peter Schuller
> How come a node would consume 5x its normal data size during the repair > process? https://issues.apache.org/jira/browse/CASSANDRA-2699 It's likely a variation based on how out of synch you happen to be, and whether you have a neighbor that's also been repaired and bloated up already. > My set