Re: CQL 'Where' clause ignores secondary index filter

2012-01-19 Thread Sylvain Lebresne
I think that qualifies as a bug. We should either refuse the query if we don't know how to do this correctly or return a sensible result (i.e., no result in that case). Would you mind opening a ticket on https://issues.apache.org/jira/browse/CASSANDRA? -- Sylvain On Fri, Jan 20, 2012 at 6:39 AM, w

Re: ideal cluster size

2012-01-19 Thread Peter Schuller
> We're embarking on a project where we estimate we will need on the order > of 100 cassandra nodes. The data set is perfectly partitionable, meaning > we have no queries that need to have access to all the data at once. We > expect to run with RF=2 or =3. Is there some notion of ideal cluster > si

Re: CQL 'Where' clause ignores secondary index filter

2012-01-19 Thread vaibhav . s
Dear Aaron, Thanks for the information. Actually it's a normal query which works with SQL. I believe there will be some mechanism to do so in Cassandra, as first retrieving the records based on key and then checking for the column index later will be inefficient. Thanks again. Regards, V

ideal cluster size

2012-01-19 Thread Thorsten von Eicken
We're embarking on a project where we estimate we will need on the order of 100 cassandra nodes. The data set is perfectly partitionable, meaning we have no queries that need to have access to all the data at once. We expect to run with RF=2 or =3. Is there some notion of ideal cluster size? Or per

Re: cassandra hit a wall: Too many open files (98567!)

2012-01-19 Thread Thorsten von Eicken
Ah, that explains part of the problem indeed. The whole situation still doesn't make a lot of sense to me, unless the answer is that the default sstable size with level compaction is just no good for large datasets. I restarted cassandra a few hours ago and it had to open about 32k files at start-u
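
A back-of-the-envelope sketch of why small leveled-compaction SSTables multiply file counts on a large data set. It assumes every SSTable is filled to the configured size, and it assumes a 5 MB default for Cassandra 1.0's leveled compaction (check the documentation for your version). The function name is hypothetical.

```python
import math

# Assumption: 5 MB mirrors the Cassandra 1.0 leveled compaction default;
# verify against the documentation for the version you run.
DEFAULT_SSTABLE_MB = 5.0

def estimated_sstables(data_gb, sstable_mb=DEFAULT_SSTABLE_MB):
    """Approximate SSTable count if every SSTable is filled to sstable_mb."""
    return math.ceil(data_gb * 1024 / sstable_mb)

print(estimated_sstables(100))        # 20480
print(estimated_sstables(100, 64.0))  # 1600
```

Each SSTable is stored as several component files, so the number of files on disk is a multiple of this estimate. A larger SSTable size means fewer files, at the cost of larger individual compactions.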

Re: Garbage collection freezes cassandra node

2012-01-19 Thread Peter Schuller
> On node "172.16.107.46", I see the following: > > 21:53:27.192+0100: 1335393.834: [GC 1335393.834: [ParNew (promotion failed): > 319468K->324959K(345024K), 0.1304456 secs]1335393.964: [CMS: > 6000844K->3298251K(8005248K), 10.8526193 secs] 6310427K->3298251K(8350272K), > [CMS Perm : 26355K->263
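
The quoted GC log line is dense. Below is a minimal sketch that pulls out each collector section. It only recognizes the `[ParNew ...]` and `[CMS: ...]` shapes visible in the quoted line and is not a general HotSpot GC log parser. The names `SECTION` and `gc_sections` are hypothetical.

```python
import re

# One collector section, e.g.
# "[ParNew (promotion failed): 319468K->324959K(345024K), 0.1304456 secs]".
SECTION = re.compile(
    r"\[(ParNew|CMS)([^:]*): (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]"
)

def gc_sections(line):
    """Extract collector name, promotion-failure flag, heap sizes and duration."""
    sections = []
    for m in SECTION.finditer(line):
        collector, note, before, after, capacity, secs = m.groups()
        sections.append({
            "collector": collector,
            "promotion_failed": "promotion failed" in note,
            "before_kb": int(before),
            "after_kb": int(after),
            "capacity_kb": int(capacity),
            "seconds": float(secs),
        })
    return sections

# The complete part of the quoted log line.
sample = ("21:53:27.192+0100: 1335393.834: [GC 1335393.834: [ParNew (promotion failed): "
          "319468K->324959K(345024K), 0.1304456 secs]1335393.964: [CMS: "
          "6000844K->3298251K(8005248K), 10.8526193 secs]")
for s in gc_sections(sample):
    print(s["collector"], s["promotion_failed"], s["seconds"])
```

On the quoted line it reports a 0.13 s ParNew that hit a promotion failure, followed by a 10.85 s CMS section. This lines up with the freezes of more than 10 seconds reported elsewhere in the thread.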

RE: cassandra 1.0.6 rpm

2012-01-19 Thread Shu Zhang
Thanks Philippe, I checked their docs. RPMs should be at http://rpm.datastax.com/community/ now, but 1.0.6 is not there either. Can someone at datastax please comment on this? Are you guys no longer packaging cassandra releases? From: Philippe [watche...@

Re: How to store unique visitors in cassandra

2012-01-19 Thread Milind Parikh
You might want to look at the code in countandra.org, regardless of whether you use it. It uses a model of dynamic composite keys (although static composite keys would have worked as well). For the actual query, only one row is hit. This of course only works because the data model is attuned for the query

Re: How to store unique visitors in cassandra

2012-01-19 Thread Tyler Hobbs
On Thu, Jan 19, 2012 at 8:25 AM, Alain RODRIGUEZ wrote: > > I'm still in the dark about how to get the number of unique visitors > between 2 dates (randomly chosen, because chosen by user) efficiently. > > I could easily count them per hour, day, week, month... But it's a bit > harder to give thi

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Narendra Sharma
I believe you need to move the nodes on the ring. What was the load on the nodes before you added 5 new nodes? It's just that you are getting more data in certain token ranges than in others. -Naren On Thu, Jan 19, 2012 at 3:22 AM, Marcel Steinbach wrote: > On 18.01.2012, at 02:19, Maki Watanabe wro
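
Moving nodes means choosing new tokens. For RandomPartitioner, the usual recipe is to space tokens evenly across the roughly 0 to 2**127 token range. The sketch below computes such tokens and the fraction of the ring each token owns, so you can compare the current tokens from `nodetool ring` with balanced ones. It assumes one token per node and considers only primary ranges, ignoring replication. Both function names are hypothetical.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner tokens lie roughly in [0, 2**127)

def balanced_tokens(node_count):
    """Evenly spaced initial tokens for node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

def ownership(tokens):
    """Fraction of the ring each token owns, i.e. the range (previous token, token]."""
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return {ordered[0]: 1.0}
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # i == 0 wraps around to the largest token
        shares[token] = ((token - previous) % RING_SIZE) / RING_SIZE
    return shares

for token, share in ownership(balanced_tokens(4)).items():
    print(token, share)
```

Each node would then be moved with `nodetool move` to its new token, followed by `nodetool cleanup` to drop data it no longer owns. If ownership is already even but load is not, the cause lies elsewhere.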

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread aaron morton
Load reported from nodetool ring is the live load, which means the SSTables that the server has open and will read from during a request. This includes tombstones, expired data and overwritten data. nodetool cfstats also includes "dead" load, which is SSTables that are no longer in use but are still on disk.

Re: CQL 'Where' clause ignores secondary index filter

2012-01-19 Thread aaron morton
It is working as expected. Because you have specified a KEY, the query returns the records that match the key(s) and ignores the other clauses. Selecting rows follows one of three paths: * select rows by key(s) * select rows by key range, i.e. rows after this key. * select rows by (secondary

Re: Incremental backups

2012-01-19 Thread aaron morton
mmm, they are not included in the snapshot, so they are probably not used. Have you dropped an index called 09partition on AttractionCheckins? In [52]: "".join(chr(int(x+y, 16)) for x,y in zip("3039706172746974696f6e"[0::2], "3039706172746974696f6e"[1::2])) Out[52]: '09partition' The simple thing to
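
The IPython one-liner in the reply decodes the hex string pair by pair. Python's built-in `bytes.fromhex` performs the same decoding in one call. This sketch assumes the hex string encodes UTF-8 text, and the helper name is hypothetical.

```python
def decode_hex_name(hex_name):
    """Decode a hex-encoded name, such as an index name in a file name, to text."""
    return bytes.fromhex(hex_name).decode("utf-8")

print(decode_hex_name("3039706172746974696f6e"))  # 09partition
```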

Re: Garbage collection freezes cassandra node

2012-01-19 Thread Mohit Anchlia
Which version of Java do you use? Can you try reducing NewSize and increasing the old generation? If you are on an old version of Java, I also recommend upgrading it. On Thu, Jan 19, 2012 at 3:27 AM, Rene Kochen wrote: > Thanks for your comments. The application is indeed suffering from a

Re: How to store unique visitors in cassandra

2012-01-19 Thread Alain RODRIGUEZ
Thanks aaron, I already paid attention to these slides and I just looked at them again. I'm still in the dark about how to get the number of unique visitors between 2 dates (arbitrary, because they are chosen by the user) efficiently. I could easily count them per hour, day, week, month... But it's a bi
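
The difficulty here is that unique counts do not add up across periods: summing per-day unique counts double-counts anyone who comes back. This minimal sketch uses hypothetical in-memory data and no Cassandra client. It contrasts the naive sum with a set union over the selected days. The one-bucket-per-day layout is an assumption for illustration.

```python
from datetime import date, timedelta

# Hypothetical data: one bucket of visitor ids per day. In Cassandra this
# might be one row per day whose column names are visitor ids.
visits = {
    date(2012, 1, 17): {"alice", "bob"},
    date(2012, 1, 18): {"bob", "carol"},
    date(2012, 1, 19): {"alice", "dave"},
}

def days(start, end):
    """Yield every date from start to end inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)

def summed_daily_uniques(buckets, start, end):
    """Add per-day unique counts; a visitor seen on two days counts twice."""
    return sum(len(buckets.get(d, set())) for d in days(start, end))

def unique_visitors(buckets, start, end):
    """Distinct visitors over the range, computed as a set union of the buckets."""
    seen = set()
    for d in days(start, end):
        seen |= buckets.get(d, set())
    return len(seen)

start, end = date(2012, 1, 17), date(2012, 1, 19)
print(summed_daily_uniques(visits, start, end))  # 6
print(unique_visitors(visits, start, end))       # 4
```

The union is exact, but it must read every bucket in the range, so its cost grows with the length of the user-chosen interval. That is the efficiency trade-off the question is about.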

Re: nodetool ring question

2012-01-19 Thread R. Verlangen
I will have a look very soon and if I find something I'll let you know. Thank you in advance! 2012/1/19 aaron morton > Michael, Robin > > Let us know if the reported live load is increasing and diverging from the > on disk size. > > If it is can you check nodetool cfstats and find an example of

Re: Deploying Cassandra 1.0.7 on EC2 in minutes

2012-01-19 Thread Rustam Aliyev
Great, will try 0.7.1 when it's ready. (Bug I mentioned was already reported) On 19/01/2012 13:15, Andrei Savu wrote: On Wed, Jan 18, 2012 at 7:58 PM, Rustam Aliyev > wrote: Hi Andrei, As you know, we are using Whirr for ElasticInbox (https://github.com/el

Re: Deploying Cassandra 1.0.7 on EC2 in minutes

2012-01-19 Thread Andrei Savu
On Wed, Jan 18, 2012 at 7:58 PM, Rustam Aliyev wrote: > Hi Andrei, > > As you know, we are using Whirr for ElasticInbox ( > https://github.com/elasticinbox/whirr-elasticinbox). While testing we > encountered a few minor problems which I think could be improved. Note that > we were using 0.6 (the

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Marcel Steinbach
2012/1/19 aaron morton : > If you have performed any token moves the data will not be deleted until you > run nodetool cleanup. We did that after adding nodes to the cluster. And then, the cluster wasn't balanced either. Also, does the "Load" really account for "dead" data, or is it just live data?

RE: Garbage collection freezes cassandra node

2012-01-19 Thread Rene Kochen
Thanks for your comments. The application is indeed suffering from a freezing Cassandra node. Queries are taking longer than 10 seconds at the moment of a full garbage collect. Here is an example from the logs. I have a three node cluster. At some point I see on a node the following log: 21:53

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Marcel Steinbach
On 18.01.2012, at 02:19, Maki Watanabe wrote: > Are there any significant difference of number of sstables on each nodes? No, no significant difference there. Actually, node 8 is among those with more sstables but with the least load (20GB) On 17.01.2012, at 20:14, Jeremiah Jordan wrote: > Are yo

CQL 'Where' clause ignores secondary index filter

2012-01-19 Thread vaibhav . s
Hi, I've defined a column family 'Vaibhav' in which every row has a few columns and their values. I've declared two columns as secondary indexes so that I can filter the rows on the basis of those column values. Now whenever I execute a CQL with either only rowkey or column name in 'WHERE' clause

RE: Incremental backups

2012-01-19 Thread Michael Vaknine
When I upgraded, I did it in 2 stages: (1) upgrade from 0.7.6 to 1.0.0, run scrub on each node, and run repair on the cluster; (2) upgrade to 1.0.3. Is it safe to run scrub again? It did not seem to help when I upgraded to 1.0.0. Was there a bug in the scrub process in 1.0.0? What is the

Re: How to store unique visitors in cassandra

2012-01-19 Thread aaron morton
Some tips here from Matt Dennis on how to model time series data http://www.slideshare.net/mattdennis/cassandra-nyc-2011-data-modeling Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/01/2012, at 10:30 PM, Alain RODRIGUEZ wrote: > Hi than

Re: Incremental backups

2012-01-19 Thread aaron morton
Did you run a scrub as part of the upgrade process? That will rewrite all the sstables and remove the old ones. If not, run a scrub now and it will rewrite the data with -hb- in the file name. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelas

Re: Max records per node for a given secondary index value

2012-01-19 Thread aaron morton
Each node stores the rows in its token range, and those in the token ranges it is a replica for. So it will store roughly rf / num_nodes of the rows. If you are approaching a situation where the node may store 2 billion rows, and so may have 2 billion entries in the secondary index row, you
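
Here is a rough sizing sketch. With balanced tokens, each node holds about rf / num_nodes of all rows, so the number of entries in a node's index row for one value is roughly rf / num_nodes of the rows carrying that value. The figures in the example (10 billion rows, RF=3, 15 nodes) are made up for illustration. The 2 billion threshold is the figure from the reply, used here as a rough bound rather than an exact limit. The function names are hypothetical.

```python
# Figure used in the reply; treat it as a rough threshold, not an exact limit.
INDEX_ROW_LIMIT = 2_000_000_000

def rows_per_node(total_rows, rf, num_nodes):
    """Approximate rows one node stores, assuming evenly balanced tokens."""
    return total_rows * rf / num_nodes

def index_row_at_risk(rows_with_value, rf, num_nodes):
    """True if one node's index row for a single value may reach the threshold."""
    return rows_per_node(rows_with_value, rf, num_nodes) >= INDEX_ROW_LIMIT

print(rows_per_node(10_000_000_000, 3, 15))  # 2000000000.0
```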

Re: poor Memtable performance on column slices?

2012-01-19 Thread Sylvain Lebresne
On Thu, Jan 19, 2012 at 3:54 AM, Josep Blanquer wrote: > > > On Wed, Jan 18, 2012 at 12:44 PM, Jonathan Ellis wrote: >> >> On Wed, Jan 18, 2012 at 12:31 PM, Josep Blanquer >> wrote: >> > If I do a slice without a start (i.e., get me the first column)...it >> > seems >> > to fly. GET("K", :count

Re: How to store unique visitors in cassandra

2012-01-19 Thread Alain RODRIGUEZ
Hi, thanks for your answer, but I don't want to add another layer on top of Cassandra. I have also built all of my application without Countandra and I would like to continue this way. Furthermore, there is a Cassandra modeling problem that I would like to solve, and not just hide. Alain 2012/1/18 Luca

Re: Using 5-6 bytes for cassandra timestamps vs 8…

2012-01-19 Thread Ertio Lew
It wont obviously matter in case your columns are fat but in several cases, (at least I could think of several cases) where you need to, for example, just store an integer column name & empty column value. Thus 12 bytes for the column where 8 bytes is just the overhead to store timestamps doesn't l