Frequent secondary index sstable corruption

2014-06-10 Thread Jeremy Jongsma
I'm in the process of migrating data over to cassandra for several of our apps, and a few of the schemas use secondary indexes. Four times in the last couple months I've run into a corrupted sstable belonging to a secondary index, but have never seen this on any other sstables. When it happens, any

Re: Migration 1.2.14 to 2.0.8 causes "Tried to create duplicate hard link" at startup

2014-06-10 Thread Chris Burroughs
Were you able to solve or work around this problem? On 06/05/2014 11:47 AM, Tom van den Berge wrote: Hi, I'm trying to migrate a development cluster from 1.2.14 to 2.0.8. When starting up 2.0.8, I'm seeing the following error in the logs: INFO 17:40:25,405 Snapshotting drillster, Account to

Re: Frequent secondary index sstable corruption

2014-06-10 Thread Robert Coli
On Tue, Jun 10, 2014 at 7:31 AM, Jeremy Jongsma wrote:
> I'm in the process of migrating data over to cassandra for several of our apps, and a few of the schemas use secondary indexes. Four times in the last couple months I've run into a corrupted sstable belonging to a secondary index, but

Re: Frequent secondary index sstable corruption

2014-06-10 Thread Tyler Hobbs
If you've been dropping and recreating tables with the same name, you might be seeing this: https://issues.apache.org/jira/browse/CASSANDRA-6525
On Tue, Jun 10, 2014 at 12:19 PM, Robert Coli wrote:
> On Tue, Jun 10, 2014 at 7:31 AM, Jeremy Jongsma wrote:
>> I'm in the process of migrating

Re: Cannot query secondary index

2014-06-10 Thread Redmumba
Honestly, this has been by far my single biggest obstacle with Cassandra for time-based data--cleaning up the old data when the deletion criteria (i.e., date) isn't the primary key. I've asked about a few different approaches, but I haven't really seen any feasible options that can be implemented

Re: How to restart bootstrap after a failed streaming due to Broken Pipe (1.2.16)

2014-06-10 Thread Robert Coli
On Mon, Jun 9, 2014 at 10:43 PM, Colin Kuo wrote:
> You can use "nodetool repair" instead. Repair is able to re-transmit the data which belongs to new node.
Repair is not very likely to work in cases where bootstrap doesn't. @OP : you probably will have to tune your phi detector to be more

Adding and removing node procedures

2014-06-10 Thread ng
I just wanted to verify the procedures to add and remove nodes in my environment, please feel free to comment or advise. I have a 3-node cluster N1, N2, N3 with Vnode configured as (256) on each node. All are in one data center. 1. Procedure to Change node hardware or replace to new node machine

StreamException while adding nodes

2014-06-10 Thread Philipp Potisk
Hi, I tried to double the size of an existing cluster from 4 to 8 nodes. First I added one node, which joined after 120min successfully. During that time there was no additional load on the cluster. Afterwards I started the other 3 new nodes after each other in order to join the cluster simultaneo

Re: Cannot query secondary index

2014-06-10 Thread Paulo Ricardo Motta Gomes
Our approach for this scenario is to run a hadoop job that periodically cleans old entries, but I admit it's far from ideal. Would be nice to have a more native way to perform these kinds of tasks. There's a legend about a compaction strategy that keeps only the N first entries of a partition key,

Large number of row keys in query kills cluster

2014-06-10 Thread Jeremy Jongsma
I ran an application today that attempted to fetch 20,000+ unique row keys in one query against a set of completely empty column families. On a 4-node cluster (EC2 m1.large instances) with the recommended memory settings (2 GB heap), every single node immediately ran out of memory and became unresp

Re: Consolidating records and TTL

2014-06-10 Thread Tyler Hobbs
On Thu, Jun 5, 2014 at 2:38 PM, Charlie Mason wrote:
> I can't do the initial account insert with a TTL as I can't guarantee when a new value would come along and so replace this account record. However when I insert the new account record, instead of deleting the old one could I reinsert

Re: Large number of row keys in query kills cluster

2014-06-10 Thread DuyHai Doan
Hello Jeremy Basically what you are doing is to ask Cassandra to do a distributed full scan on all the partitions across the cluster, it's normal that the nodes are somehow stressed. How did you make the query? Are you using Thrift or CQL3 API? Please note that there is another way to get al

Re: Large number of row keys in query kills cluster

2014-06-10 Thread Jeremy Jongsma
I didn't explain clearly - I'm not requesting 2 unknown keys (resulting in a full scan), I'm requesting 2 specific rows by key.
On Jun 10, 2014 6:02 PM, "DuyHai Doan" wrote:
> Hello Jeremy
>
> Basically what you are doing is to ask Cassandra to do a distributed full scan on all the part

Re: Large number of row keys in query kills cluster

2014-06-10 Thread Laing, Michael
Perhaps if you described both the schema and the query in more detail, we could help... e.g. did the query have an IN clause with 2 keys? Or is the key compound? More detail will help.
On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma wrote:
> I didn't explain clearly - I'm not requesting 200

Re: StreamException while adding nodes

2014-06-10 Thread Robert Coli
On Tue, Jun 10, 2014 at 2:21 PM, Philipp Potisk wrote:
> First I added one node, which joined after 120min successfully. During that time there was no additional load on the cluster. Afterwards I started the other 3 new nodes after each other in order to join the cluster simultaneously.

Re: VPC AWS

2014-06-10 Thread Ben Bromhead
Have a look at http://www.tinc-vpn.org/, mesh based and handles multiple gateways for the same network in a graceful manner (so you can run two gateways per region for HA). Also supports NAT traversal if you need to do public-private clusters. We are currently evaluating it for our managed Cas

Re: StreamException while adding nodes

2014-06-10 Thread Philipp Potisk
Hey Rob, thanks for pointing out the issue with simultaneous bootstraps. However, I am not sure if this applies in my case. As a matter of fact I did not start the nodes simultaneously - I waited about 10min until they were receiving streams from other nodes. So I guess the topology-changes were e

Setting TTL to entire row: UPDATE vs INSERT

2014-06-10 Thread Or Sher
Hi all, I encountered a strange phenomenon (at least I believe it's strange) when trying to set a TTL for a whole row. When trying to set a TTL for a row using an update statement and updating all values I'm getting kind of a "phantom cql row". When trying to do the same thing using an insert statemen