Efficient bulk range deletions without compactions by dropping SSTables.

2014-05-12 Thread Kevin Burton
We have a log only data structure… everything is appended and nothing is ever updated. We should be totally fine with having lots of SSTables sitting on disk because even if we did a major compaction the data would still look the same. By 'lots' I mean maybe 1000 max. Maybe 1GB each. However, I

Re: How long are expired values actually returned?

2014-05-12 Thread Chris Lohfink
That is not expected. What client are you using and how are you setting the ttls? What version of Cassandra? --- Chris Lohfink On May 8, 2014, at 9:44 AM, Sebastian Schmidt wrote: > Hi, > > I'm using the TTL feature for my application. In my tests, when using a > TTL of 5, the inserted rows

Re: Disable reads during node rebuild

2014-05-12 Thread Robert Coli
On Mon, May 12, 2014 at 10:18 AM, Paulo Ricardo Motta Gomes < paulo.mo...@chaordicsystems.com> wrote: > Is there a way to disable reads from a node while performing rebuild from > another datacenter? I tried starting the node in write survey mode, but > the nodetool rebuild command does not work

Re: Schema disagreement errors

2014-05-12 Thread Laing, Michael
Upgrade to 2.0.7 fixed this for me. You can also try 'nodetool resetlocalschema' on disagreeing nodes. This worked temporarily for me in 2.0.6. ml On Mon, May 12, 2014 at 3:31 PM, Gaurav Sehgal wrote: > We have recently started seeing a lot of Schema Disagreement errors. We > are using Cassan

Schema disagreement errors

2014-05-12 Thread Gaurav Sehgal
We have recently started seeing a lot of Schema Disagreement errors. We are using Cassandra 2.0.6 with Oracle Java 1.7. I went through the Cassandra FAQ and followed the below steps: - nodetool disablethrift - nodetool disablegossip - nodetool drain - 'kill '. As per the docume

Hadoop InputFormat that supports multiple queries

2014-05-12 Thread Clint Kelly
Hi everyone, A couple of months ago I started working on a new Hadoop InputFormat that we needed for something at my work. It is in a semi-working state now so I thought I would post a link in case anyone is interested: https://github.com/wibiclint/cassandra2-hadoop2 At the time I started worki

Can Cassandra client programs use hostnames instead of IPs?

2014-05-12 Thread Huiliang Zhang
Hi, Cassandra returns IPs of the nodes in the Cassandra cluster for further communication between the Hadoop program and the Cassandra cluster. Is there a way to configure the Cassandra cluster to return hostnames instead of IPs? My Cassandra cluster is on AWS and has no elastic IPs which can be access

Re: How to rebalance a cluster?

2014-05-12 Thread Oleg Dulin
I keep asking same question it seems -- sign of insanity. Cassandra version 1.2, not using vnodes (legacy). On 2014-03-07 19:37:48 +, Robert Coli said: On Fri, Mar 7, 2014 at 6:00 AM, Oleg Dulin wrote: I have the following situation: 10.194.2.5    RAC1        Up     Normal  378.6 GB    

Disable reads during node rebuild

2014-05-12 Thread Paulo Ricardo Motta Gomes
Hello, I'm not able to replace a dead node using the ordinary procedure (bootstrap+join), and would like to rebuild the replacement node from another DC. The problem is that if I start a node with auto_bootstrap=false to perform the rebuild, it automatically starts serving empty reads (CL=LOCAL_ONE

idempotent counters

2014-05-12 Thread Jabbar Azam
Hello, Do people use counters when they want to have idempotent operations in cassandra? I have a use case for using a counter to check for a count of objects in a partition. If the counter is more than some value then the data in the partition is moved into two different partitions. I can't work

Re: Storing log structured data in Cassandra without compactions for performance boost.

2014-05-12 Thread DuyHai Doan
Hello Kevin, You can disable compaction by configuring the compaction options of your table as follows: compaction={'min_threshold': '0', 'class': 'SizeTieredCompactionStrategy', 'max_threshold': '0'} Regards Duy Hai DOAN On Wed, May 7, 2014 at 2:55 AM, Kevin Burton wrote: > I'm looking a

Re: How long are expired values actually returned?

2014-05-12 Thread Peter Reilly
You need to set grace period as well. Peter On Thu, May 8, 2014 at 8:44 AM, Sebastian Schmidt wrote: > Hi, > > I'm using the TTL feature for my application. In my tests, when using a > TTL of 5, the inserted rows are still returned after 7 seconds, and > after 70 seconds. Is this normal or am

Cassandra & MapReduce/Storm/ etc

2014-05-12 Thread Manoj Khangaonkar
Hi, Searching for Cassandra with MapReduce, I am finding that the search results are really dated -- from version 0.7 & 2010/2011. Is there a good blog/article that describes how using MapReduce on Cassandra table ? From my naive understanding, Cassandra is all about partitioning. Querying is b

Re: Really need some advices on large data considerations

2014-05-12 Thread Aaron Morton
> We've learned that compaction strategy would be an important point cause > we've ran into 'no space' trouble because of the 'sized tiered' compaction > strategy. If you want to get the most out of the raw disk space LCS is the way to go, remember it uses approximately twice the disk IO. > F

Re: Cassandra & MapReduce/Storm/ etc

2014-05-12 Thread Aaron Morton
> Is there a good blog/article that describes how using MapReduce on Cassandra > table ? The best way to get into cassandra and hadoop is to play with Cassandra DSE. It’s free for development, costs for production, and is an easy way to learn about hadoop integration without having to worry abo

Re: Effect of number of keyspaces on write-throughput....

2014-05-12 Thread Aaron Morton
> On the homepage of libQtCassandra, it's mentioned that switching between > keyspaces is costly when storing into Cassandra thereby affecting the write > throughput. Is this necessarily true for other libraries like pycassa and > hector as well? > > When using the thrift connection the keyspac

Re: Question about READS in a multi DC environment.

2014-05-12 Thread Aaron Morton
> > read_repair_chance=1.00 AND There’s your problem. When read repair is active for a read request the coordinator will over read to all UP replicas. Your client request will only block waiting for the one request (the data request), the rest of the repair will happen in the background.
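
The behaviour described above can be sketched as a toy model. This is only an illustration of the explanation in this message, not Cassandra's coordinator code: the function name `replicas_contacted`, the replica labels, and the choice of the first UP replica as the data replica are all assumptions made for the example.

```python
import random


def replicas_contacted(up_replicas, read_repair_chance, rng=random.random):
    """Toy model of one single-replica read handled by a coordinator.

    Returns (blocking, background):
      blocking   - replicas the client request waits on (the data request)
      background - extra replicas read only so that read repair can run
    """
    if not up_replicas:
        raise ValueError("no replicas are up")

    # Assumption: the first UP replica serves the data request.
    data_replica = up_replicas[0]

    # With probability read_repair_chance the coordinator over-reads
    # to every other UP replica, including those in other data centers.
    if rng() < read_repair_chance:
        background = list(up_replicas[1:])
    else:
        background = []

    return [data_replica], background


if __name__ == "__main__":
    # Hypothetical replica labels for a two-DC cluster.
    replicas = ["dc1-a", "dc1-b", "dc2-a", "dc2-b"]
    print(replicas_contacted(replicas, read_repair_chance=1.0))
    print(replicas_contacted(replicas, read_repair_chance=0.0))
```

In this model a read_repair_chance of 1.0 touches every UP replica on every read, while the client still waits on a single data request.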

Re: Question about READS in a multi DC environment.

2014-05-12 Thread DuyHai Doan
Isn't read repair supposed to be done asynchronously in the background? On Mon, May 12, 2014 at 2:07 AM, graham sanderson wrote: > You have a read_repair_chance of 1.0 which is probably why your query is > hitting all data centers. > > On May 11, 2014, at 3:44 PM, Mark Farnan wrote: > > > Im tryi