Re: How to retrieve snappy compressed data from Cassandra using Datastax?

2014-01-29 Thread Sylvain Lebresne
I believe you are getting confused by using both Thrift and CQL3. If you haven't done so, you can try checking blog posts like http://www.datastax.com/dev/blog/thrift-to-cql3, http://www.datastax.com/dev/blog/cql3-for-cassandra-experts and maybe

Re: question about secondary index or not

2014-01-29 Thread Ondřej Černoš
Hi, we had a similar use case. Just do the filtering client-side; the #2 example performs horribly. Secondary indexes on something that divides the set into two subsets of roughly the same size just don't work. Give it a try on localhost with just a couple of records (150.000), you will see. regards,
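
A minimal sketch of the client-side filtering suggested above, written in Python with in-memory rows standing in for a result set. The `active` column, the row shape, and the helper name `filter_client_side` are hypothetical and only illustrate the idea; they are not taken from the thread.

```python
from typing import Any, Dict, Iterable, Iterator


def filter_client_side(
    rows: Iterable[Dict[str, Any]], column: str, wanted: Any
) -> Iterator[Dict[str, Any]]:
    """Yield only the rows whose `column` equals `wanted`."""
    for row in rows:
        if row.get(column) == wanted:
            yield row


# Hypothetical rows; in practice these would be read by partition key
# and filtered in the application instead of via a secondary index.
rows = [
    {"id": 1, "active": True},
    {"id": 2, "active": False},
    {"id": 3, "active": True},
]
active_ids = [row["id"] for row in filter_client_side(rows, "active", True)]
```

Because the filter is a generator, it can be applied to a paged result set without holding every row in memory at once.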

GC taking a long time

2014-01-29 Thread Robert Wille
I read through the recent thread "Cassandra mad GC", which seemed very similar to my situation, but didn't really help. Here is what I get from my logs when I grep for GCInspector. Note that this is the middle of the night on a dev server, so there should have been almost no load. INFO

Possibly losing data with corrupted SSTables

2014-01-29 Thread Francisco Nogueira Calmon Sobral
Dear experts, We are facing an annoying problem in our cluster. We have 9 Amazon extra-large Linux nodes, running Cassandra 1.2.11. The short story is that after moving the data from one cluster to another, we've been unable to run 'nodetool repair'. It gets stuck due to a

Re: Introducing farsandra: A different way to integration test with c*

2014-01-29 Thread Edward Capriolo
Farsandra 0.0.1 is in Maven Central. Added a couple of features to allow customizing cassandra.yaml and cassandra env (control memory of forked instance), and auto-downloading of the version specified. http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22farsandra%22 On Wednesday, January 22, 2014, Edward

cluster installer?

2014-01-29 Thread Peter Lin
Is anyone aware of a cluster installer for Cassandra? Granted, it's not hard to untar the file, change cassandra.yaml and start the server, but it seems like there should be a nice installer to make it easier. Does anyone know if OpsCenter does that? peter

RE: cluster installer?

2014-01-29 Thread Romain HARDOUIN
OpsCenter provides cluster management features such as creating a cluster and adding a node: http://www.datastax.com/documentation/opscenter/4.0/webhelp/index.html#opsc/online_help/opscClusterAdmin_c.html Otherwise you can use Chef, Puppet, Salt, Ansible etc. Cheers, Romain Peter Lin

Re: GC taking a long time

2014-01-29 Thread Robert Wille
Forget about what I said about there not being any load during the night. I forgot about my unit tests. They would have been running at this time and they run against this cluster. I also forgot to provide JVM information: java version 1.7.0_17 Java(TM) SE Runtime Environment (build

Re: question about secondary index or not

2014-01-29 Thread Mullen, Robert
Thanks for that info, Ondřej. I've never tested out secondary indexes as I've avoided them because of all the uncertainty around them, and your statement just adds to the uncertainty. Everything I had read said that secondary indexes were supposed to work well for columns with low cardinality, but

Nodetool cleanup on vnode cluster removes more data than wanted

2014-01-29 Thread Desimpel, Ignace
Got into a problem when testing a vnode setup. I'm using a byte-ordered partitioner, Linux, code version 2.0.4, replication factor 1, 4 machines. All goes OK until I run cleanup, and it gets worse when adding / decommissioning nodes. In my opinion the problem can be found in the SSTableScanner::

Weird GC

2014-01-29 Thread Joel Samuelsson
Hi, We've been trying to figure out why we have such long and frequent stop-the-world GCs even though we have basically no load. Today we got a log of a weird GC, and I wonder if you have any theories about why it might have happened. A plot of our heap at the time, paired with the GC time from the

Re: Weird GC

2014-01-29 Thread Benedict Elliott Smith
It's possible the time attributed to GC is actually spent somewhere else; a multitude of tasks may occur during the same safepoint as a GC. We've seen some batch revoke of biased locks take a long time, for instance; *if* this is happening in your case, and we can track down which objects, I would

Re: Possibly losing data with corrupted SSTables

2014-01-29 Thread Rahul Menon
Francisco, the sstables with *-ib-* are from a previous version of C*. The *-ib-* naming convention started at C* 1.2.1, but from 1.2.10 onwards I'm sure it has the *-ic-* convention. You could try running nodetool upgradesstables, which should ideally upgrade the sstables with the

Re: Possibly losing data with corrupted SSTables

2014-01-29 Thread Francisco Nogueira Calmon Sobral
Hi, Rahul. I've run nodetool upgradesstables only on the problematic CF. It threw the following exception: Error occurred while upgrading the sstables for keyspace Sessions java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException:

Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
All, We've been having intermittent long application pauses (version 1.2.8) and not sure if it's a cassandra bug. During these pauses, there are dropped messages in the cassandra log file along with the node seeing other nodes as down. We've turned on gc logging and the following is an example

Re: Intermittent long application pauses on nodes

2014-01-29 Thread Shao-Chuan Wang
We had similar latency spikes when pending compactions couldn't keep up or repair/streaming was taking too many cycles. On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng fnt...@gmail.com wrote: All, We've been having intermittent long application pauses (version 1.2.8) and not sure if it's a cassandra

Re: Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
Thanks for the update. Our logs indicated that there were 0 pending for CompactionManager at that time. Also, there were no nodetool repairs running at that time. The log statements above state that the application had to stop to reach a safepoint. Yet, it doesn't say what is requesting the

Re: Intermittent long application pauses on nodes

2014-01-29 Thread Benedict Elliott Smith
Frank, The same advice for investigating holds: add the VM flags -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 (you could put something above 1 there, to reduce the amount of logging, since a pause of 52s will be pretty obvious even if aggregated with lots of other safe

Re: Intermittent long application pauses on nodes

2014-01-29 Thread Frank Ng
Benedict, Thanks for the advice. I've tried turning on PrintSafepointStatistics. However, that info is only sent to the STDOUT console. The cassandra startup script closes the STDOUT when it finishes, so nothing is shown for safepoint statistics once it's done starting up. Do you know how to

Re: Intermittent long application pauses on nodes

2014-01-29 Thread Benedict Elliott Smith
Add some more flags: -XX:+UnlockDiagnosticVMOptions -XX:LogFile=${path} -XX:+LogVMOutput I never figured out what kills stdout for C*. It's a library we depend on, didn't try too hard to figure out which one. On 29 January 2014 21:07, Frank Ng fnt...@gmail.com wrote: Benedict, Thanks for the
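
As a rough illustration, the flags quoted in this thread could be collected and rendered as lines for `cassandra-env.sh`. This is only a sketch: the log path `/var/log/cassandra/vm.log`, the helper names, and the `JVM_OPTS="$JVM_OPTS ..."` line format are assumptions, while the flags themselves are the ones named in the messages above.

```python
from typing import List


def safepoint_logging_opts(log_file: str, statistics_count: int = 1) -> List[str]:
    """Return the HotSpot flags from the thread, with LogFile set to `log_file`."""
    if statistics_count < 1:
        raise ValueError("statistics_count must be at least 1")
    return [
        "-XX:+PrintSafepointStatistics",
        f"-XX:PrintSafepointStatisticsCount={statistics_count}",
        "-XX:+UnlockDiagnosticVMOptions",
        f"-XX:LogFile={log_file}",
        "-XX:+LogVMOutput",
    ]


def as_env_lines(opts: List[str]) -> List[str]:
    """Render each flag as one shell assignment line (assumed env-file format)."""
    return [f'JVM_OPTS="$JVM_OPTS {opt}"' for opt in opts]


# Hypothetical log location; pick a path the Cassandra process can write to.
opts = safepoint_logging_opts("/var/log/cassandra/vm.log")
env_lines = as_env_lines(opts)
```

Routing VM output to a file this way sidesteps the closed STDOUT described earlier in the thread, since the safepoint statistics no longer depend on the console.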

Re: Nodetool cleanup on vnode cluster removes more data than wanted

2014-01-29 Thread Tyler Hobbs
Ignace, Thanks for reporting this. I've been able to reproduce the issue with a unit test, so I opened https://issues.apache.org/jira/browse/CASSANDRA-6638. I'm not 100% sure if your fix is the correct one, but I should be able to get it fixed quickly and figure out the full set of cases where a

Question about local reads with multiple data centers

2014-01-29 Thread Donald Smith
We have two datacenters, DC1 and DC2 in our test cluster. Our write process uses a connection string with just the two hosts in DC1. Our read process uses a connection string just with the two hosts in DC2. We use a PropertyFileSnitch and a property file that 'DC1':2, 'DC2':1 between data

Re: Nodetool cleanup on vnode cluster removes more data than wanted

2014-01-29 Thread Edward Capriolo
Is this only a ByteOrderPartitioner problem? On Wed, Jan 29, 2014 at 7:34 PM, Tyler Hobbs ty...@datastax.com wrote: Ignace, Thanks for reporting this. I've been able to reproduce the issue with a unit test, so I opened https://issues.apache.org/jira/browse/CASSANDRA-6638. I'm not 100%

cql IN clause question

2014-01-29 Thread Jimmy Lin
select * from mytable where mykey IN ('xxx', 'yyy', 'zzz', '111', '222', '333') Is there a limit on how many items you can specify inside an IN clause? The CQL IN clause will help reduce the round-trip traffic otherwise needed when using multiple select statements, correct? But how about the coordinator node that

Re: cql IN clause question

2014-01-29 Thread Edward Capriolo
Each IN is the equivalent of a Thrift get_slice(). You are saving some overhead on round trips, but if you have a schema design that calls for large IN clauses you may not be designing your schema correctly. On Wed, Jan 29, 2014 at 11:41 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: select * from
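
One way to keep IN clauses small is to split a long key list into bounded chunks and issue one statement per chunk. The sketch below only builds the statement text and the matching parameter lists; the chunk size, the helper name `in_queries`, and the use of `?` bind markers (as for a prepared statement) are assumptions, and no driver call is shown.

```python
from typing import Iterator, List, Sequence, Tuple


def in_queries(
    keys: Sequence[str], chunk_size: int
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (statement, params) pairs, each with at most `chunk_size` keys."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(keys), chunk_size):
        chunk = list(keys[start:start + chunk_size])
        markers = ", ".join("?" for _ in chunk)
        yield f"SELECT * FROM mytable WHERE mykey IN ({markers})", chunk


# Keys from the question above; chunk_size=4 is an arbitrary illustration.
queries = list(
    in_queries(["xxx", "yyy", "zzz", "111", "222", "333"], chunk_size=4)
)
```

This trades a few extra round trips for a smaller fan-out per request; whether that helps depends on the schema, as the reply above points out.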

Restoring keyspace using snapshots

2014-01-29 Thread Senthil, Athinanthny X. -ND
We plan to back up and restore a keyspace from the PROD cluster to the PRE-PROD cluster, which has the same number of nodes. The keyspace will have a few hundred million rows. We need to do this every other week. Which one of the options below is the most time-efficient and puts the least stress on the target cluster? We want to finish