Re: counters + replication = awful performance?

2012-11-28 Thread Rob Coli
On Wed, Nov 28, 2012 at 7:15 AM, Edward Capriolo wrote: > I may be wrong but during a bootstrap hints can be silently discarded, if > the node they are destined for leaves the ring. Yeah : https://issues.apache.org/jira/browse/CASSANDRA-2434 > A user like this might benefit from DANGER counters.

Re: How to determine compaction bottlenecks

2012-11-28 Thread Derek Bromenshenkel
aaron morton thelastpickle.com> writes: > > > > I've been playing around with trying to figure out what is making compactions run so slow. Is this regular compaction or table upgrades? > I *think* upgrade tables is single threaded. > Do you have some compaction logs lines that say "Compacte

Re: Generic questions over Cassandra 1.1/1.2

2012-11-28 Thread Bill de hÓra
> Compact storage is the schemaless of old. Right. That comes with the downside of picking one :) It does not seem that compact storage is the default choice for the future. As well as interop with the thrift/cli world, I also find it hard to reason about row caching with CQL defined table

Re: Java high-level client

2012-11-28 Thread Michael Kjellman
CQL Datastax Java Driver for the win then... On Nov 28, 2012, at 12:25 PM, "Edward Capriolo" <edlinuxg...@gmail.com> wrote: Astyanax is a Hector fork. You can see many of the Hector authors' comments still in the Astyanax code. There is some nice stuff in there but (IMHO) I do not see t

Re: Java high-level client

2012-11-28 Thread David Schairer
Well, not really. Astyanax ('astu-wanax' in Mycenaean Greek, 'lord of the city') has his brains dashed out against the walls of Troy by Neoptolemus, son of Achilles. So the suck was universal. --DRS, possibly the only trained classicist using big cassandra databases :) On Nov 28, 2012, at 1

Re: Java high-level client

2012-11-28 Thread Edward Capriolo
Astyanax is a Hector fork. You can see many of the Hector authors' comments still in the Astyanax code. There is some nice stuff in there but (IMHO) I do not see the fork as necessary. It has split up the community a bit, as there are now 3 high-level Java clients. I would advise following Josh's adv

Re: Java high-level client

2012-11-28 Thread Wei Zhu
Astyanax was the son of Hector, who was Cassandra's brother in Greek mythology. So the son is doing better than the father :) -Wei From: Michael Kjellman To: "user@cassandra.apache.org" Sent: Wednesday, November 28, 2012 11:51 AM Subject: Re: Java high-level clie

Re: Java high-level client

2012-11-28 Thread Michael Kjellman
Lots of example code, a nice API, and good performance are the first things that come to mind for why I like Astyanax better than Hector. From: Andrey Ilinykh <ailin...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Wednesda

Re: counters + replication = awful performance?

2012-11-28 Thread Edward Capriolo
Just for reference, HBase's counters also do a local read. I am not saying they work better/worse/faster/slower, but I would not expect any system that reads on increment to be significantly faster than what Cassandra does. Just saying your counter throughput is read bound; this is not unique to C*

Re: Java high-level client

2012-11-28 Thread Andrey Ilinykh
First of all, it is backed by Netflix. They have used it in production for a long time, so it is pretty solid. Also, they have a nice tool (Priam) which makes Cassandra cloud (AWS) friendly. This is important for us. Andrey On Wed, Nov 28, 2012 at 11:53 AM, Wei Zhu wrote: > We are using Hector now. What is

Re: counters + replication = awful performance?

2012-11-28 Thread Sergey Olefir
Well, that is sad news then. I don't think I can consider 20k increments per second for a two-node cluster (with RF=2) reasonable performance (cost vs. benefit). I might have to look into other storage solutions or perhaps experiment with duplicate clusters with RF=1 or replicate_on_write=fals

Re: How to query secondary indexes

2012-11-28 Thread Blake Eggleston
You're going to have a problem doing this in a single query because you're asking Cassandra to select a non-contiguous set of rows. Also, to my knowledge, you can only use non-equality operators on clustering keys. The best solution I could come up with would be to define your table like so: CREATE TA

How to query secondary indexes

2012-11-28 Thread Oren Karmi
Hi, According to the documentation on Indexes ( http://www.datastax.com/docs/1.1/ddl/indexes ), in order to use WHERE on a column which is not part of my key, I must define a secondary index on it. However, I can only use equality comparison on it, but I wish to use other comparison methods like g

Re: Java high-level client

2012-11-28 Thread Wei Zhu
We are using Hector now. What is the major advantage of astyanax over Hector? Thanks. -Wei From: Andrey Ilinykh To: user@cassandra.apache.org Sent: Wednesday, November 28, 2012 9:37 AM Subject: Re: Java high-level client +1 On Tue, Nov 27, 2012 at 10:10

Re: Java high-level client

2012-11-28 Thread Andrey Ilinykh
+1 On Tue, Nov 27, 2012 at 10:10 AM, Michael Kjellman wrote: > Netflix has a great client > > https://github.com/Netflix/astyanax > >

Re: outOfMemory error

2012-11-28 Thread Bryan Talbot
Well, asking for 500MB of data at once from a server with such modest specs is asking for trouble. Here are my suggestions. Disable the 1 GB row cache. Consider allocating that memory for the Java heap: "-Xms2500m -Xmx2500m". Don't fetch all the columns at once -- page through them a slice at a time. I
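
The "page through them a slice at a time" suggestion can be sketched roughly as follows. This is only an illustration in plain Python, not any client's API: `fetch_slice` is a hypothetical callback standing in for whatever slice query the client library offers, and it is assumed to return up to `limit` columns whose names sort strictly after `after` (real clients may treat the start of a slice as inclusive, in which case the first column of each later page has to be dropped). `make_fake_row` is an in-memory stand-in for a wide row.

```python
def page_columns(fetch_slice, page_size=1000):
    """Yield (name, value) columns of one wide row, one bounded slice at a time.

    fetch_slice(after, limit) is a hypothetical callback: it must return at most
    `limit` columns, sorted by name, whose names sort strictly after `after`
    (or from the start of the row when `after` is None).
    """
    last_name = None
    while True:
        page = fetch_slice(last_name, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means the row is exhausted
        last_name = page[-1][0]  # resume after the last column seen


def make_fake_row(column_count):
    """Build an in-memory stand-in for a wide row (illustration only)."""
    columns = [("col%06d" % i, i) for i in range(column_count)]

    def fetch_slice(after, limit):
        rest = [c for c in columns if after is None or c[0] > after]
        return rest[:limit]

    return fetch_slice
```

With this shape the client never holds more than `page_size` columns from a single request in memory, instead of the whole row.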

Re: counters + replication = awful performance?

2012-11-28 Thread Edward Capriolo
I may be wrong, but during a bootstrap hints can be silently discarded if the node they are destined for leaves the ring. There are a large number of people using counters for 5 minute "real-time" statistics. On the back end they use ETL based reporting to compute the true stats at an hourly or dai

Re: need some help with row cache

2012-11-28 Thread Yiming Sun
Does replica placement play a role in row cache hits? I happen to notice that the 3 nodes on "rack 2" are the ones with no recent hit rates, even when I specify only one node from rack2 as the host to Hector. The cluster uses PropertyFileSnitch, and the nodes are alternating between rac1 and rac2

Re: need some help with row cache

2012-11-28 Thread Yiming Sun
Thanks guys. However, after I ran the client code several times (same set of 5000 entries), still 2 of the 6 nodes show 0 hits on row cache, even though each node has 1GB capacity for row cache and the caches are full. Since I always request the same entries over and over again, shouldn't there be

Re: Upgrade

2012-11-28 Thread Everton Lima
Yes. java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:796) at org.apache.cassandra.thrift.ThriftSessionManager.currentSession(ThriftSessionManager.java:53) at org.apache.cassandra.thrift.CassandraServer.state(CassandraServer.java:88)

Data backup and restore

2012-11-28 Thread Adeel Akbar
Dear All, I have a Cassandra 1.1.4 cluster with 2 nodes. I need to take a backup and restore it on staging for testing purposes. I have taken a snapshot with the command below, but it created a snapshot in every keyspace's column family. Is there any other way to take a backup and restore quickly? /opt

Re: Other problem in update

2012-11-28 Thread Everton Lima
The problem was that my unit tests were not cleaning up their data directory and there was some corrupt data in there. The problem was fixed by deleting the directory manually. Thanks 2012/11/27 Tupshin Harper > Unless I'm misreading the git history, the stack trace you referenced > isn't from 1.1.2.

Re: need some help with row cache

2012-11-28 Thread Bryan Talbot
The row cache itself is global and the size is set with row_cache_size_in_mb. It must be enabled per CF using the proper settings. CQL3 isn't complete yet in C* 1.1 so if the cache settings aren't shown there, then you'll probably need to use cassandra-cli. -Bryan On Tue, Nov 27, 2012 at 10:41

Re: counters + replication = awful performance?

2012-11-28 Thread Rob Coli
On Tue, Nov 27, 2012 at 3:21 PM, Edward Capriolo wrote: > I misspoke really. It is not dangerous, you just have to understand what it > means. This jira discusses it. > > https://issues.apache.org/jira/browse/CASSANDRA-3868 Per Sylvain on the referenced ticket: " I don't disagree about the effici

Re: counters + replication = awful performance?

2012-11-28 Thread Robin Verlangen
Not sure whether it's an option for you, but you might consider doing some in-memory aggregation of counter values and flushing only once every X updates / seconds. This will reduce load, latency, and the write throughput the cluster has to sustain. However, this is not possible in every single use case. Best regards, Robin Ver
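
A minimal sketch of the in-memory aggregation idea, assuming a single-threaded writer. `CounterBuffer` and its parameters are hypothetical names, and `flush` is whatever callable sends one batch of summed deltas to the cluster; nothing here is a Cassandra client API.

```python
import time
from collections import defaultdict


class CounterBuffer:
    """Sum counter deltas in memory and hand them off in batches.

    `flush` is a caller-supplied callable (hypothetical) that receives a dict
    of {counter_key: summed_delta} and writes it to the database.
    """

    def __init__(self, flush, max_updates=1000, max_seconds=5.0, clock=time.monotonic):
        self._flush = flush
        self._max_updates = max_updates
        self._max_seconds = max_seconds
        self._clock = clock
        self._pending = defaultdict(int)
        self._updates = 0
        self._last_flush = clock()

    def increment(self, key, delta=1):
        self._pending[key] += delta
        self._updates += 1
        too_many = self._updates >= self._max_updates
        too_old = self._clock() - self._last_flush >= self._max_seconds
        if too_many or too_old:
            self.flush()

    def flush(self):
        # Only call the sink when there is something to write.
        if self._pending:
            batch = dict(self._pending)
            self._pending.clear()
            self._flush(batch)
        self._updates = 0
        self._last_flush = self._clock()
```

The trade-off is the usual one for write-behind buffering: increments still sitting in memory are lost if the process dies before the next flush, and the time limit is only checked when a new increment arrives, so a quiet buffer needs an explicit `flush()` call.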

Re: counters + replication = awful performance?

2012-11-28 Thread Sylvain Lebresne
Counter replication works differently from that of "normal" writes. Namely, a counter update is written to a first replica, then a read is performed and the result of that is replicated to the other nodes. With RF=1, since there is only one replica no read is involved but in a way it's a de
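
The write path described in this message can be pictured with a toy model. This is a deliberately simplified sketch and not Cassandra's actual implementation: `Replica`, `shards`, and `increment` are hypothetical names, and the only behaviour modelled is "apply the delta on the first replica, read the result locally, replicate that result", with the read skipped when there is a single replica.

```python
class Replica:
    """Toy replica: keeps one running total per replica that led an update."""

    def __init__(self, name):
        self.name = name
        self.shards = {}      # leader name -> total contributed through that leader
        self.local_reads = 0  # how many read-before-replicate steps this node did

    def value(self):
        return sum(self.shards.values())


def increment(replicas, leader, delta):
    """Apply `delta` on `leader`, then replicate the leader's resulting total."""
    # Step 1: the update is written to the first replica (the leader).
    leader.shards[leader.name] = leader.shards.get(leader.name, 0) + delta

    others = [r for r in replicas if r is not leader]
    if not others:
        return  # RF=1: nothing to replicate, so no read is needed

    # Step 2: the leader reads its own current total ...
    leader.local_reads += 1
    current = leader.shards[leader.name]

    # Step 3: ... and that result, not the raw delta, goes to the other replicas.
    for replica in others:
        replica.shards[leader.name] = current
```

In this model every increment with more than one replica costs one local read on the leader, which is the read-bound behaviour discussed in this thread.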