Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
For what it is worth, your schema is simple and uses compact storage, so you really don't need anything in Cassandra 2.0 as far as I can tell. You might be happier with a stable release like 1.2.something and just Hector or Astyanax. You are really dealing with many issues you should not have to

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Don't worry, there will be plenty of time to upgrade to 2.0 or 2.1 later. It is an easy upgrade path and you will likely do it 2-4 times a year. Don't choose the latest and greatest now thinking that you are future-proofing; in reality you are volunteering as a beta tester. On Thursday, February 20,

Re: High CPU load on one node in the cluster

2014-02-20 Thread Edward Capriolo
Upgrade from 2.0.3. There are several bugs. On Wednesday, February 19, 2014, Yogi Nerella ynerella...@gmail.com wrote: You should start your Cassandra daemon with -verbose:gc (please check syntax) and then run it in the foreground (as Cassandra closes standard out). Please see other emails in

Re: paging state will not work

2014-02-20 Thread Edward Capriolo
I would try a fetch size other than 1. Cassandra's slices are start-inclusive, so maybe that is a bug. On Tuesday, February 18, 2014, Katsutoshi nagapad.0...@gmail.com wrote: Hi. I am using Cassandra 2.0.5 version. If null is explicitly set to a column, paging_state will not work. My test

Re: Intermittent long application pauses on nodes

2014-02-20 Thread Joel Samuelsson
Hi Frank, We got a (quite) long GC pause today on 2.0.5: INFO [ScheduledTasks:1] 2014-02-20 13:51:14,528 GCInspector.java (line 116) GC for ParNew: 1627 ms for 1 collections, 425562984 used; max is 4253024256 INFO [ScheduledTasks:1] 2014-02-20 13:51:14,542 GCInspector.java (line 116) GC for

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Sylvain Lebresne
On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo edlinuxg...@gmail.com wrote: For what it is worth, your schema is simple and uses compact storage, so you really don't need anything in Cassandra 2.0 as far as I can tell. You might be happier with a stable release like 1.2.something and just

Re: paging state will not work

2014-02-20 Thread Sylvain Lebresne
That does sound like a bug. Would you mind opening a JIRA (https://issues.apache.org/jira/browse/CASSANDRA) ticket for it? On Thu, Feb 20, 2014 at 3:06 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I would try a fetch size other than 1. Cassandra's slices are start-inclusive, so maybe that

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
I disagree with the sentiment that thrift is not worth the trouble. CQL and all SQL-inspired dialects limit one's ability to use arbitrarily typed data in dynamic columns. With thrift it's easy and straightforward. With CQL there is no way to tell Cassandra the type of the name and value for a
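
For illustration, a rough sketch of the kind of thing Peter is describing, written against the Hector (thrift) API. All identifiers here are made up, not from the thread; the point is only that the serializers for the column name and the column value are chosen per column, so different columns in the same row can carry different types.

    // Hypothetical Hector example -- identifiers are illustrative only.
    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class DynamicColumnSketch {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "127.0.0.1:9160");
            Keyspace keyspace = HFactory.createKeyspace("my_ks", cluster);
            Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
            // Assumes my_cf was created with a bytes comparator/validator so mixed types are allowed.
            // One dynamic column whose name is a long and whose value is a string:
            mutator.insert("row-key", "my_cf",
                    HFactory.createColumn(42L, "some value", LongSerializer.get(), StringSerializer.get()));
            // Another whose name is a string and whose value is a long:
            mutator.insert("row-key", "my_cf",
                    HFactory.createColumn("count", 7L, StringSerializer.get(), LongSerializer.get()));
            HFactory.shutdownCluster(cluster);
        }
    }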

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Mohit Anchlia
+1. I like the Hector client, which uses the thrift interface and exposes APIs similar to how Cassandra physically stores the values. On Thu, Feb 20, 2014 at 9:26 AM, Peter Lin wool...@gmail.com wrote: I disagree with the sentiment that thrift is not worth the trouble. CQL and all SQL-inspired

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Sylvain Lebresne
On Thu, Feb 20, 2014 at 6:26 PM, Peter Lin wool...@gmail.com wrote: I disagree with the sentiment that thrift is not worth the trouble. Way to quote only part of my sentence and get mental on it. My full sentence was it's probably not worth the trouble to start with thrift if you're gonna use

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Benedict Elliott Smith
Cassandra will throw an exception indicating the type is different than the default type. If you want untyped data, store blobs. Or store in a different column (they're free when empty, after all). Type safety is considered a good thing by many. On 20 February 2014 17:26, Peter Lin

Re: paging state will not work

2014-02-20 Thread Edward Capriolo
Cassandra has no null. So in this context setting a column to null or updating null is a delete. I think. I remember debating the semantics of null once. On Tuesday, February 18, 2014, Katsutoshi nagapad.0...@gmail.com wrote: Hi. I am using Cassandra 2.0.5 version. If null is explicitly set to

C-driver to be used with nginx?

2014-02-20 Thread Jan Algermissen
Hi, does anyone know of a C driver that can be / has been used with nginx? I am afraid that the C++ driver's [1] threading and connection pooling approach interferes with nginx's threading model. Does anyone have any ideas? Jan [1] https://github.com/datastax/cpp-driver

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
The only thing you really cannot do: CQL3 loses some of the CQL2 metadata concepts, namely the default validator and then column-specific validation. In cassandra-cql we can say (butchering the syntax) create column family x DEFAULT_VALIDATOR = UTF8Type, columns named y are int, columns named

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
My apologies Sylvain, I didn't mean to misquote you. I still feel that even if someone is only going to use CQL, it is worth it to learn thrift. In the interest of discussion, I looked at both JIRA tickets and I don't see how they make it so a developer can specify the name and value type for a

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread DuyHai Doan
Developers can use whatever type they want for the name or value in a dynamic column and the framework will handle it appropriately. What do you mean by dynamic column? If you want to be able to insert an arbitrary number of columns in one physical row, CQL3 clustering is there and does pretty
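
For illustration, a minimal sketch of the CQL3 clustering approach DuyHai is referring to, using the DataStax Java driver with made-up names (keyspace my_ks is assumed to exist): each new clustering value inserted adds another cell to the same physical row.

    // Hypothetical sketch of emulating an arbitrary number of columns per row with CQL3 clustering.
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class ClusteringSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            // Assumes keyspace my_ks already exists.
            session.execute(
                "CREATE TABLE IF NOT EXISTS my_ks.dynamic_cols (" +
                "  id      text," +
                "  colname text," +
                "  colval  text," +
                "  PRIMARY KEY (id, colname))");
            // Each distinct colname becomes one more cell in the same physical row for 'row1'.
            session.execute("INSERT INTO my_ks.dynamic_cols (id, colname, colval) VALUES ('row1', 'color', 'red')");
            session.execute("INSERT INTO my_ks.dynamic_cols (id, colname, colval) VALUES ('row1', 'size', 'XL')");
            cluster.close();
        }
    }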

How do you remote backup your cassandra nodes ?

2014-02-20 Thread user 01
What is your strategy/tool set to back up your Cassandra nodes, apart from cluster replication/snapshots within the cluster?

Exception while iterating over large data

2014-02-20 Thread ankit tyagi
Hello guys, I was going through http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0, and it is mentioned that pagination is taken care of automatically. I am using the below code to iterate over large data for a particular primary key. Statement stmt = new SimpleStatement(SELECT
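
For reference, a hedged sketch of what such an iteration might look like with the DataStax Java driver 2.0. The keyspace, table and key below are made up, not from the original post; setFetchSize controls the page size and the driver fetches further pages transparently as the ResultSet is iterated.

    // Hypothetical example -- identifiers are illustrative only.
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class PagingSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_ks");
            Statement stmt = new SimpleStatement("SELECT * FROM wide_table WHERE id = 'some-key'");
            stmt.setFetchSize(1000);   // rows per page
            ResultSet rs = session.execute(stmt);
            for (Row row : rs) {
                // process each row; further pages are fetched behind the scenes as iteration proceeds
            }
            cluster.close();
        }
    }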

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Hi DuyHai, yes, I am talking about mixing static and dynamic columns in a single column family. Let me give you an example from retail. Say you're Amazon and you sell over 10K different products. How do you store all those products with all the different properties like color, size, dimensions,

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread DuyHai Doan
Ok, I see what you mean Peter. After reading CASSANDRA-6561 (https://issues.apache.org/jira/browse/CASSANDRA-6561) the use case is pretty clear. On Thu, Feb 20, 2014 at 9:26 PM, Peter Lin wool...@gmail.com wrote: Hi DuyHai, yes, I am talking about mixing static and dynamic columns in a single

[BETA RELEASE] Apache Cassandra 2.1.0-beta1 released

2014-02-20 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the first beta for the future Apache Cassandra 2.1.0. Let me first stress that this is beta software and as such is *not* ready for production use. The goal of this release is to give a preview of what will become Cassandra 2.1 and to get

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Peter, I must meet you and shake your hand. I was actually having a debate with a number of people about a week back, who claimed there was no reason to mix static and dynamic. We do it all the time. I am glad someone else besides me gets it and I am not totally mad. Ed On Thu, Feb 20, 2014 at 3:26

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Hi Ed, you're definitely not mad. I've seen this all over the place. We have several large retail customers and they all suffer the EAV horror. Having built EAV horrors in the past, and being guilty of inflicting that pain on people, I can say mixing static and dynamic is Totally Freaking awesome! I know many

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Rüdiger Klaehn
Hi Sylvain, I applied the patch to the cassandra-2.0 branch (this required some manual work since I could not figure out which commit it was supposed to apply to, and it did not apply to the head of cassandra-2.0). The benchmark now runs in pretty much identical time to the thrift-based

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread DuyHai Doan
Rüdiger, regarding SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>: when using a RandomPartitioner or Murmur3Partitioner, the outer map is a simple Map, not a SortedMap. The only case where you have a SortedMap for row keys is when using the OrderPreservingPartitioner, which is clearly not advised for most cases
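
A purely conceptual sketch of the structure under discussion, with made-up types rather than Cassandra's actual internal classes: the outer map (row key to columns) is modelled as an unordered Map, since with Murmur3Partitioner or RandomPartitioner rows are placed by token, while the inner map (column name to value) is the one kept sorted by the comparator.

    // Purely conceptual sketch -- not Cassandra's real internals.
    import java.nio.ByteBuffer;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.SortedMap;
    import java.util.TreeMap;

    public class StorageSketch {
        // Outer map: row key -> that row's columns; no useful ordering of row keys.
        private static final Map<ByteBuffer, SortedMap<ByteBuffer, ByteBuffer>> rows =
                new HashMap<ByteBuffer, SortedMap<ByteBuffer, ByteBuffer>>();

        // Inner map: column name -> value, kept sorted (here by ByteBuffer's natural ordering,
        // standing in for the column comparator).
        static SortedMap<ByteBuffer, ByteBuffer> columnsFor(ByteBuffer rowKey) {
            SortedMap<ByteBuffer, ByteBuffer> cols = rows.get(rowKey);
            if (cols == null) {
                cols = new TreeMap<ByteBuffer, ByteBuffer>();
                rows.put(rowKey, cols);
            }
            return cols;
        }
    }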

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Erick Ramirez
Wow! What a fantastic robust discussion. I've just been educated. Peter --- Thanks for providing those use cases. They are great examples. Rudiger --- From what you've done so far, I wouldn't have said you are new to Cassandra. Well done. Cheers, Erick

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Thanks Erick. Hopefully Sylvain will forgive me for misquoting him. My goal was to share knowledge and get people thinking about how best to use both thrift and CQL. Whenever I hear people say CQL is the future I get annoyed. My biased feeling is that they complement each other very well and users

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
CASSANDRA-6561 is interesting, though having statically defined columns is not exactly a solution for doing everything thrift can do. http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/ Before collections or CQL existed I did some of these concepts myself. Say you have a
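
For context, a hedged sketch of what CASSANDRA-6561 proposes (static columns; at the time of this thread the feature was still landing in the 2.0.x line). All names below are made up: the static column is stored once per partition, alongside the per-clustering-key "dynamic" cells, which is one way to mix static and dynamic data in a single CQL3 table.

    // Hypothetical sketch -- identifiers are illustrative only; assumes keyspace my_ks exists.
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class StaticColumnSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            session.execute(
                "CREATE TABLE IF NOT EXISTS my_ks.products (" +
                "  product_id text," +
                "  name       text static," +      // one shared value per partition
                "  prop_name  text," +
                "  prop_value text," +
                "  PRIMARY KEY (product_id, prop_name))");
            // The static column can be written on its own, once per product...
            session.execute("INSERT INTO my_ks.products (product_id, name) VALUES ('sku-1', 'Blue T-Shirt')");
            // ...while each property adds another clustered cell in the same partition.
            session.execute("INSERT INTO my_ks.products (product_id, prop_name, prop_value) VALUES ('sku-1', 'color', 'blue')");
            session.execute("INSERT INTO my_ks.products (product_id, prop_name, prop_value) VALUES ('sku-1', 'size', 'XL')");
            cluster.close();
        }
    }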

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Good example, Ed. I'm so happy to see other people doing things like this. Even though the official DataStax docs recommend not mixing static and dynamic, to me that's a huge disservice to Cassandra users. If someone really wants to stick to a relational model, then NewSQL is a better fit, plus gives

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Just read this. I did not mean to offend or start a debate. Generally when people ask me for help I give them the simplest option I know that works. It pains me to watch new users struggling with incompatible drivers and bugs. On Thursday, February 20, 2014, Sylvain Lebresne sylv...@datastax.com

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Robert Coli
On Thu, Feb 20, 2014 at 9:12 AM, Sylvain Lebresne sylv...@datastax.com wrote: Of course, if everyone was using that reasoning, no one would ever test new features and report problems/suggest improvements. So thanks to anyone like Rüdiger who actually tries stuff and takes the time to report

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Recommendations in Cassandra have a shelf life of about 1 to 2 years. If you try to assert a recommendation from a year ago, you stand a solid chance of someone telling you there is now a better way. Cassandra once loved being a schemaless datastore. Imagine that? On Thursday, February 20, 2014,

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Mohit Anchlia
On Thu, Feb 20, 2014 at 4:37 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Recommendations in Cassandra have a shelf life of about 1 to 2 years. If you try to assert a recommendation from a year ago you stand a solid chance of someone telling you there is now a better way. Cassandra once loved

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Yeah, slowly NoSQL products are adding schema :) At least Cassandra is ahead of the curve. Sent from my iPhone. On Feb 20, 2014, at 7:37 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Recommendations in Cassandra have a shelf life of about 1 to 2 years. If you try to assert a

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
On Thursday, February 20, 2014, Robert Coli rc...@eventbrite.com wrote: On Thu, Feb 20, 2014 at 9:12 AM, Sylvain Lebresne sylv...@datastax.com wrote: Of course, if everyone was using that reasoning, no one would ever test new features and report problems/suggest improvements. So thanks to anyone

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Laing, Michael
Just to add my 2 cents... We are very happy CQL users, running in production. I have had no problems modeling whatever I have needed to, including problems similar to the examples set forth previously, in CQL. Personally I think it is an excellent improvement to Cassandra, and we have no

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Hopefully in 3 years no one will be calling your schema 'legacy' and 'not suggested' like they do with mine. On Thursday, February 20, 2014, Laing, Michael michael.la...@nytimes.com wrote: Just to add my 2 cents... We are very happy CQL users, running in production. I have had no problems

Re: paging state will not work

2014-02-20 Thread Katsutoshi
Thank you for the reply. Added: https://issues.apache.org/jira/browse/CASSANDRA-6748 Katsutoshi 2014-02-21 2:14 GMT+09:00 Sylvain Lebresne sylv...@datastax.com: That does sound like a bug. Would you mind opening a JIRA ( https://issues.apache.org/jira/browse/CASSANDRA) ticket for it? On

Consistency Level One Question

2014-02-20 Thread Drew Kutcharian
Hi guys, I wanted to get some clarification on what happens when you write and read at consistency level ONE. Say I have a keyspace with a replication factor of 3 and a table which will contain write-once/read-only wide rows. If I write at consistency level ONE and the write happens on node A and I

Re: Consistency Level One Question

2014-02-20 Thread graham sanderson
Writing at a consistency level of ONE means that your write will be acknowledged as soon as one replica confirms that it has made the write to the memtable and the commit log (which might not be quite synced to disk, but that's a separate issue). All the replica writes are submitted in parallel, so it is very

Re: Consistency Level One Question

2014-02-20 Thread graham sanderson
Note also that when reading at ONE there will be no read repair, since the coordinator does not know that another replica has stale data (remember, at ONE basically only one node is asked for the answer). In practice for our use cases, we always write at LOCAL_QUORUM (failing the whole update if
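
For illustration, a hedged sketch of setting per-request consistency with the DataStax Java driver; the keyspace and table names are made up, not from the thread. Graham's point is that with RF=3, a write acknowledged at ONE may not yet be on the replica a subsequent read happens to hit; only combinations where write replicas plus read replicas exceed RF (e.g. QUORUM writes with QUORUM reads) guarantee that the read sees the write.

    // Hypothetical example -- identifiers are illustrative only; assumes table my_ks.wide exists.
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class ConsistencySketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_ks");

            Statement write = new SimpleStatement("INSERT INTO wide (id, col, val) VALUES ('k', 'c1', 'v1')");
            write.setConsistencyLevel(ConsistencyLevel.ONE);          // acknowledged after one replica
            session.execute(write);

            Statement read = new SimpleStatement("SELECT * FROM wide WHERE id = 'k'");
            read.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);  // does not guarantee seeing a ONE write with RF=3
            session.execute(read);

            cluster.close();
        }
    }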