Re: Performance problem with large wide row inserts using CQL

2014-02-24 Thread Rüdiger Klaehn
On Mon, Feb 24, 2014 at 11:47 AM, Sylvain Lebresne wrote: > >> >>> I still have some questions regarding the mapping. Please bear with me if these are stupid questions. I am quite new to Cassandra. The basic cassandra data model for a keyspace is something like this, right

Re: Performance problem with large wide row inserts using CQL

2014-02-24 Thread Sylvain Lebresne
> > > >> >>> I still have some questions regarding the mapping. Please bear with me >>> if these are stupid questions. I am quite new to Cassandra. >>> >>> The basic cassandra data model for a keyspace is something like this, >>> right? >>> >>> SortedMap> >>> ^ row key. determines

Re: Performance problem with large wide row inserts using CQL

2014-02-24 Thread Sylvain Lebresne
On Fri, Feb 21, 2014 at 8:53 PM, Yogi Nerella wrote: > I am using CCM to install the servers, it is bringing in the source code, > is there any option for CCM which I can set only to download the binary, > just to make sure it is not bringing in the working copy of the code. > No there isn't. An

Re: Performance problem with large wide row inserts using CQL

2014-02-22 Thread Rüdiger Klaehn
On Fri, Feb 21, 2014 at 11:51 AM, Sylvain Lebresne wrote: > On Thu, Feb 20, 2014 at 10:49 PM, Rüdiger Klaehn wrote: > >> Hi Sylvain, >> >> I applied the patch to the cassandra-2.0 branch (this required some >> manual work since I could not figure out which commit it was supposed to >> apply for, a

Re: Performance problem with large wide row inserts using CQL

2014-02-21 Thread Yogi Nerella
I am using CCM to install the servers, it is bringing in the source code, is there any option for CCM which I can set only to download the binary, just to make sure it is not bringing in the working copy of the code. I am using the following statements to create Keyspace and table definition. cr

Re: Performance problem with large wide row inserts using CQL

2014-02-21 Thread Yogi Nerella
Sylvain, I am using ccm to install, and it installs from the source directory. I have tried 2.0.4/3/2/1 and 1.2.15; all of them report the same failure after 127 records are inserted. I am using the 1.56.34 and 1.56.38 clients, and both report the same issue. Is something wrong with the client or the server,

Re: Performance problem with large wide row inserts using CQL

2014-02-21 Thread Peter Lin
I'm with you Ed. I love that Cassandra is moving forward at a very fast pace, but often I get the impression some people want to avoid thrift at all costs whether it makes sense or not. I totally understand the committers don't have enough bandwidth to make sure all existing cases transition smoothl

Re: Performance problem with large wide row inserts using CQL

2014-02-21 Thread Edward Capriolo
The main issue is that Cassandra has two of everything: two access APIs, two metadata systems, and two groups of users. The users of the original systems (thrift, cfmetadata) who followed the advice of three years ago have been labeled obsolete (did you ever see that Twilight Zone

Re: Performance problem with large wide row inserts using CQL

2014-02-21 Thread Sylvain Lebresne
On Thu, Feb 20, 2014 at 10:49 PM, Rüdiger Klaehn wrote: > Hi Sylvain, > > I applied the patch to the cassandra-2.0 branch (this required some manual > work since I could not figure out which commit it was supposed to apply > for, and it did not apply to the head of cassandra-2.0). > Yeah, some c

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Hopefully in 3 years no one will be calling your schema 'legacy' and 'not suggested' like they do with mine. On Thursday, February 20, 2014, Laing, Michael wrote: > Just to add my 2 cents... > We are very happy CQL users, running in production. > I have had no problems modeling whatever I have ne

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Laing, Michael
Just to add my 2 cents... We are very happy CQL users, running in production. I have had no problems modeling whatever I have needed to, including problems similar to the examples set forth previously, in CQL. Personally I think it is an excellent improvement to Cassandra, and we have no intenti

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
On Thursday, February 20, 2014, Robert Coli wrote: > On Thu, Feb 20, 2014 at 9:12 AM, Sylvain Lebresne wrote: >> >> Of course, if everyone was using that reasoning, no-one would ever test new features and report problems/suggest improvement. So thanks to anyone like Rüdiger that actually tries st

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Yeah. Slowly, NoSQL products are adding schema :) At least Cassandra is ahead of the curve. > On Feb 20, 2014, at 7:37 PM, Edward Capriolo wrote: > > Recommendations in Cassandra have a shelf life of about 1 to 2 years. If you > try to assert a recommendation from a year ago y

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Mohit Anchlia
On Thu, Feb 20, 2014 at 4:37 PM, Edward Capriolo wrote: > Recommendations in Cassandra have a shelf life of about 1 to 2 years. If > you try to assert a recommendation from a year ago you stand a solid chance of > someone telling you there is now a better way. > > Cassandra once loved being a schemale

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Recommendations in Cassandra have a shelf life of about 1 to 2 years. If you try to assert a recommendation from a year ago you stand a solid chance of someone telling you there is now a better way. Cassandra once loved being a schemaless datastore. Imagine that? On Thursday, February 20, 2014, Pete

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Robert Coli
On Thu, Feb 20, 2014 at 9:12 AM, Sylvain Lebresne wrote: > Of course, if everyone was using that reasoning, no-one would ever test > new features and report problems/suggest improvements. So thanks to anyone > like Rüdiger that actually tries stuff and takes the time to report problems > when they t

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Just read this. I did not mean to offend or start a debate. Generally when people ask me for help I give them the simplest option I know that works. It pains me to watch new users struggling with incompatible drivers and bugs. On Thursday, February 20, 2014, Sylvain Lebresne wrote: > On Thu, Feb

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Good example, Ed. I'm so happy to see other people doing things like this. Even if the official DataStax docs recommend not mixing static and dynamic, to me that's a huge disservice to Cassandra users. If someone really wants to stick to relational model, then NewSql is a better fit, plus gives use

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
CASSANDRA-6561 is interesting, though having statically defined columns is not exactly a solution to do everything in "thrift". http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/ Before collections or CQL existed I did some of these concepts myself. Say you have a

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Thanks Erick. Hopefully Sylvain will forgive me for misquoting him. My goal was to share knowledge and get people thinking about how best to use both thrift and cql. Whenever I hear people say "cql is the future" I get annoyed. My biased feeling is they complement each other very well and users shou

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Erick Ramirez
Wow! What a fantastic robust discussion. I've just been educated. Peter --- Thanks for providing those use cases. They are great examples. Rüdiger --- From what you've done so far, I wouldn't have said you are new to Cassandra. Well done. Cheers, Erick

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread DuyHai Doan
Rüdiger "SortedMap>" When using a RandomPartitioner or Murmur3Partitioner, the outer map is a simple Map, not SortedMap. The only case you have a SortedMap for row key is when using OrderPreservingPartitioner, which is clearly not advised for most cases because of hot spots in the cluster.
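The distinction above can be sketched with a toy in-memory model. This is only an illustration, not how Cassandra is implemented: the class and method names (ToyColumnFamily, insert, columns, scan_by_token) are hypothetical, and MD5 merely stands in for the hash a hashing partitioner would apply. The outer map is an ordinary unordered map keyed by row key, columns inside a row come back sorted by name, and a scan across rows follows token order rather than row-key order.

```python
import hashlib


def token(row_key: bytes) -> int:
    """Illustrative stand-in for a hashing partitioner: hash the row key to a token."""
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")


class ToyColumnFamily:
    """Toy model only: unordered outer map of rows, columns sorted within each row."""

    def __init__(self):
        # row key -> {column name -> value}; no ordering is kept across row keys
        self.rows = {}

    def insert(self, row_key: bytes, column: bytes, value: bytes) -> None:
        self.rows.setdefault(row_key, {})[column] = value

    def columns(self, row_key: bytes):
        # Within one row, columns are returned sorted by column name.
        return sorted(self.rows[row_key].items())

    def scan_by_token(self):
        # A scan over rows follows token order, not the natural order of the row keys.
        return sorted(self.rows, key=token)
```

With an order-preserving partitioner the scan would instead follow the row keys themselves, which is what makes the outer map behave like a SortedMap in that one case.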

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Rüdiger Klaehn
Hi Sylvain, I applied the patch to the cassandra-2.0 branch (this required some manual work since I could not figure out which commit it was supposed to apply for, and it did not apply to the head of cassandra-2.0). The benchmark now runs in pretty much identical time to the thrift based benchmar

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Hi Ed, you're definitely not mad. I've seen this all over the place. We have several large retail customers and they all suffer the EAV horror. Having built EAV horrors in the past and guilty of inflicting that pain on people, mixing static and dynamic is "Totally Freaking awesome!" I know many

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Peter, I must meet you and shake your hand. I was actually having a debate with a number of people about a week back claiming there was "no reason to mix static and dynamic". We do it all the time I am glad someone else besides me "gets it" and I am not totally mad. Ed On Thu, Feb 20, 2014 at 3

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread DuyHai Doan
Ok I see what you mean Peter. After reading CASSANDRA-6561 the use case is pretty clear. On Thu, Feb 20, 2014 at 9:26 PM, Peter Lin wrote: > > Hi Duyhai, > > yes, I am talking about mixing static and dynamic columns in a single > column fam

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
Hi Duyhai, yes, I am talking about mixing static and dynamic columns in a single column family. Let me give you an example from retail. Say you're amazon and you sell over 10K different products. How do you store all those products with all the different properties like color, size, dimensions, e

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread DuyHai Doan
"Developers can use what ever type they want for the name or value in a dynamic column and the framework will handle it appropriately." What do you mean by "dynamic" column ? If you want to be able to insert an arbitrary number of columns in one physical row, CQL3 clustering is there and does pre

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
my apologies Sylvain, I didn't mean to misquote you. I still feel that even if someone is only going to use CQL, it is "worth it" to learn thrift. In the interest of discussion, I looked at both jira tickets and I don't see how that makes it so a developer can specify the name and value type for a

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
The only thing you really can not do CQL3 loses some of the concept of CQL2 metadata, namely the default validation and then column specific validation. In cassandra-cql we can say (butchering the syntax) create column family x DEFAULT_VALIDATOR = UTF8Type columns named y are int columns named z

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Benedict Elliott Smith
> > Cassandra will throw an exception indicating the type is different than > the default type. If you want untyped data, store blobs. Or store in a different column (they're free when empty, after all). Type safety is considered a good thing by many. On 20 February 2014 17:26, Peter Lin wrote

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Sylvain Lebresne
On Thu, Feb 20, 2014 at 6:26 PM, Peter Lin wrote: > > I disagree with the sentiment that "thrift is not worth the trouble". > Way to quote only part of my sentence and get mental on it. My full sentence was "it's probably not worth the trouble to start with thrift if you're gonna use CQL later".

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Mohit Anchlia
+1 I like the Hector client, which uses the thrift interface and exposes APIs that are similar to how Cassandra physically stores the values. On Thu, Feb 20, 2014 at 9:26 AM, Peter Lin wrote: > > I disagree with the sentiment that "thrift is not worth the trouble". > > CQL and all SQL inspired dialects li

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Peter Lin
I disagree with the sentiment that "thrift is not worth the trouble". CQL and all SQL inspired dialects limit one's ability to use arbitrary typed data in dynamic columns. With thrift it's easy and straightforward. With CQL there is no way to tell Cassandra the type of the name and value for a dy

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Sylvain Lebresne
On Thu, Feb 20, 2014 at 2:16 PM, Edward Capriolo wrote: > For what it is worth your schema is simple and uses compact storage. Thus > you really don't need anything in Cassandra 2.0 as far as I can tell. You > might be happier with a stable release like 1.2.something and just hector > or astyanax. Y

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
Don't worry, there will be plenty of time to upgrade to 2.0 or 2.1 later. It is an easy upgrade path and you will likely do it 2-4 times a year. Don't choose the latest and greatest now thinking that you are future proofing. In reality you are volunteering as a beta tester. On Thursday, February 20, 2014

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Edward Capriolo
For what it is worth your schema is simple and uses compact storage. Thus you really don't need anything in Cassandra 2.0 as far as I can tell. You might be happier with a stable release like 1.2.something and just hector or astyanax. You are really dealing with many issues you should not have to jus

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread Sylvain Lebresne
On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn wrote: > > I have cloned the cassandra repo, applied the patch, and built it. But > when I want to run the benchmark I get an exception. See below. I tried with > a non-managed dependency to > cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-depende

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread Yogi Nerella
Rüdiger, I have tried CQL only, and it was failing after 127 records were added. I have to check what is wrong. I have the keyspace and table definition exactly as you do. I am new to Scala, and I do not know how to do this. I may try this in the evening. Yogi On Wed, Feb 19, 2014 at 2:50 PM, Rüdiger Kl

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread Rüdiger Klaehn
This must be something to do with server side validation. If I define the table like this it does not happen: cqlsh> CREATE KEYSPACE IF NOT EXISTS test1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }; cqlsh> use test1; cqlsh:test1> create TABLE employees2 (time blob,

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread Yogi Nerella
I have a two node cluster. Tried with both 2.0.4 and 2.0.5. I have tried your code, and exactly after inserting 127 rows, the next insert fails. 10.566482102276002 123 2.7760618708015863 124 8.936212688296054 125 9.532923906962095 126 7.5081516753554505 127 java.lang.RuntimeException: failed to w

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread Rüdiger Klaehn
On Wed, Feb 19, 2014 at 7:49 PM, Sylvain Lebresne wrote: > On Wed, Feb 19, 2014 at 11:27 AM, Rüdiger Klaehn wrote: > >> >> Am I doing something wrong, or is this a fundamental limitation of CQL. >> > > Neither. I believe you are running into > https://issues.apache.org/jira/browse/CASSANDRA-6737,

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread Rüdiger Klaehn
Hi Yogi, Both benchmarks go to different tables. I originally wanted to just write a lot of data into an empty table and then evaluate what compression ratio I can expect when I ran into the performance problem. I am sorry, I forgot to mention this: I did not figure out how to create a table usin

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread Sylvain Lebresne
On Wed, Feb 19, 2014 at 11:27 AM, Rüdiger Klaehn wrote: > > Am I doing something wrong, or is this a fundamental limitation of CQL. > Neither. I believe you are running into https://issues.apache.org/jira/browse/CASSANDRA-6737, which is a bug, a performance bug, which we should and will fix. So

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread Yogi Nerella
Rüdiger, I am trying this on 2.0.5 to see, but both Scala code and AST code are going to different tables? Can you give the exact AST code you are trying? Yogi On Wed, Feb 19, 2014 at 10:49 AM, Sylvain Lebresne wrote: > On Wed, Feb 19, 2014 at 11:27 AM, Rüdiger Klaehn wrote: > >> >> Am I doin

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread DuyHai Doan
Agree with John Preparing a statement follows this process: 1) send the statement to the server 2) statement validation on server side 3) if validation is ok, the C* node will assign an UUID to this prepared statement 4) send back the UUID to the java driver core Now, you can re-use this s
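The prepare-once flow listed above can be sketched without a running cluster. This is a toy model, not the Java driver or the server: ToyServer, its prepare and execute methods, and the prepare_calls counter are all hypothetical names used for illustration, and the validation is reduced to rejecting an empty statement. It follows the description in this message (the server validates the statement, assigns it an identifier, and the client reuses that identifier for every execution).

```python
import uuid


class ToyServer:
    """Toy stand-in for a node, only to show prepare-once / execute-many."""

    def __init__(self):
        self.prepared = {}       # statement id -> CQL text
        self.prepare_calls = 0   # counts how often a statement was sent and validated

    def prepare(self, cql: str) -> uuid.UUID:
        # Steps 1-4: receive the statement, validate it, assign an id, return the id.
        self.prepare_calls += 1
        if not cql.strip():
            raise ValueError("empty statement")
        statement_id = uuid.uuid4()
        self.prepared[statement_id] = cql
        return statement_id

    def execute(self, statement_id: uuid.UUID, values):
        # Later executions send only the id and the bound values.
        cql = self.prepared[statement_id]
        if cql.count("?") != len(values):
            raise ValueError("wrong number of bound values")
        return cql, tuple(values)


server = ToyServer()
# Prepare once, outside the loop ...
insert_id = server.prepare("INSERT INTO t (k, c, v) VALUES (?, ?, ?)")
# ... then only bind and execute inside it.
for i in range(1000):
    server.execute(insert_id, ("row", i, b"payload"))
```

Calling prepare inside the loop would repeat the round trip and the validation for every insert, which is the overhead John's suggestion avoids.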

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread Nate McCall
Hi Rüdiger, I just saw this after I answered on the SO thread: http://stackoverflow.com/questions/21778671/cassandra-how-to-insert-a-new-wide-row-with-good-performance-using-cql/21884943#21884943 On Wed, Feb 19, 2014 at 8:57 AM, John Sanda wrote: > From a quick glance at your code, it looks lik

Re: Performance problem with large wide row inserts using CQL

2014-02-19 Thread John Sanda
From a quick glance at your code, it looks like you are preparing your insert statement multiple times. You only need to prepare it once. I would expect to see some improvement with that change. On Wed, Feb 19, 2014 at 5:27 AM, Rüdiger Klaehn wrote: > Hi all, > > I am evaluating Cassandra for

Performance problem with large wide row inserts using CQL

2014-02-19 Thread Rüdiger Klaehn
Hi all, I am evaluating Cassandra for satellite telemetry storage and analysis. I set up a little three node cluster on my local development machine and wrote a few simple test programs. My use case requires storing incoming telemetry updates in the database at the same rate as they are coming in