Re: StreamException while adding nodes

2014-06-13 Thread Philipp Potisk
As we are still failing to add the 3 additional nodes, we would still appreciate any further thoughts. I have removed all 3 half-joined nodes, deleted the data directories and started only one node. Since then (more than 24 hours ago) the node has been in status JOINING (nodetool status: UJ, nodetool

Re: RPC timeout paging secondary index query results

2014-06-13 Thread Phil Luckhurst
But would you expect performance to drop off so quickly? At 250,000 records we can still page through the query with LIMIT 5 but when adding an additional 50,000 records we can't page past the first 10,000 records even if we drop to LIMIT 10. What about the case where we add 100,000 records

Re: Weird timeout errors

2014-06-13 Thread David Mitchell
On Jun 12, 2014, at 11:39 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jun 12, 2014 at 10:29 AM, David Mitchell mitch...@es.net wrote: session.execute("insert into raw_data (key,column1,value) values (%s,%s,%s)", ... and then delete them like so: session.execute("delete

RE: Backup Cassandra to

2014-06-13 Thread Camacho, Maria (NSN - FI/Espoo)
Thanks a lot for your responses. Maria. From: ext Jabbar Azam [mailto:aja...@gmail.com] Sent: Thursday, June 12, 2014 10:09 PM To: user@cassandra.apache.org Cc: Jack Krupansky Subject: Re: Backup Cassandra to Yes, I never thought of that. Thanks Jabbar Azam On 12 June 2014 19:45, Jeremy

Re: CQL query regarding indexes

2014-06-13 Thread Akash Pandey
Use senttime as part of the primary key: CREATE TABLE services.messagepayload_by_date ( record_date timestamp, partition_id uuid, messageid bigint, senttime timestamp, PRIMARY KEY (record_date, senttime) ) The partition id itself should be chronological, say a date. Then you put partition id in

Re: Weird timeout errors

2014-06-13 Thread Robert Coli
On Fri, Jun 13, 2014 at 2:58 AM, David Mitchell mitch...@es.net wrote: It does look like the excessive number of tombstones is the culprit. Thanks for pointing me towards that. NP, HTH, HAND, :D The apparent behavior of the nodes to simply not answer and let a timeout come back to the user

Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Mark Greene
I'm looking for some best practices w/r/t supporting arbitrary columns. It seems from the docs I've read around CQL that they are supported in some capacity via collections but you can't exceed 64K in size. For my requirements that would cause problems. So my questions are: 1) Is using Thrift a

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread DuyHai Doan
Hello Mark. Dynamic columns, as you said, are perfectly supported by CQL3 via clustering columns. And no, using collections for storing dynamic data is a very bad idea if the cardinality is very high (> 1000 elements). 1) Is using Thrift a valid approach in the era of CQL? -- Less and less.

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Mark Greene
Thanks DuyHai, I have a follow up question to #2. You mentioned ideally I would create a new table instead of mutating an existing one. This strikes me as bad practice in the world of multi tenant systems. I don't want to create a table per customer. So I'm wondering if dynamically modifying the

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Robert Coli
On Fri, Jun 13, 2014 at 11:54 AM, DuyHai Doan doanduy...@gmail.com wrote: Dynamic columns, as you said, are perfectly supported by CQL3 via clustering columns. Perfectly supported seems a bit expansive as a claim. They are not quite the same thing as actual dynamic columns and are

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread DuyHai Doan
This strikes me as bad practice in the world of multi tenant systems. I don't want to create a table per customer. So I'm wondering if dynamically modifying the table is an accepted practice? -- Can you give some details about your use case ? How would you alter a table structure to adapt it to a

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
I like CQL, but it's not a hammer. If thrift is more appropriate for you, then use it. If Cassandra gets to the point where Thrift is removed, I'll just fork Cassandra. That's what's great about open source. On Fri, Jun 13, 2014 at 3:47 PM, DuyHai Doan doanduy...@gmail.com wrote: This strikes

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Mark Greene
My use case requires the support of arbitrary columns much like a CRM. My users can define 'custom' fields within the application. Ideally I wouldn't have to change the schema at all, which is why I like the old thrift approach rather than the CQL approach. Having said all that, I'd be willing to

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
Like you, I make extensive use of dynamic columns for similar reasons. In our project, one of the goals is to give end users the ability to design their own schema without having to alter a table. If people really want strong schema, then just use old Sql or NewSql. RDB gives you the full power

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread DuyHai Doan
Hi Mark, I believe that in your table you want to have some common fields that will be there whatever the customer is, and other fields that are entirely customer-dependent, is that right? In this case, creating a table with static columns for the common fields and a clustering column representing all
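A rough sketch of the layout suggested above, assuming one partition per record, a static column for a common field and one clustering row per custom field. The table name, column names and the `Partition` class are all hypothetical and do not come from the thread; the class is only an in-memory stand-in, not driver code.

```python
# Hypothetical schema, shown as a string for reference only.
# Table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE customer_records (
    record_id   uuid,
    owner       text static,   -- common field, stored once per partition
    prop_name   text,          -- clustering column: one row per custom field
    prop_value  text,
    PRIMARY KEY (record_id, prop_name)
)
"""


class Partition:
    """In-memory stand-in for one partition of the table above."""

    def __init__(self, owner):
        self.owner = owner  # plays the role of the static column
        self.props = {}     # clustering key (prop_name) -> prop_value

    def set_prop(self, name, value):
        # Adding a custom field is just another row, no schema change.
        self.props[name] = value

    def rows(self):
        # Rows are returned ordered by the clustering column.
        return [(self.owner, name, self.props[name]) for name in sorted(self.props)]


p = Partition("acme")
p.set_prop("region", "EU")
p.set_prop("industry", "retail")
for row in p.rows():
    print(row)
```

The point of the design is that a new custom field becomes a new row under the same partition key, so the table definition itself never changes.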

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Mark Greene
Yeah I don't anticipate more than 1000 properties, well under in fact. I guess the trade off of using the clustered columns is that I'd have a table that would be tall and skinny which also has its challenges w/r/t memory. I'll look into your suggestion a bit more and consider some others around

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread DuyHai Doan
Well, before discussing dynamic columns, we should first define them clearly. What do people mean by dynamic columns exactly? Is it the ability to add many columns of the same type to an existing physical row? If yes then CQL3 does support it with clustering columns. On Fri, Jun

Re: Cannot query secondary index

2014-06-13 Thread Jonathan Lacefield
Hello, What you are attempting to do reminds me of the old sliding window partitioning trick in rdbms systems. You're right, there is no system-provided tool that allows you to perform a similar operation. You could always leverage option 3, and then create a service that helps manage the

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
when I say dynamic column, I mean non-static columns of different types within the same row. Some could be an object or one of the defined datatypes. with thrift I use the appropriate serializer to handle these dynamic columns. On Fri, Jun 13, 2014 at 4:55 PM, DuyHai Doan doanduy...@gmail.com

Re: Cannot query secondary index

2014-06-13 Thread Mohit Anchlia
Some other ways to track old records are: 1) Use external queues - One queue per week or month for instance and pile up data on the queue cluster 2) Create one more table in C* to track the keys per week or month that you can scan to read the keys of the audit table. Make sure you delete the
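For the per-week or per-month tracking idea above, the bucket value could be derived from the record timestamp along these lines. This is only a sketch: the function names and the bucket string formats are assumptions, not something given in the thread.

```python
from datetime import datetime


def week_bucket(ts):
    """Return an ISO year/week label such as '2014-W24' for a timestamp."""
    iso_year, iso_week, _ = ts.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def month_bucket(ts):
    """Return a year/month label such as '2014-06' for a timestamp."""
    return ts.strftime("%Y-%m")


ts = datetime(2014, 6, 13)
print(week_bucket(ts))   # 2014-W24
print(month_bucket(ts))  # 2014-06
```

Every key written during the same week (or month) would share one bucket value, so a scan of that bucket yields the keys to look up in the audit table.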

A problem with truncate and bulk loader

2014-06-13 Thread Huiliang Zhang
Hi, I have a very strange problem with Cassandra bulk loader. I would appreciate any explanations. I am using a local cassandra server 2.0.5 with default settings. 1. I created a table A and loaded 108 rows into it by using a hadoop program with org.apache.cassandra.hadoop.BulkOutputFormat. 2. I run

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread DuyHai Doan
In thrift, when creating a column family, you need to define 1) the row/partition key type 2) the column comparator type 3) the validation type for the actual value (cell in CQL3 terminology) Unless you use dynamic composites feature, which does not exist (and probably won't) in CQL3, I don't

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
the validation type is set to bytes, and my code is type safe, so it knows which serializers to use. Those dynamic columns are driven off the types in Java. Having said that, CQL3 does have a new custom type feature, but the documentation is basically non-existent on how that actually works. One
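To illustrate the approach described above (values stored as raw bytes, with the client choosing a serializer from the value's type), here is a minimal sketch. The poster's code is Java and is not shown in the thread; this Python version, the `SERIALIZERS` table and the `encode`/`decode` names are assumptions made purely for illustration.

```python
import struct

# Hypothetical registry: Python type -> (to_bytes, from_bytes).
SERIALIZERS = {
    int: (lambda v: struct.pack(">q", v), lambda b: struct.unpack(">q", b)[0]),
    float: (lambda v: struct.pack(">d", v), lambda b: struct.unpack(">d", b)[0]),
    str: (lambda v: v.encode("utf-8"), lambda b: b.decode("utf-8")),
}


def encode(value):
    """Pick the serializer from the value's type and return raw bytes."""
    to_bytes, _ = SERIALIZERS[type(value)]
    return to_bytes(value)


def decode(blob, expected_type):
    """The stored bytes carry no type tag, so the caller must supply the type."""
    _, from_bytes = SERIALIZERS[expected_type]
    return from_bytes(blob)


print(decode(encode(42), int))
print(decode(encode("hello"), str))
```

The trade-off is visible in `decode`: the store only sees bytes, so type safety lives entirely in the client code.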

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread DuyHai Doan
the validation type is set to bytes, and my code is type safe, so it knows which serializers to use. Those dynamic columns are driven off the types in Java. -- Correct. However, you are still bound by the column comparator type which should be fixed (unless again you set it to bytes, in this case

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
With a text-based query approach like CQL, you lose the type with dynamic columns. Yes, we're storing it as bytes, but it is simpler and easier with Thrift to do these types of things. I like CQL3 and what it does, but text-based query languages make certain dynamic schema use cases painful.

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread DuyHai Doan
A query language always has its pros and cons. As far as I can see, the advantages of Thrift over CQL3 are: 1) Thrift requires a little bit less decoding server-side (a difference around 10% in CPU usage). 2) Thrift uses more compact storage because

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread graham sanderson
My 2 cents… A motivation for CQL3 AFAIK was to make Cassandra more familiar to SQL users. This is a valid goal, and works well in many cases. Equally there are use cases (that some might find ugly) where Cassandra is chosen explicitly because of the sorts of things you can do at the thrift

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Laing, Michael
Just to add 2 more cents... :) The CQL3 protocol is asynchronous. This can provide a substantial throughput increase, according to my benchmarking, when one uses non-blocking techniques. It is also peer-to-peer. Hence the server can generate events to send to the client, e.g. schema changes - in

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
Without a doubt there are nice features of CQL3 like notifications and async. I want to see CQL3 mature and handle all the use cases that Thrift handles easily today. It's to everyone's benefit to work together and improve CQL3. Another benefit of Thrift drivers today is being able to use object API

Pattern to store maps of maps...

2014-06-13 Thread Kevin Burton
So the cassandra map support in CQL is nice but it's got me wanting deeper nesting. For example { foo: { bar: hello } } … but that's not possible with CQL. Of course… one solution is something like avro, and then store your entire record as a blob. I guess that's not TOO bad but that means all

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread graham sanderson
Note as I mentioned mid post, thrift also supports async nowadays (there was a recent discussion on cassandra dev and the choice was not to move to it) I think the binary protocol is the way forward; CQL3 needs some new features, or there need to be some other types of requests you can make

Re: Pattern to store maps of maps...

2014-06-13 Thread graham sanderson
My personal opinion is that unless you are doing map operations on a CQL3 map and will always intend to read the whole thing (you don’t have any choice today), don’t use one at all - use a blob of whatever variety makes sense (e.g. Json, AVRO, Protobuf etc) On Jun 13, 2014, at 7:17 PM, Kevin

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Peter Lin
yes, thrift does have async, though I haven't had to use it yet. Right now I'm working on adding CAS to hector followed by multi slice. On Fri, Jun 13, 2014 at 9:01 PM, graham sanderson gra...@vast.com wrote: Note as I mentioned mid post, thrift also supports async nowadays (there was a

Re: Pattern to store maps of maps...

2014-06-13 Thread Johan Edstrom
We treat this on an Object level in Java as a new table with separate Hydration. On a Map level we currently utilize an Internal CQL3 map where we replace the non scalar values with separate tables - we just stick the ID in. Same for Sets, Arrays and such. You get more writes but you also have

Re: Pattern to store maps of maps...

2014-06-13 Thread Jack Krupansky
The first question is how you need to access this data. Do you need to directly access “bar” from a SELECT? Do you need to access “foo” as... what – Java Map, or what? That said, you can always flatten a map of maps by simply concatenating the keys, such as {“foo_bar”: “hello”} and then you
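The key-concatenation idea above can be sketched as follows. The `flatten` function and the choice of `_` as the default separator are illustrative assumptions; note that the result becomes ambiguous if a key already contains the separator.

```python
def flatten(nested, sep="_", prefix=""):
    """Flatten a map of maps into one level by joining nested keys with sep."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into the inner map, carrying the joined key as the prefix.
            flat.update(flatten(value, sep, full_key))
        else:
            flat[full_key] = value
    return flat


print(flatten({"foo": {"bar": "hello"}}))  # {'foo_bar': 'hello'}
```

The flattened result is a plain string-to-value map, which is the shape a single-level CQL map can hold.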

Re: Pattern to store maps of maps...

2014-06-13 Thread Kevin Burton
On Fri, Jun 13, 2014 at 7:26 PM, Johan Edstrom seij...@gmail.com wrote: We treat this on an Object level in Java as a new table with separate Hydration. On a Map level we currently utilize an Internal CQL3 map where we replace the non scalar values with separate tables - we just stick the ID

Re: Pattern to store maps of maps...

2014-06-13 Thread Kevin Burton
I could see just saying screw it and storing a serialized json object that gets read back in automatically as a map. That wouldn't be too painful but just not super pretty in terms of representing the data in cassandra. On Fri, Jun 13, 2014 at 8:45 PM, Jack Krupansky j...@basetechnology.com
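A minimal sketch of the serialized-JSON option mentioned above, assuming the nested record is stored in a single blob column and turned back into a map on read. The helper names are hypothetical and no Cassandra driver is involved.

```python
import json


def to_blob(record):
    """Serialize a nested dict to UTF-8 JSON bytes, suitable for a blob column."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def from_blob(blob):
    """Read the stored bytes back into a nested dict."""
    return json.loads(blob.decode("utf-8"))


record = {"foo": {"bar": "hello"}}
blob = to_blob(record)
print(from_blob(blob) == record)  # True
```

The cost, as noted in the thread, is that the nested structure is opaque to Cassandra: the whole record is read and written as one value.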

Re: Pattern to store maps of maps...

2014-06-13 Thread Johan Edstrom
Well to throw fire on the debate, that was actually really simple in Thrift. On Jun 13, 2014, at 10:50 PM, Kevin Burton bur...@spinn3r.com wrote: I could see just saying screw it and storing a serialized json object that gets read back in automatically as a map. That wouldn't be too painful