Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jonathan Haddad
Keep in mind secondary indexes in cassandra are not there to improve performance, or even really be used in a serious user facing manner. Build and maintain your own view of the data, it'll be much faster. On Thu, Sep 18, 2014 at 6:33 PM, Jay Patel wrote: > Hi there, > > We are seeing extreme

RE: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Parag Patel
Agreed. We only use secondary indexes for column families that are relatively small (~5k rows). For anything larger, we store the data into a wide row (but this depends on your data model) -Original Message- From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On Behalf

what's cool about cassandra 2.1.0?

2014-09-19 Thread Tim Dunphy
Hey all, I tried googling around to get an idea about what was new (and potentially cool) in the newest release of cassandra - 2.1.0. But all that I've been able to find so far is this kind of general statement about the new features. https://www.mail-archive.com/user@cassandra.apache.org/msg38

Re: what's cool about cassandra 2.1.0?

2014-09-19 Thread DuyHai Doan
Hello Tim From this blog (http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1) you should find the pointers to other big topics of 2.1 On Fri, Sep 19, 2014 at 3:33 PM, Tim Dunphy wrote: > Hey all, > > I tried googling around to get an idea about what was new (and > potentially cool) i

Re: what's cool about cassandra 2.1.0?

2014-09-19 Thread Tim Dunphy
Thanks I'll check that out! Really appreciate that! On Fri, Sep 19, 2014 at 10:07 AM, DuyHai Doan wrote: > Hello Tim > > From this blog ( > http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1) you should > find the pointers to other big topics of 2.1 > > On Fri, Sep 19, 2014 at 3:33 PM,

Wide Rows - Data Model Design

2014-09-19 Thread Check Peck
I am trying to use wide rows concept in my data modelling design for Cassandra. We are using Cassandra 2.0.6. CREATE TABLE test_data ( test_id int, client_name text, record_data text, creation_date timestamp, last_modified_date timestamp, PRIMARY KEY (test_i

Re: Wide Rows - Data Model Design

2014-09-19 Thread Jonathan Lacefield
Hello, Yes, this is a wide row table design. The first col is your Partition Key. The remaining 2 cols are clustering cols. You will receive ordered result sets based on client_name, record_data when running that query. Jonathan [image: datastax_logo.png] Jonathan Lacefield Solution Archi
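
A minimal sketch of the layout described in the reply above, assuming test_id is the partition key and (client_name, record_data) are the clustering columns. The column names come from the thread; the WideRowTable class and its methods are hypothetical and only model the ordering behaviour in memory, they are not a Cassandra API.

```python
from collections import defaultdict


class WideRowTable:
    """Toy in-memory model: one dict per partition, keyed by clustering columns."""

    def __init__(self):
        # partition key -> {(client_name, record_data): other columns}
        self._partitions = defaultdict(dict)

    def upsert(self, test_id, client_name, record_data, **columns):
        # Writing the same full primary key again overwrites the earlier row.
        self._partitions[test_id][(client_name, record_data)] = columns

    def select(self, test_id):
        # Rows inside one partition come back sorted by the clustering columns.
        rows = self._partitions.get(test_id, {})
        return [(key, rows[key]) for key in sorted(rows)]


table = WideRowTable()
table.upsert(1, "zeta", "r1")
table.upsert(1, "alpha", "r2")
table.upsert(1, "alpha", "r1")
table.upsert(2, "beta", "r9")

ordered_keys = [key for key, _ in table.select(1)]
print(ordered_keys)
```

How wide a partition gets depends on how many (client_name, record_data) combinations exist per test_id, which is the cardinality question raised later in the thread.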

Re: Wide Rows - Data Model Design

2014-09-19 Thread DuyHai Doan
"Does my above table falls under the category of wide rows in Cassandra or not?" --> It depends on the cardinality. For each distinct test_id, how many combinations of client_name/record_data do you have? By the way, why do you put the record_data as part of the primary key? In your table partition

Re: Wide Rows - Data Model Design

2014-09-19 Thread Check Peck
@DuyHai - I have put that because of this condition - In this table, we can have multiple record_data for same client_name. It can be multiple combinations of client_name and record_data for each distinct test_id. On Fri, Sep 19, 2014 at 8:48 AM, DuyHai Doan wrote: > "Does my above table fall

Re: Wide Rows - Data Model Design

2014-09-19 Thread DuyHai Doan
Ahh yes, sorry, I read too fast, missed it. On Fri, Sep 19, 2014 at 5:54 PM, Check Peck wrote: > @DuyHai - I have put that because of this condition - > > In this table, we can have multiple record_data for same client_name. > > It can be multiple combinations of client_name and record_data for

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Tyler Hobbs
Jon's advice is definitely still true, but in 2.1 there is https://issues.apache.org/jira/browse/CASSANDRA-1337, which parallelizes the fetching of ranges. On Fri, Sep 19, 2014 at 6:57 AM, Parag Patel wrote: > Agreed. We only use secondary indexes for column families that are > relatively small

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
Thanks folks for all your inputs! Yes, I totally agree that we need to have a custom column family for indexing. However, we're trying to upgrade our existing cluster from non-vnode to vnode, and queries using secondary indexes, which used to perform well with non-vnode, now break badly. Btw, there is no da

Re: Blocking while a node finishes joining the cluster after restart.

2014-09-19 Thread Kevin Burton
> Hi Kevin, if you are using the latest version of opscenter, then even the community (= free) edition can do a rolling restart of your cluster. It's pretty convenient. We’re using ansible so I’d like something that integrates with that… On Tue, Sep 16, 2014 at 11:09 AM, Duncan Sands wrote: >

Re: Blocking while a node finishes joining the cluster after restart.

2014-09-19 Thread Kevin Burton
This is great feedback… I think it could actually be even easier than this… You could have an ansible (or whatever cluster management system you’re using) role for just seeds. Then you would serially restart all seeds one at a time. You would need to run ‘nodetool status’ and make sure the node

Upgrade to DSE 4.5

2014-09-19 Thread cass savy
We run on DSE 3.1.3 and only use Cassandra in the prod cluster. What is the release that I need to be on right away? If I need to upgrade to DSE 4.5 (C* 2.0.7), I need to take 3 paths to get there. I see a lot of improvements for solr/Hadoop features in DSE 4.0 and above. Can I upgrade to DS

Re: Blocking while a node finishes joining the cluster after restart.

2014-09-19 Thread Jonathan Haddad
Depending on how you query (one or quorum) you might be able to do 1 rack at a time (or az or whatever you've got) assuming your snitch is set up right > On Sep 19, 2014, at 11:30 AM, Kevin Burton wrote: > > This is great feedback… > > I think it could actually be even easier than this… > >

Re: Blocking while a node finishes joining the cluster after restart.

2014-09-19 Thread Tyler Hobbs
On Fri, Sep 19, 2014 at 1:26 PM, Kevin Burton wrote: > We’re using ansible so I’d like something that integrates with that… I'm not familiar with Ansible, so I don't know if it's useful, but OpsCenter has a REST api you can use to do anything you can do from the UI. For example, a rolling rest

can't launch cassandra 2.1.0

2014-09-19 Thread Tim Dunphy
Hey all, I'm attempting to upgrade from cassandra 2.0.10 to version 2.1.0. However when launching the new version I'm running into the following: [root@beta-new:/etc/alternatives/cassandrahome] #./bin/cassandra -f SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:

Re: can't launch cassandra 2.1.0

2014-09-19 Thread DuyHai Doan
java.lang.NoSuchMethodError --> Seems like there is inconsistency with your jar dependencies On Fri, Sep 19, 2014 at 11:05 PM, Tim Dunphy wrote: > Hey all, > > I'm attempting to upgrade from cassandra 2.0.10 to version 2.1.0. > > However when launching the new version I'm running into the foll

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Tyler Hobbs
On Fri, Sep 19, 2014 at 12:41 PM, Jay Patel wrote: > > Btw, there is no data in the table. Table is empty. Query is fired on the > empty table. > This is actually the worst case for secondary index lookups. > > From the tracing output, I don't understand why it's doing multiple scans > on one n

Is it wise to increase native_transport_max_threads if we have lots of CQL clients?

2014-09-19 Thread Donald Smith
If we have hundreds of CQL clients (for C* 2.0.9), should we increase native_transport_max_threads in cassandra.yaml from the default (128) to the number of clients? If we don't do that, I presume requests will queue up, resulting in higher latency. What's a reasonable max value for increas

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread DuyHai Doan
"It will merge requests to neighboring ranges when the same node is a replica for both of them. Without vnodes, this usually results in all ranges for a node being merged. With vnodes, merging still happens, but not all ranges can be merged." --> But does it imply that with vnodes, there is

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Tyler Hobbs
On Fri, Sep 19, 2014 at 4:19 PM, DuyHai Doan wrote: > > But does it imply that with vnodes, there is actually "extra work" to > do for scanning indices? > Yes. > If yes, is this "extra load" rather I/O bound or CPU bound? > It doesn't necessarily change what the query is "bound" by, exc

Help with approach to remove RDBMS schema from code to move to C*?

2014-09-19 Thread Les Hartzman
My company is using an RDBMS for storing time-series data. This application was developed before Cassandra and NoSQL. I'd like to move to C*, but ... The application supports data coming from multiple models of devices. Because there is enough variability in the data, the main table to hold the de

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Robert Coli
On Fri, Sep 19, 2014 at 2:19 PM, DuyHai Doan wrote: > But does it imply that with vnodes, there is actually "extra work" to > do for scanning indices? > Vnodes are just nodes, so they have all the problems-associated-with-many-nodes one would get with 256x as many nodes. =Rob
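
A rough sketch of why more token ranges mean more sub-requests for an index scan, following the merging rule quoted earlier in this thread (neighboring ranges are merged when the same node is a replica for both). This is a deliberate simplification and not Cassandra's actual code: replica_sets and merge_ranges are hypothetical helpers, replica placement is modelled as "the next rf distinct owners clockwise", and the token layouts are made up.

```python
def replica_sets(token_owners, rf):
    """For each range in token order, return the set of nodes replicating it."""
    rf = min(rf, len(set(token_owners)))  # cannot have more replicas than nodes
    n = len(token_owners)
    result = []
    for i in range(n):
        replicas = []
        j = i
        while len(replicas) < rf:
            owner = token_owners[j % n]
            if owner not in replicas:
                replicas.append(owner)
            j += 1
        result.append(set(replicas))
    return result


def merge_ranges(sets):
    """Greedily merge consecutive ranges that still share at least one replica.

    Returns a list of (common_replicas, ranges_covered); each entry stands for
    one request the coordinator would have to send in this toy model.
    """
    groups = []
    for replicas in sets:
        if groups and groups[-1][0] & replicas:
            common, count = groups[-1]
            groups[-1] = (common & replicas, count + 1)
        else:
            groups.append((set(replicas), 1))
    return groups


# Four physical nodes, replication factor 2 (both values are assumptions).
single_token = ["A", "B", "C", "D"]  # one range per node
vnode_layout = ["A", "C", "B", "D", "B", "A", "D", "C", "A", "D", "C", "B"]

single_groups = merge_ranges(replica_sets(single_token, rf=2))
vnode_groups = merge_ranges(replica_sets(vnode_layout, rf=2))
print(len(single_groups), "requests without vnodes")
print(len(vnode_groups), "requests with 3 vnodes per node")
```

In this model the same four machines need more requests once their ranges are interleaved, which is the "extra work" discussed above; with the default of 256 tokens per node the number of ranges, and so the number of possible sub-requests, is far larger.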

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
Thanks Tyler for the details. I'm still trying to understand what you described. Just to simplify my question & what I don't understand: When the coordinator fires an indexed scan request to node 192.168.51.22, why doesn't it ask that node to check all of its (at least primary) ranges for the queried data

Upgrade steps to address CASSANDRA-4411

2014-09-19 Thread Randy Fradin
I have a question about the steps listed in this article for addressing CASSANDRA-4411 in an upgrade from a version <= 1.1.3 or to a version >= 1.1.5 when using leveled compaction: http://www.datastax.com/docs/1.1/install/upgrading#upgrade-step

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
Thanks Robert for your input, but that sounds a little crazy to me. The physical node is still the same, so why can't it just do one indexed scan for all the contiguous or non-contiguous token ranges (vnodes) held by that physical node? I doubt that it needs to respect token order for "some reason" & henc

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Tyler Hobbs
On Fri, Sep 19, 2014 at 4:53 PM, Jay Patel wrote: > > When the coordinator fires an indexed scan request to node 192.168.51.22, why > doesn't it ask that node to check all of its (at least primary) ranges for > the queried data, at once? Also, internally that node should be able to > just do one scan thro

Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).

2014-09-19 Thread Jay Patel
Thanks Tyler for the clarification. I've opened a ticket, CASSANDRA-7982. For now, I've assigned it to myself and put you as a reviewer. Please change the assignment as you prefer. Assume that we now batch the requests & send only one request to the replica:

Re: Help with approach to remove RDBMS schema from code to move to C*?

2014-09-19 Thread James Briggs
Most of the C* success stories are for greenfield applications. Migrating from one database to another database is a lot of work. C* offers no magical path. If you only have a few tables and minor RDBMS feature dependencies, it can be done. Make sure your users and QA people are cooperative fi

Re: Blocking while a node finishes joining the cluster after restart.

2014-09-19 Thread James Briggs
Kevin: "The serial approach would take a LONG time for large clusters. If you have sixty nodes, it could take an hour to do a rolling restart." 1) In Cassandra land, an hour is nothing. There's people doing repairs that practically never finish - as soon as one finishes after a week, they hav

Re: what's cool about cassandra 2.1.0?

2014-09-19 Thread James Briggs
I'll be blunt. The reason to use the latest 2.0 or soon 2.1 is because Apple has committed 20 patches that make Cassandra operationally useful. Apple is the QA lab for Cassandra. Their conference talk was very exciting. I hope a video of that gets posted in October. Thanks, James Briggs. -- Ca

Re: Help with approach to remove RDBMS schema from code to move to C*?

2014-09-19 Thread Jack Krupansky
Start by asking how you intend to query the data. That should drive the data model. Is there existing app client code or an app layer that is already using the current schema, or are you intending to rewrite that as well? FWIW, you could place the numeric columns in a numeric map collection, an