Most stable version?

2016-04-11 Thread Jean Tremblay
Hi, Which version of Cassandra should be considered most stable in version 3? I see two main branches: the branch with the version 3.0.* and the tick-tock one 3.*.*. So basically my question is: which one is most stable, version 3.0.5 or version 3.3? I know odd versions in tick-tock are bug fix.

Re: Large primary keys

2016-04-11 Thread Jack Krupansky
Check out the text indexing support in the new SASI feature in Cassandra 3.4. You could write a custom tokenizer to extract entities and then be able to query for documents that contain those entities. That said, using a SHA digest key for the primary key has merit for direct access to the

Re: Large primary keys

2016-04-11 Thread James Carman
S3 maybe? On Mon, Apr 11, 2016 at 7:05 PM Robert Wille wrote: > I do realize it's kind of a weird use case, but it is legitimate. I have a > collection of documents that I need to index, and I want to perform entity > extraction on them and give the extracted entities special

Re: Large primary keys

2016-04-11 Thread Robert Wille
I do realize it's kind of a weird use case, but it is legitimate. I have a collection of documents that I need to index, and I want to perform entity extraction on them and give the extracted entities special treatment in my full-text index. Because entity extraction costs money, and each

Re: Unable to connect to CQLSH or Launch SparkContext

2016-04-11 Thread Bryan Cheng
Check your environment variables; it looks like JAVA_HOME is not properly set. On Mon, Apr 11, 2016 at 9:07 AM, Lokesh Ceeba - Vendor < lokesh.ce...@walmart.com> wrote: > Hi Team, > > Help required > > > > cassandra:/app/cassandra $ nodetool status > > > > Cassandra 2.0 and later

Re: Large primary keys

2016-04-11 Thread Jan Kesten
Hi Robert, why do you need the actual text as a key? It sounds a bit unnatural, at least to me. Keep in mind that you cannot do "like" queries on keys in Cassandra. For performance and for keeping things more readable I would prefer hashing your text and using the hash as the key. You should also take
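
A minimal sketch of the digest-as-key idea suggested in this thread, assuming SHA-256 and UTF-8 encoding (the thread only says "a digest"; the algorithm, the function name digest_key, and the sample text are illustrative assumptions, not anything the posters specified):

```python
import hashlib


def digest_key(text: str) -> str:
    """Return a fixed-length hex digest that can stand in for the full text as a key.

    SHA-256 is an assumption here; any stable cryptographic digest would
    serve the same purpose of keeping the key small and of uniform size.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical document body, tens of KB long like the larger texts described.
    document = "example document text " * 2000
    key = digest_key(document)
    # The key is always 64 hex characters, regardless of document size.
    print(len(document), len(key))
```

The full text would then be stored as a regular column next to the digest, so the key stays small while the original document remains retrievable.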

Re: Migrating to CQL and Non Compact Storage

2016-04-11 Thread Jim Ancona
On Mon, Apr 11, 2016 at 4:19 PM, Jack Krupansky wrote: > Some of this may depend on exactly how you are using so-called COMPACT > STORAGE. I mean, if your tables really are modeled as all but exactly one > column in the primary key, then okay, COMPACT STORAGE may be a

Re: Large primary keys

2016-04-11 Thread James Carman
Why does the text need to be the key? On Mon, Apr 11, 2016 at 6:04 PM Robert Wille wrote: > I have a need to be able to use the text of a document as the primary key > in a table. These texts are usually less than 1K, but can sometimes be 10’s > of K’s in size. Would it be

Re: Large primary keys

2016-04-11 Thread Bryan Cheng
While large primary keys (within reason) should work, IMO anytime you're doing equality testing you are really better off minimizing the size of the key. Huge primary keys will also have very negative impacts on your key cache. I would err on the side of the digest, but I've never had a need for

Large primary keys

2016-04-11 Thread Robert Wille
I have a need to be able to use the text of a document as the primary key in a table. These texts are usually less than 1K, but can sometimes be 10’s of K’s in size. Would it be better to use a digest of the text as the key? I have a background process that will occasionally need to do a full

Restricting secondary indexes

2016-04-11 Thread Thanigai Vellore
Hello, In a multi-DC setup (where one DC serves real-time traffic and the other DC serves up analytical loads), is it possible to set up and restrict secondary indexes only to the analytics DC? The intent is to not create the overhead of the secondary index on the DC where real-time traffic is

Re: DataStax OpsCenter with Apache Cassandra

2016-04-11 Thread James Carman
Since when did this become a DataStax support email list? If folks have questions about DataStax products, shouldn't they be contacting the company directly? On Sun, Apr 10, 2016 at 1:13 PM Jeff Jirsa wrote: > It is possible to use OpsCenter for open source /

Re: Migrating to CQL and Non Compact Storage

2016-04-11 Thread Anuj Wadehra
Thanks Jim. I think you understand the pain of migrating TBs of data to new tables. There is no command to change from compact to non compact storage and the fastest solution to migrate data using Spark is too slow for production systems. And the pain gets bigger when your performance dips

Re: Migrating to CQL and Non Compact Storage

2016-04-11 Thread Jim Ancona
Jack, the Datastax link he posted ( http://www.datastax.com/dev/blog/thrift-to-cql3) says that for column families with mixed dynamic and static columns: "The only solution to be able to access the column family fully is to remove the declared columns from the thrift schema altogether..." I think

unsubscribe

2016-04-11 Thread Gvb Subrahmanyam
Disclaimer: This message and the information contained herein is proprietary and confidential and subject to the Tech Mahindra policy statement, you may review the

Re: 1, 2, 3...

2016-04-11 Thread Emīls Šolmanis
You're not mistaken, just thought you were after partition keys and didn't read the question that carefully. Afaik, you're SOOL if you need to distinguish clustering keys as unique. Well, other than doing a full table scan of course, which I'm assuming is not too plausible. On Mon, 11 Apr 2016 at

Re: 1, 2, 3...

2016-04-11 Thread Jack Krupansky
Unless I'm mistaken, nodetool tablestats gives you the number of partitions (partition keys), not the number of primary keys. IOW, the term "keys" is ambiguous. That's why I phrased the original question as count of (CQL) rows, to distinguish from the pre-CQL3 concept of a partition being treated

Re: 1, 2, 3...

2016-04-11 Thread Emīls Šolmanis
Wouldn't the "number of keys" part of *nodetool cfstats* run on every node, summed and divided by replication factor give you a decent approximation? Or are you really after a completely precise number? On Mon, 11 Apr 2016 at 16:18 Jack Krupansky wrote: > Agreed, that
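
A rough sketch of the arithmetic proposed here, assuming the per-node "number of keys" estimates have already been collected by hand from nodetool cfstats on each node (the function name and the sample figures are hypothetical). As the reply in this thread points out, the result approximates the number of partitions, not the number of CQL rows:

```python
def estimate_partitions(per_node_key_counts, replication_factor):
    """Approximate the cluster-wide partition count.

    Each partition is stored on `replication_factor` nodes, so summing the
    per-node estimates counts every partition roughly that many times.
    """
    if replication_factor <= 0:
        raise ValueError("replication_factor must be positive")
    return sum(per_node_key_counts) / replication_factor


if __name__ == "__main__":
    # Hypothetical estimates read from three nodes, with replication factor 3.
    print(estimate_partitions([300, 310, 290], 3))
```

This assumes a single keyspace-wide replication factor and evenly replicated data; both the per-node figures and the result are estimates.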

unsubscribe

2016-04-11 Thread Scott Thompson
Scott Thompson This message and any attached documents are only for the use of the intended recipient(s), are confidential and may contain privileged information. Any unauthorized review, use, retransmission, or other disclosure is strictly prohibited.

Re: Migrating to CQL and Non Compact Storage

2016-04-11 Thread Jack Krupansky
Sorry, but your message is too confusing - you say "reading dynamic columns in CQL" and "make the table schema less", but neither has any relevance to CQL! 1. CQL tables always have schemas. 2. All columns in CQL are statically declared (even maps/collections are statically declared columns.)

Re: 1, 2, 3...

2016-04-11 Thread Jack Krupansky
Agreed, that anything requiring a full table scan, short of batch analytics, is an antipattern, although the goal is not to do a full scan per se, but just get the row count. It still surprises people that Cassandra cannot quickly get COUNT(*). The easy answer: Use DSE Search and do a Solr query

RE: 1, 2, 3...

2016-04-11 Thread SEAN_R_DURITY
Cassandra is not good for table scan type queries (which count(*) typically is). While there are some attempts to do that (as noted below), this is a path I avoid. Sean Durity From: Max C [mailto:mc_cassan...@core43.com] Sent: Saturday, April 09, 2016 6:19 PM To: user@cassandra.apache.org

Re: Migrating to CQL and Non Compact Storage

2016-04-11 Thread Anuj Wadehra
Any comments or suggestions on this one? Thanks, Anuj. On Sun, 10 Apr, 2016 at 11:39 PM, Anuj Wadehra wrote: Hi We are on 2.0.14 and Thrift. We are planning to migrate to CQL soon but facing some challenges. We have a cf with a mix of

[RELEASE] Apache Cassandra 3.0.5 released

2016-04-11 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra version 3.0.5. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. http://cassandra.apache.org/ Downloads of source

Re: Latency overhead on Cassandra cluster deployed on multiple AZs (AWS)

2016-04-11 Thread Chris Lohfink
Where do you get the ~1ms latency between AZs? Comparing a short term average to a 99th percentile isn't very fair. "Over the last month, the median is 2.09 ms, 90th percentile is 20ms, 99th percentile is 47ms." - per

unsubscribe

2016-04-11 Thread Vitaly Sourikov
unsubscribe

RE: all the nost are not reacheable when running massive deletes

2016-04-11 Thread Paco Trujillo
Thanks, Alain, for all your answers: - In a few days I am going to set up a maintenance window so I can try running repairs again and see what happens. I will definitely run 'iostat -mx 5 100' at that time and also use the command you pointed out to see why it is consuming so much power. -

Re: Data modelling, including cleanup

2016-04-11 Thread Bo Finnerup Madsen
Hi Hannu, Thank you for the pointer. We ended up using materialized views in Cassandra 3.0.3. Seems to do the trick :) On Thu, 17 Mar 2016 at 11:16, Hannu Kröger wrote: > Hi, > > That’s how I have done it on many occasions. Nowadays there is the > possibility to use Cassandra