Setting bloom_filter_fp_chance < 0.01

2016-05-17 Thread Adarsh Kumar
Hi, What is the impact of setting bloom_filter_fp_chance < 0.01? During performance tuning I was trying to tune bloom_filter_fp_chance and have the following questions: 1) Why is bloom_filter_fp_chance = 0 not allowed? (https://issues.apache.org/jira/browse/CASSANDRA-5013) 2) What is the maximum/
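
For context, bloom_filter_fp_chance is a per-table property; a minimal sketch of changing it (the ks.my_table name is hypothetical), followed by an optional sstable rewrite so existing files pick up the new value:

  cqlsh -e "ALTER TABLE ks.my_table WITH bloom_filter_fp_chance = 0.01;"
  # Bloom filters are only rebuilt when sstables are rewritten; to force a rewrite of existing ones:
  nodetool upgradesstables -a ks my_table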

Re: Accessing Cassandra data from Spark Shell

2016-05-17 Thread Ben Slater
It definitely should be possible for 1.5.2 (I have used it with spark-shell and cassandra connector with 1.4.x). The main trick is in lining up all the versions and building an appropriate connector jar. Cheers Ben On Wed, 18 May 2016 at 15:40 Cassa L wrote: > Hi, > I followed instructions to r
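
To illustrate the version-alignment point, a hedged sketch of launching spark-shell with a connector built for the same Spark/Scala line (the artifact coordinates and host below are assumptions, not a verified combination):

  # The connector release line should match the Spark line (e.g. connector 1.5.x for Spark 1.5.x)
  spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.2 \
              --conf spark.cassandra.connection.host=10.0.0.1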

About the data structure of partition index

2016-05-17 Thread Hiroyuki Yamada
Hi, I am wondering how many primary keys are stored in one partition index. As the following documents say, I understand that each partition

Re: Accessing Cassandra data from Spark Shell

2016-05-17 Thread Cassa L
Hi, I followed instructions to run SparkShell with Spark-1.6. It works fine. However, I need to use the spark-1.5.2 version. With it, it does not work. I keep getting NoSuchMethod errors. Is there any issue running Spark Shell for Cassandra using an older version of Spark? Regards, LCassa On Tue, May 1

Re: Bloom filter memory usage disparity

2016-05-17 Thread Jeff Jirsa
Even with the same data, bloom filters are built per sstable. If your compaction behaves differently on 2 nodes than on the third, your bloom filter RAM usage may be different. From: Kai Wang Reply-To: "user@cassandra.apache.org" Date: Tuesday, May 17, 2016 at 8:02 PM To: "user@cassandra.apache.
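
One way to compare the nodes is a rough check like the following (cfstats on 2.x, renamed tablestats in newer versions; ks.my_table is a placeholder):

  # Run on each node and compare sstable counts and bloom filter sizes
  nodetool cfstats ks.my_table | grep -Ei 'sstable count|bloom filter'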

Re: Bloom filter memory usage disparity

2016-05-17 Thread Kai Wang
Alain, Thanks for replying. I am using C* 2.2.4. Yes the table is RF=3. I changed bloom_filter_fp_chance from 0.01 to 0.1 a couple of months ago. On Tue, May 17, 2016 at 11:05 AM, Alain RODRIGUEZ wrote: > Hi, we would need more information here (if you did not solve it yet). > > What is your

Re: restore cassandra snapshots on a smaller cluster

2016-05-17 Thread Ben Slater
It should definitely work if you use sstableloader to load all the files. I imagine it is also possible to do a straight restore (copying sstables) if you assign the tokens from multiple source nodes to one target node using the initial_token parameter in cassandra.yaml. Cheers Ben On Wed, 18 May 2016 a
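
A minimal sketch of the sstableloader route (host and paths are hypothetical; the directory handed to the tool must end in keyspace/table):

  # Stream a snapshot's sstables into the new, smaller cluster
  sstableloader -d 10.0.0.1 /restore/my_keyspace/my_table/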

Re: restore cassandra snapshots on a smaller cluster

2016-05-17 Thread Jeff Jirsa
http://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated From: Luigi Tagliamonte Reply-To: "user@cassandra.apache.org" Date: Tuesday, May 17, 2016 at 5:35 PM To: "user@cassandra.apache.org" Subject: restore cassandra snapshots on a smaller cluster Hi everyone, i'm wondering

restore cassandra snapshots on a smaller cluster

2016-05-17 Thread Luigi Tagliamonte
Hi everyone, I'm wondering if it is possible to restore all the snapshots of a cluster (10 nodes) onto a smaller cluster (3 nodes)? If yes, how do I do it? -- Luigi --- “The only way to get smarter is by playing a smarter opponent.”

Re: Applying TTL Change quickly

2016-05-17 Thread Jeff Jirsa
Fastest way? Stop Cassandra, use sstablemetadata to find (and remove) any files whose maxTimestamp is more than 2 days old, then start Cassandra. Works better with some compaction strategies than others (you'll probably find a few droppable sstables with either DTCS / STCS, but not perfect). Cleanest way? One by one (starting with
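
A rough sketch of the inspection step (the data path and table name are hypothetical; the reported timestamps are typically microseconds since the epoch, so convert before comparing against the 2-day cutoff):

  # Print the maximum write timestamp recorded in each sstable
  for f in /var/lib/cassandra/data/ks/my_table-*/*-Data.db; do
    echo "$f: $(sstablemetadata "$f" | grep 'Maximum timestamp')"
  done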

Applying TTL Change quickly

2016-05-17 Thread Anubhav Kale
Hello, We use STCS and DTCS on our tables and recently made a TTL change (reduced from 8 days to 2) on a table with large amounts of data. What is the best way to quickly purge old data? I am playing with tombstone_compaction_interval at the moment, but would like some suggestions on what else
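
For reference, tombstone_compaction_interval is a compaction sub-option, so it is set together with the strategy; a hedged sketch for an STCS table (names and values are illustrative only):

  cqlsh -e "ALTER TABLE ks.my_table WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_compaction_interval': '86400',
    'unchecked_tombstone_compaction': 'true'};"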

Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Drew Kutcharian
OK to make things even more confusing, the “Release” files in the Apache Repo say “Origin: Unofficial Cassandra Packages”!! i.e. http://dl.bintray.com/apache/cassandra/dists/35x/:Release > On May 17, 2016, at 12:11 PM, Drew Kutcharian wrote: > > BTW, the language on this page should probably

Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Drew Kutcharian
BTW, the language on this page should probably change since it currently sounds like the official repo is the DataStax one and Apache is only an “alternative” http://wiki.apache.org/cassandra/DebianPackaging - Drew > On May 17, 2016, at 11:35 AM, Drew Kutcharian wrote: > > Thanks Eric. > >

Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Drew Kutcharian
Thanks Eric. > On May 17, 2016, at 7:50 AM, Eric Evans wrote: > > On Mon, May 16, 2016 at 5:19 PM, Drew Kutcharian wrote: >> >> What’s the difference between the two “Community” repositories Apache >> (http://www.apache.org/dist/cassandra/debian) and DataStax >> (http://debian.datastax.com/

Re: [C*3.0.3]lucene indexes not deleted and nodetool repair makes DC unavailable

2016-05-17 Thread Andres de la Peña
Hi Siddarth, Lucene doesn't immediately remove deleted documents from disk. Instead, it just marks them as deleted, and they are effectively removed during segments merge. This is quite similar to how C* manages deletions with tombstones and compactions. Regards, 2016-05-17 17:30 GMT+01:00 Siddh

Nodetool clearsnapshot doesn't support Column Families

2016-05-17 Thread Anubhav Kale
Hello, I noticed that clearsnapshot doesn't support removing snapshots on a per-CF basis, the way snapshot lets you take them per CF. http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsClearSnapShot.html I couldn't find a JIRA to address this. Is this intentional? If so, I am curious to
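
To illustrate the asymmetry being reported (tag, keyspace and table names are placeholders, and option names vary a little between versions):

  # Snapshots can be scoped to a single column family...
  nodetool snapshot -t mytag -cf my_table -- my_keyspace
  # ...but clearsnapshot only accepts a tag and/or keyspaces
  nodetool clearsnapshot -t mytag -- my_keyspace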

Re: [C*3.0.3]lucene indexes not deleted and nodetool repair makes DC unavailable

2016-05-17 Thread Siddharth Verma
Hi Eduardo, Thanks for your reply. If it is fixed in 3.0.5.1, we will shift to it. One more question: if instead of truncating the table we remove some rows, are the Lucene documents and index entries for those rows deleted?

Re: MigrationManager.java:164 - Migration task failed to complete

2016-05-17 Thread Alain RODRIGUEZ
There is not much context here, so I will give a standard answer too. If you have a doubt about the data owned by a node, running repair takes some resources but should never break anything; it is an operation you can run as often as you want. So I would use it, just in case.

Re: Bloom filter memory usage disparity

2016-05-17 Thread Alain RODRIGUEZ
Hi, we would need more information here (if you did not solve it yet). What is your Cassandra version? Does this 3 node cluster use a Replication Factor of 3? Did you change the bloom_filter_fp_chance recently? That table has about 16M keys and 140GB of data. > Is that the total value or per nod

Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Eric Evans
On Mon, May 16, 2016 at 5:19 PM, Drew Kutcharian wrote: > > What’s the difference between the two “Community” repositories Apache > (http://www.apache.org/dist/cassandra/debian) and DataStax > (http://debian.datastax.com/community/)? Good question. All I can tell you is that the Apache reposit
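
For anyone following along, a sketch of pointing apt at the Apache repository mentioned above (the 35x series is just an example; pick the release line you want, and the repository's signing keys still need to be imported):

  echo "deb http://www.apache.org/dist/cassandra/debian 35x main" \
    | sudo tee /etc/apt/sources.list.d/cassandra.list
  sudo apt-get update && sudo apt-get install cassandra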

Re: SS Table File Names not containing GUIDs

2016-05-17 Thread Alain RODRIGUEZ
Hi, > I am wondering if there is any reason as to why the SSTable format doesn't have a GUID. I don't know for sure, but what I can say is that GUIDs are often used to solve the increment problem in distributed systems. SSTables are stored on one node, so incrementing works. So I would say this worked

Restoring Incremental Backups without using sstableloader

2016-05-17 Thread Ravi Teja A V
Hi everyone I am currently working with Cassandra 3.5. I would like to know if it is possible to restore backups without using sstableloader. I have been referring to the following pages in the datastax documentation: https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsBackupSnapsho
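
One commonly used alternative is copying the files into place and asking the node to load them; a minimal sketch assuming matching token ranges and a hypothetical data path:

  # Copy the backed-up sstables into the live table directory, then load them without a restart
  cp /backups/ks/my_table/* /var/lib/cassandra/data/ks/my_table-*/
  nodetool refresh ks my_table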

Re: Why simple replication strategy for system_auth ?

2016-05-17 Thread Jérôme Mainaud
Thank you for your answer. What I still don't understand is why auth data is not managed in the same way as schema metadata. Both must be accessible to the node to do its job. Both are changed very rarely. In a way, users are a kind of database object. I understand the choice for trace and rep
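
For completeness, the usual workaround is to alter the keyspace yourself; a hedged sketch (datacenter names and replication factors are placeholders, and a repair of system_auth should follow):

  cqlsh -e "ALTER KEYSPACE system_auth WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
  nodetool repair system_auth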

Re: Repair schedules for new clusters

2016-05-17 Thread Ben Slater
We’ve found with incremental repairs that more frequent repairs are generally better. Our current standard for incremental repairs is once per day. I imagine that the exact optimum frequency is dependent on the ratio of reads to writes in your cluster. Turning on incremental repairs from the get-go
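
A minimal sketch of a once-per-day schedule as a crontab entry (-pr limits each node to its primary ranges; whether it combines with incremental repair, and whether incremental is the default, depends on the version):

  # Run at 02:00 local time on each node, staggered across nodes in practice
  0 2 * * * nodetool repair -pr >> /var/log/cassandra/repair-cron.log 2>&1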