RE: high context switches

2014-11-24 Thread Jan Karlsson
We use CQL with 1 session per client and default connection settings. I do not think that we are using too many client threads. The number of native transport threads is set to the default (max 128). From: Robert Coli [mailto:rc...@eventbrite.com] Sent: 21 November 2014 19:30 To: user@cassandra.apa

Re: [jira] Akhtar Hussain shared a search result with you

2014-11-24 Thread Akhtar Hussain
This error occurred when we took down one node in the remote DC. Our main concern is the org.apache.cassandra.thrift.TimedOutException exception in our application logs. Why did the read fail when we used LOCAL_QUORUM? The failure of a node in another DC should not affect our DC if we are using LOCAL_QUORUM. S

Re: Compaction Strategy guidance

2014-11-24 Thread Nikolai Grigoriev
Jean-Armel, I have only two large tables, the rest is super-small. In the test cluster of 15 nodes the largest table has about 110M rows. Its total size is about 1,26Gb per node (total disk space used per node for that CF). It's got about 5K sstables per node - the sstable size is 256Mb. cfstats o

Re: Compaction Strategy guidance

2014-11-24 Thread Andrei Ivanov
Nikolai, Are you sure about 1.26Gb? Like it doesn't look right - 5195 tables with 256Mb table size... Andrei On Mon, Nov 24, 2014 at 5:09 PM, Nikolai Grigoriev wrote: > Jean-Armel, > > I have only two large tables, the rest is super-small. In the test cluster > of 15 nodes the largest table has

Re: Compaction Strategy guidance

2014-11-24 Thread Nikolai Grigoriev
Andrei, Oh, Monday mornings...Tb :) On Mon, Nov 24, 2014 at 9:12 AM, Andrei Ivanov wrote: > Nikolai, > > Are you sure about 1.26Gb? Like it doesn't look right - 5195 tables > with 256Mb table size... > > Andrei > > On Mon, Nov 24, 2014 at 5:09 PM, Nikolai Grigoriev > wrote: > > Jean-Armel, > >
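
As a rough check of the corrected figure, the numbers quoted in the thread (5195 sstables at 256 MB each) can be multiplied out. This is only back-of-the-envelope arithmetic, assuming binary units and that every sstable is at the full target size, which is rarely exactly true on disk.

```python
# Back-of-the-envelope check; assumes binary units and full-size sstables.
sstable_count = 5195      # sstable count quoted in the thread
sstable_size_mb = 256     # sstable size quoted in the thread

total_mb = sstable_count * sstable_size_mb
total_tb = total_mb / 1024 / 1024

print(f"{total_mb} MB is about {total_tb:.2f} TB per node")
```

The result is on the order of 1.27 TB per node, which is consistent with the "Tb" correction rather than the original "Gb" figure.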

Re: Compaction Strategy guidance

2014-11-24 Thread Andrei Ivanov
Nikolai, This is more or less what I'm seeing on my cluster then. Trying to switch to bigger sstables right now (1Gb) On Mon, Nov 24, 2014 at 5:18 PM, Nikolai Grigoriev wrote: > Andrei, > > Oh, Monday mornings...Tb :) > > On Mon, Nov 24, 2014 at 9:12 AM, Andrei Ivanov wrote: >> >> Nikolai, >> >

Re: Compaction Strategy guidance

2014-11-24 Thread Nikolai Grigoriev
I was thinking about that option and I would be curious to find out how this change helps you. I suspected that increasing sstable size won't help too much because the compaction throughput (per task/thread) is still the same. So, it will simply take 4x longer to finish a compaction task. It is
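
The "4x longer" estimate follows from holding per-task throughput constant while going from 256 MB to 1 GB sstables. A minimal sketch of that reasoning, where the throughput value is a hypothetical placeholder and not a figure from the thread:

```python
def compaction_task_seconds(sstable_size_mb: float, throughput_mb_per_s: float) -> float:
    """Time to compact one sstable's worth of data at a fixed per-task throughput."""
    return sstable_size_mb / throughput_mb_per_s

# Hypothetical per-task throughput; the real value depends on the cluster.
throughput_mb_per_s = 16.0

small = compaction_task_seconds(256, throughput_mb_per_s)    # current sstable size
large = compaction_task_seconds(1024, throughput_mb_per_s)   # proposed 1 GB size
ratio = large / small

print(f"256 MB: {small:.0f} s, 1 GB: {large:.0f} s, ratio: {ratio:.0f}x")
```

The ratio does not depend on the throughput chosen, so under this simple model larger sstables mean fewer but proportionally longer tasks.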

Re: Compaction Strategy guidance

2014-11-24 Thread Andrei Ivanov
OK, let's see - my cluster is recompacting now ;-) I will let you know if this helps. On Mon, Nov 24, 2014 at 5:48 PM, Nikolai Grigoriev wrote: > I was thinking about that option and I would be curious to find out how > this change helps you. I suspected that increasing sstable size won't help

Re: Getting the counters with the highest values

2014-11-24 Thread Eric Stevens
You're right that there's no way to use the counter data type to materialize a view ordered by the counter. Computing this post hoc is the way to go if your needs allow for it (if not, something like Summingbird or vanilla Storm may be necessary). I might suggest that you make your primary key fo

Re: Getting the counters with the highest values

2014-11-24 Thread Robert Wille
We do get a large number of documents getting counts each day, which is why I’m thinking the running totals table should be ((doc_id), day) rather than ((day), doc_id). We have too many documents per day to materialize in memory, so querying per day and aggregating the results isn’t really possible. I
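
One way to compute the highest counters post hoc without materializing every row is to stream the rows through a bounded heap. The sketch below is an illustration only: `top_n_counters` and `sample_rows` are hypothetical names, and `sample_rows` stands in for paging through a counter table with whatever driver is in use.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def top_n_counters(rows: Iterable[Tuple[str, int]], n: int) -> List[Tuple[str, int]]:
    """Return the n (doc_id, count) pairs with the highest counts.

    heapq.nlargest consumes the iterable once and keeps only n items
    in memory, so the full result set never has to be loaded.
    """
    return heapq.nlargest(n, rows, key=lambda row: row[1])


def sample_rows() -> Iterator[Tuple[str, int]]:
    # Hypothetical stand-in for rows streamed from a counter table.
    yield from [("doc-a", 12), ("doc-b", 97), ("doc-c", 5), ("doc-d", 41)]


top = top_n_counters(sample_rows(), 2)
print(top)
```

Memory use grows with `n` rather than with the number of documents, which is the property that matters when a day's documents do not fit in memory.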

Re: Repair completes successfully but data is still inconsistent

2014-11-24 Thread André Cruz
On 21 Nov 2014, at 19:01, Robert Coli wrote: > > 2- Why won’t repair propagate this column value to the other nodes? Repairs > have run everyday and the value is still missing on the other nodes. > > No idea. Are you sure it's not expired via TTL or masked in some other way? > When you ask tha

Re: Repair completes successfully but data is still inconsistent

2014-11-24 Thread Robert Coli
On Mon, Nov 24, 2014 at 10:39 AM, André Cruz wrote: > This data does not use TTLs. What other reason could there be for a mask? > If I connect using cassandra-cli to that specific node, which becomes the > coordinator, is it guaranteed to not ask another node when CL is ONE and it > contains that

Re: Compaction Strategy guidance

2014-11-24 Thread Robert Coli
On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev wrote: > One of the obvious recommendations I have received was to run more than > one instance of C* per host. Makes sense - it will reduce the amount of > data per node and will make better use of the resources. > This is usually a Bad Idea to

What causes NoHostAvailableException, WriteTimeoutException, and UnavailableException?

2014-11-24 Thread Kevin Burton
I’m trying to track down some exceptions in our production cluster. I bumped up our write load and now I’m getting a non-trivial number of these exceptions. Somewhere on the order of 100 per hour. All machines have a somewhat high CPU load because they’re doing other tasks. I’m worried that per

Re: What causes NoHostAvailableException, WriteTimeoutException, and UnavailableException?

2014-11-24 Thread Bulat Shakirzyanov
Check out the Ruby Driver documentation on these topics: Error Handling and Retry Policies. While the documentation is for the Ruby Driver, the concepts were borrowed from and

Re: What causes NoHostAvailableException, WriteTimeoutException, and UnavailableException?

2014-11-24 Thread Shane Hansen
Not sure if this is what you're looking for, but the API docs can be useful (I won't copy/paste the docs themselves): http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/exceptions/NoHostAvailableException.html http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/exceptions/

Re: What causes NoHostAvailableException, WriteTimeoutException, and UnavailableException?

2014-11-24 Thread Robert Coli
On Mon, Nov 24, 2014 at 12:57 PM, Kevin Burton wrote: > I’m trying to track down some exceptions in our production cluster. I > bumped up our write load and now I’m getting a non-trivial number of these > exceptions. Somewhere on the order of 100 per hour. > > All machines have a somewhat high

Re: What causes NoHostAvailableException, WriteTimeoutException, and UnavailableException?

2014-11-24 Thread Parag Shah
In our case, the timeouts were happening because internode authentication was turned on and by default the user column family in the system_auth keyspace is replicated only on 1 node. We also had to tune the permissions_validity_in_ms from the default of 2000 ms to a larger value. The issue was

Cassandra version 1.0.10 Data Loss upon restart

2014-11-24 Thread Ankit Patel
We are experiencing data loss with Cassandra 1.0.10 after we restarted the node without flushing. We see in the Cassandra logs that the commitlogs were read back without any problems. Until the restart the data was correct. However, after the node restarted we retrieved an older version of the data (row

large range read in Cassandra

2014-11-24 Thread Dan Kinder
Hi, We have a web crawler project currently based on Cassandra ( https://github.com/iParadigms/walker, written in Go and using the gocql driver), with the following relevant usage pattern: - Big range reads over a CF to grab potentially millions of rows and dispatch new links to crawl - Fast inse

Re: large range read in Cassandra

2014-11-24 Thread Robert Coli
On Mon, Nov 24, 2014 at 4:26 PM, Dan Kinder wrote: > We have a web crawler project currently based on Cassandra ( > https://github.com/iParadigms/walker, written in Go and using the gocql > driver), with the following relevant usage pattern: > > - Big range reads over a CF to grab potentially mil

Re: Cassandra version 1.0.10 Data Loss upon restart

2014-11-24 Thread Robert Coli
On Mon, Nov 24, 2014 at 5:51 PM, Robert Coli wrote: > What is your replication factor? What CL are you using to read? > Ah, I see from OP that RF is 1. As a general statement, RF=1 is an edge case which very, very few people have ever operated in production. It is relatively likely that there a

Re: Cassandra version 1.0.10 Data Loss upon restart

2014-11-24 Thread Robert Coli
On Mon, Nov 24, 2014 at 3:19 PM, Ankit Patel wrote: > We are experiencing data loss with Cassandra 1.0.10 after we restarted > the node without flushing. We see in the Cassandra logs that the commitlogs were > read back without any problems. Until the restart the data was correct. > However, after

Re: What causes NoHostAvailableException, WriteTimeoutException, and UnavailableException?

2014-11-24 Thread Robert Coli
On Mon, Nov 24, 2014 at 3:01 PM, Parag Shah wrote: > In our case, the timeouts were happening because internode > authentication was turned on and by default the user column family in the > system_auth keyspace is replicated only on 1 node. We also had to tune the > permissions_validity_in_ms fr

Re: Problem with performance, memory consumption, and RLIMIT_MEMLOCK

2014-11-24 Thread Dmitri Dmitrienko
Hi Jens, I solved the problem by switching to PAGING mode. In this case it works smoothly and does not require so many locks. This was not clear in the beginning, and the only sample that demonstrates the corresponding API (functions like cass_result_has_more_pages()) is "paging". Hope this helps somebody. On