Cassandra/Spark failing to process large table

2018-03-02 Thread Faraz Mateen
Hi everyone, I am trying to use Spark to process a large Cassandra table (~402 million entries and 84 columns) but I am getting inconsistent results. Initially the requirement was to copy some columns from this table to another table. After copying the data, I noticed that some entries in the

Re: Whch version is the best version to run now?

2018-03-02 Thread Jeff Jirsa
I’d personally be willing to run 3.0.16. 3.11.2 or 3 whatever should also be similar, but I haven’t personally tested it at any meaningful scale. -- Jeff Jirsa > On Mar 2, 2018, at 2:37 PM, Kenneth Brotman wrote: > > Seems like a lot of people are running

Re: Secondary Index Cleanup

2018-03-02 Thread malte
We use 3.11.0 on Linux. What C* version do you use? Sounds like the secondary index is very out of sync with the parent cf. On Fri, Mar 2, 2018 at 6:23 AM, Malte Krüger wrote: hi, we have a CF which is about 2 GB in size, it has a secondary index on

Re: Cluster with 3 nodes - Slow performance

2018-03-02 Thread Javier Pareja
Thank you Jürgen, The default consistency in the library is already ONE. I tried setting it anyway but it made no difference. Hopefully it is a configuration issue, that would be very good news!! Do you have any past/present experience with large counter tables? F Javier Pareja On Fri, Mar 2,

Re: Cluster with 3 nodes - Slow performance

2018-03-02 Thread Javier Pareja
Hi Alain, Thank you for your reply. I have run the same test but writing into a non-counter table equivalent to the counter one. The rate in this case is around 160k writes/second. I am not sure if I should be expecting much more in a 3-node cluster. In terms of I/O it is not much really and the

Re: Whch version is the best version to run now?

2018-03-02 Thread Joaquin Casares
Hello Kenneth, We've been recommending 3.11.2 for new projects since it comes with quite a few performance improvements and bug fixes not available in 3.0.16. Upgrading to newer versions is taken on a case-by-case basis, but if 2.x is pretty important to the customer, 2.2.12 is our go-to

Whch version is the best version to run now?

2018-03-02 Thread Kenneth Brotman
Seems like a lot of people are running old versions of Cassandra. What is the best, most reliable, stable version to use now? Kenneth Brotman

RE: On a 12-node Cluster, Starting C* on a Seed Node Increases ReadLatency from 150ms to 1.5 sec.

2018-03-02 Thread Fd Habash
> I understand you use Apache Cassandra 2.2.8. :)
- Yes. It was a typo.
> In Apache Cassandra 2.2.8, this triggers incremental repairs I believe,
- Yes, default as of 2.2, and using primary range, with repairs run on every node in the cluster.
> Did you replace the node in-place?
- Yes. We removed

Re: Secondary Index Cleanup

2018-03-02 Thread Dikang Gu
What C* version do you use? Sounds like the secondary index is very out of sync with the parent cf. On Fri, Mar 2, 2018 at 6:23 AM, Malte Krüger wrote: > hi, > > we have a CF which is about 2 GB in size, it has a secondary index on one > field (UUID). > >

Re: Cluster with 3 nodes - Slow performance

2018-03-02 Thread Alain RODRIGUEZ
Hi Javier,
> The only bottleneck in the writes as far as I understand it is the commit log.
Sadly this is somewhat wrong, especially in your case. CPU and network limits can be reached, and other issues can happen. Plus in your case, using counters, there are way more things involved. The 2 main

Re: Cluster with 3 nodes - Slow performance

2018-03-02 Thread Jürgen Albersdorfer
As far as I have seen, you have not configured outbound consistency, which defaults to Local_Quorum. Try with ONE. Then there might still be a configuration issue. Concurrent compactors maybe, or resource contention on CPU with the test code. Sent from my iPhone > On 02.03.2018 at 18:36

Re: Cluster with 3 nodes - Slow performance

2018-03-02 Thread Javier Pareja
Hi again, Two more thoughts with respect to my question: - I have configured all 3 nodes to act as seeds but I don't think this affects write performance. - The hints_directory and the saved_caches_directory use the same drive as the commitlog_directory. The data is in the other 7 drives as I

Re: On a 12-node Cluster, Starting C* on a Seed Node Increases Read Latency from 150ms to 1.5 sec.

2018-03-02 Thread Alain RODRIGUEZ
Hello,
> This is a 2.8.8. cluster
That's an exotic version! I understand you use Apache Cassandra 2.2.8. :)
> This single node was a seed node and it was running a ‘repair -pr’ at the time
In Apache Cassandra 2.2.8, this triggers incremental repairs I believe, and they are relatively (some

On a 12-node Cluster, Starting C* on a Seed Node Increases Read Latency from 150ms to 1.5 sec.

2018-03-02 Thread Fd Habash
This is a 2.8.8. cluster with three AWS AZs, each with 4 nodes. A few days ago we noticed a single node’s read latency reaching 1.5 secs; there were 8 others with read latencies going up to near 900 ms. This single node was a seed node and it was running a ‘repair -pr’ at the time. We intervened as

Secondary Index Cleanup

2018-03-02 Thread Malte Krüger
hi, we have a CF which is about 2 GB in size, it has a secondary index on one field (UUID). the index has a size on disk of about 10 GB. it only shrinks a little when forcing a compaction through JMX. if i use sstabledump i see a lot of these: "partition" : { "key" : [

Cluster with 3 nodes - Slow performance

2018-03-02 Thread Javier Pareja
Hello everyone, I have configured a Cassandra cluster with 3 nodes; however, I am not getting the write speed that I was expecting. I have tested against a counter table because it is the bottleneck of the system. So with the system idle I run the attached sample code (very simple async writes

RE: failing GOSSIP on localhost flooding the debug log

2018-03-02 Thread Kenneth Brotman
Way to go Marco! From: Marco Giovannini [mailto:usern...@gmail.com] Sent: Friday, March 02, 2018 2:45 AM To: Nicolas Guyomar Cc: user@cassandra.apache.org Subject: Re: failing GOSSIP on localhost flooding the debug log CASSANDRA-14285

Re: failing GOSSIP on localhost flooding the debug log

2018-03-02 Thread Marco Giovannini
CASSANDRA-14285 Regards, Marco On Fri, Mar 2, 2018 at 11:33 AM, Marco Giovannini wrote: > Hi, > > I'll use your code to fill out a Jira ticket. > > Regards, > Marco > > On Fri, Mar 2, 2018 at 11:26 AM, Marco Giovannini

Re: failing GOSSIP on localhost flooding the debug log

2018-03-02 Thread Marco Giovannini
Hi, I'll use your code to fill out a Jira ticket. Regards, Marco On Fri, Mar 2, 2018 at 11:26 AM, Marco Giovannini wrote: > Hi > > Your morning guess ended up being right. :) > > Sometimes a fresh pair of eyes is priceless. > > Thanks Nicolas. > > Regards, > Marco > > On

Re: failing GOSSIP on localhost flooding the debug log

2018-03-02 Thread Marco Giovannini
Hi, Your morning guess ended up being right. :) Sometimes a fresh pair of eyes is priceless. Thanks Nicolas. Regards, Marco On Fri, Mar 2, 2018 at 11:14 AM, Nicolas Guyomar wrote: > Hi Marco, > > Could that be because your seed list has an extra comma at the end

Re: failing GOSSIP on localhost flooding the debug log

2018-03-02 Thread Nicolas Guyomar
Whoops, clicked "send" too fast. In SimpleSeedProvider:
String[] hosts = "10.1.20.10,10.1.21.10,10.1.22.10,".split(",", -1);
List<InetAddress> seeds = new ArrayList<InetAddress>(hosts.length);
for (String host : hosts) {
    System.out.println(InetAddress.getByName(host.trim()));
}
output: /10.1.20.10 /10.1.21.10 /10.1.22.10

Re: failing GOSSIP on localhost flooding the debug log

2018-03-02 Thread Nicolas Guyomar
Hi Marco, Could that be because your seed list has an extra comma at the end of the line, which Cassandra interprets by default as localhost? And because you are listening on the node IP, localhost is not reachable (need to check the code to be sure). Here => seeds:
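A minimal Java sketch of the trailing-comma effect suggested above, assuming the seed string is split on commas with a limit of -1 as in the snippet quoted in this thread. The class SeedListSketch and the helper parseSeeds are hypothetical names used for illustration and are not Cassandra's code; skipping blank entries is only one possible way to tolerate a trailing comma.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

class SeedListSketch {
    // Hypothetical helper (not Cassandra's code): split a comma-separated
    // seed list and skip blank entries such as the one a trailing comma leaves.
    static List<InetAddress> parseSeeds(String seedList) throws UnknownHostException {
        List<InetAddress> seeds = new ArrayList<>();
        for (String host : seedList.split(",", -1)) {
            String trimmed = host.trim();
            if (trimmed.isEmpty()) {
                continue; // InetAddress.getByName("") would return a loopback address
            }
            seeds.add(InetAddress.getByName(trimmed));
        }
        return seeds;
    }

    public static void main(String[] args) throws UnknownHostException {
        String seedList = "10.1.20.10,10.1.21.10,10.1.22.10,";
        // A limit of -1 keeps the trailing empty string; the default split drops it.
        System.out.println(seedList.split(",", -1).length); // 4
        System.out.println(seedList.split(",").length);     // 3
        // An empty host name resolves to a loopback address.
        System.out.println(InetAddress.getByName("").isLoopbackAddress()); // true
        // With blank entries skipped, only the three real seeds remain.
        System.out.println(parseSeeds(seedList).size()); // 3
    }
}
```

The IP literals are resolved without a DNS lookup, so the sketch runs offline. Removing the trailing comma from the seed list avoids the empty entry in the first place.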

failing GOSSIP on localhost flooding the debug log

2018-03-02 Thread Marco Giovannini
Hi, I'm running a Cassandra cluster of 3 nodes on AWS across 3 AZs (every instance has only one interface). Cassandra version is 3.11.1. My debug log gets flooded with messages like this one, but the cluster works fine. DEBUG [MessagingService-Outgoing-localhost/127.0.0.1-Gossip] 2018-02-28