Thank you for your reply. To answer your points:
- I fully agree on the write volume; in fact, my isolated tests confirm your estimate.
- I agree on the reads as well, but the observed volume of data is still much higher.
- I am writing to a single keyspace with RF 3; there is only one keyspace.
- I am not using any indexes; the column families are very simple.
- I am aware of the double count. In fact, I measured the traffic on port 9042 at the client side (so it is counted just once), and I halved the traffic on port 7000 as measured on each node (35 GB -> 17.5 GB). All the measurements were done with iftop with proper BPF filters on the port, and the total traffic matches what I see in CloudWatch (divided by two).

So unfortunately I still don't have any idea what's going on and why I'm seeing 17 GB of internode traffic instead of ~5-6 GB.

On Thursday, February 25, 2016, daemeon reiydelle <daeme...@gmail.com> wrote:

> If read & write at quorum then you write 3 copies of the data then return
> to the caller; when reading you read one copy (assume it is not on the
> coordinator), and 1 digest (because read at quorum is 2, not 3).
>
> When you insert, how many keyspaces get written to? (Are you using e.g.
> inverted indices?) That is my guess, that your db has about 1.8 bytes
> written for every byte inserted.
>
> Every byte you write is counted also as a read (system a sends 1gb to
> system b, so system b receives 1gb). You would not be charged if intra AZ,
> but inter AZ and inter DC will get that double count.
>
> So, my guess is reverse indexes, and you forgot to include receive and
> transmit.
>
> Daemeon C.M. Reiydelle
>
> On Thu, Feb 25, 2016 at 6:51 PM, Gianluca Borello <gianl...@sysdig.com> wrote:
>
>> Hello,
>>
>> We have a Cassandra 2.1.9 cluster on EC2 for one of our live
>> applications. There's a total of 21 nodes across 3 AWS availability zones,
>> c3.2xlarge instances.
>>
>> The configuration is pretty standard, we use the default settings that
>> come with the datastax AMI and the driver in our application is configured
>> to use lz4 compression. The keyspace where all the activity happens has RF
>> 3 and we read and write at quorum to get strong consistency.
>>
>> While analyzing our monthly bill, we noticed that the amount of network
>> traffic related to Cassandra was significantly higher than expected. After
>> breaking it down by port, it seems like over any given time, the internode
>> network activity is 6-7 times higher than the traffic on port 9042, whereas
>> we would expect something around 2-3 times, given the replication factor
>> and the consistency level of our queries.
>>
>> For example, this is the network traffic broken down by port and
>> direction over a few minutes, measured as sum of each node:
>>
>> Port 9042 from client to cluster (write queries): 1 GB
>> Port 9042 from cluster to client (read queries): 1.5 GB
>> Port 7000: 35 GB, which must be divided by two because the traffic is
>> always directed to another instance of the cluster, so that makes it 17.5
>> GB generated traffic
>>
>> The traffic on port 9042 completely matches our expectations, we do about
>> 100k write operations writing 10KB binary blobs for each query, and a bit
>> more reads on the same data.
>>
>> According to our calculations, in the worst case, when the coordinator of
>> the query is not a replica for the data, this should generate about (1 +
>> 1.5) * 3 = 7.5 GB, and instead we see 17 GB, which is quite a lot more.
>>
>> Also, hinted handoffs are disabled and nodes are healthy over the period
>> of observation, and I get the same numbers across pretty much every time
>> window, even including an entire 24 hours period.
>>
>> I tried to replicate this problem in a test environment so I connected a
>> client to a test cluster done in a bunch of Docker containers (same
>> parameters, essentially the only difference is the
>> GossipingPropertyFileSnitch instead of the EC2 one) and I always get what I
>> expect, the amount of traffic on port 7000 is between 2 and 3 times the
>> amount of traffic on port 9042 and the queries are pretty much the same
>> ones.
>>
>> Before doing more analysis, I was wondering if someone has an explanation
>> on this problem, since perhaps we are missing something obvious here?
>>
>> Thanks
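
For reference, the arithmetic from the original message can be rechecked with a few lines of Python. This is only a sketch of the back-of-the-envelope estimate, using the figures quoted in the thread; the variable names are made up for illustration, and the model (every client byte is shipped to at most RF nodes) is the simple worst case described in the original message, not a description of how Cassandra actually moves data.

```python
# Figures quoted in the thread, in GB, summed over all nodes for one window.
client_write_gb = 1.0     # port 9042, client -> cluster (write queries)
client_read_gb = 1.5      # port 9042, cluster -> client (read queries)
port_7000_raw_gb = 35.0   # port 7000, counted on both sender and receiver
replication_factor = 3    # RF of the only keyspace in use

# Each internode byte is seen twice (sent by one node, received by another),
# so halve the raw sum to get the traffic actually generated.
internode_gb = port_7000_raw_gb / 2

client_gb = client_write_gb + client_read_gb
observed_ratio = internode_gb / client_gb

# Assumed worst case from the original message: the coordinator is never a
# replica, so all client traffic is multiplied by the replication factor.
worst_case_gb = client_gb * replication_factor
excess_factor = internode_gb / worst_case_gb

print(f"internode traffic:    {internode_gb:.1f} GB")
print(f"internode / client:   {observed_ratio:.1f}x")
print(f"worst-case estimate:  {worst_case_gb:.1f} GB")
print(f"observed / estimate:  {excess_factor:.2f}x")
```

With these inputs the observed internode traffic is 7 times the client traffic and a bit more than twice the worst-case estimate, which is the gap the thread is trying to explain.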