If you read and write at quorum with RF 3, each write is sent to all 3
replicas (the coordinator returns to the caller once 2 of them acknowledge);
each read fetches one full copy (assume it is not on the coordinator) plus
1 digest, because a quorum read touches 2 replicas, not 3.
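
As a rough back-of-the-envelope sketch of that accounting (the function name
and parameters are made up for illustration, digests are treated as
negligible by default, and the coordinator is assumed never to be a replica):

```python
def internode_payload_gb(write_gb, read_gb, rf=3, digest_gb=0.0):
    """Rough one-way internode payload for QUORUM reads and writes.

    Assumes the coordinator is never a replica for the data.
    """
    # Writes: the coordinator forwards the mutation to every replica.
    write_traffic = write_gb * rf
    # Reads: one replica returns the full data; the other quorum member
    # returns only a digest, passed in separately because it is small.
    read_traffic = read_gb * 1
    return write_traffic + read_traffic + digest_gb


# Using the client-side volumes from the original message (1 GB written,
# 1.5 GB read) as inputs:
print(internode_payload_gb(write_gb=1.0, read_gb=1.5))  # 4.5
```

This counts each internode byte once, in one direction only.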

When you insert, how many keyspaces get written to (for example, are you
maintaining inverted indices)? My guess is that your database writes about
1.8 bytes for every byte inserted.

Every byte written is also counted as a read on the receiving side: if
system A sends 1 GB to system B, then system B receives 1 GB. You would not
be charged for intra-AZ traffic, but inter-AZ and inter-DC traffic gets
that double count.
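
A minimal sketch of that double count, purely illustrative (the function
name is hypothetical, and the "intra-AZ is free" rule is taken from the
paragraph above rather than from any pricing document):

```python
def metered_gb(transfer_gb, same_az):
    """Bytes attributed to one node-to-node transfer under the double count."""
    if same_az:
        # Assumption from the text above: intra-AZ traffic is not charged.
        return 0.0
    # Counted once as transmit on the sender, once as receive on the receiver.
    return 2 * transfer_gb


print(metered_gb(1.0, same_az=False))  # 2.0
print(metered_gb(1.0, same_az=True))   # 0.0
```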

So my guess is inverted indices, and that your calculation did not count
both receive and transmit.

*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Feb 25, 2016 at 6:51 PM, Gianluca Borello <gianl...@sysdig.com>
wrote:

> Hello,
>
> We have a Cassandra 2.1.9 cluster on EC2 for one of our live applications.
> There's a total of 21 nodes across 3 AWS availability zones, c3.2xlarge
> instances.
>
> The configuration is pretty standard: we use the default settings that
> come with the DataStax AMI, and the driver in our application is configured
> to use LZ4 compression. The keyspace where all the activity happens has RF
> 3, and we read and write at quorum to get strong consistency.
>
> While analyzing our monthly bill, we noticed that the amount of network
> traffic related to Cassandra was significantly higher than expected. After
> breaking it down by port, it seems that over any given time window the
> internode network activity is 6-7 times higher than the traffic on port
> 9042, whereas we would expect something around 2-3 times, given the
> replication factor and the consistency level of our queries.
>
> For example, this is the network traffic broken down by port and direction
> over a few minutes, summed across all nodes:
>
> Port 9042 from client to cluster (write queries): 1 GB
> Port 9042 from cluster to client (read queries): 1.5 GB
> Port 7000: 35 GB, which must be divided by two because the traffic is
> always directed to another instance of the cluster, so that makes 17.5 GB
> of generated traffic
>
> The traffic on port 9042 completely matches our expectations: we do about
> 100k write operations, each writing a 10 KB binary blob, and a bit more
> reads on the same data.
>
> According to our calculations, in the worst case, when the coordinator of
> the query is not a replica for the data, this should generate about (1 +
> 1.5) * 3 = 7.5 GB, but instead we see 17.5 GB, which is quite a lot more.
>
> Also, hinted handoffs are disabled and nodes are healthy over the period
> of observation, and I get the same numbers across pretty much every time
> window, even including an entire 24-hour period.
>
> I tried to replicate this problem in a test environment, so I connected a
> client to a test cluster running in a set of Docker containers (same
> parameters; essentially the only difference is the
> GossipingPropertyFileSnitch instead of the EC2 one), and I always get what
> I expect: the amount of traffic on port 7000 is between 2 and 3 times the
> amount of traffic on port 9042, and the queries are pretty much the same
> ones.
>
> Before doing more analysis, I was wondering if someone has an explanation
> for this problem, since perhaps we are missing something obvious here?
>
> Thanks
>
>
>
