Tombstones should eventually compact away in most cases, but if you've recently changed topology (added nodes, removed nodes, etc.), you should run "nodetool cleanup" to remove data the nodes no longer own. Start by running it on one instance at a time: it's a form of compaction and can impact disk space and latencies.
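A minimal sketch of that one-node-at-a-time cleanup in Python. It assumes `nodetool` is on the PATH and that each node's JMX port is reachable through `nodetool -h <host>`. By default Cassandra only exposes JMX locally, so you may need to run `nodetool cleanup` on each node over ssh instead. The host addresses and keyspace name are hypothetical placeholders.

```python
import subprocess
from typing import Callable, Iterable, List, Optional


def cleanup_command(host: str, keyspace: Optional[str] = None) -> List[str]:
    """Build a `nodetool cleanup` command aimed at one node (assumes remote JMX)."""
    cmd = ["nodetool", "-h", host, "cleanup"]
    if keyspace:
        cmd.append(keyspace)  # optionally limit cleanup to one keyspace
    return cmd


def _run_nodetool(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd).returncode


def rolling_cleanup(
    hosts: Iterable[str],
    keyspace: Optional[str] = None,
    run: Callable[[List[str]], int] = _run_nodetool,
) -> List[str]:
    """Run cleanup on one node at a time, stopping at the first failure.

    Returns the hosts on which cleanup completed successfully.
    """
    done: List[str] = []
    for host in hosts:
        cmd = cleanup_command(host, keyspace)
        print("running:", " ".join(cmd))
        # The loop waits for this command to exit before moving on,
        # so at most one node is compacting away unowned data at a time.
        if run(cmd) != 0:
            print(f"cleanup failed on {host}; stopping")
            break
        done.append(host)
    return done
```

For example, `rolling_cleanup(["10.0.0.11", "10.0.0.12"], "my_keyspace")` (hypothetical hosts and keyspace) cleans the first node and then the second. Stopping at the first failure means a misbehaving node can be examined before the rest of the datacenter is touched.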
On Mon, Aug 7, 2017 at 2:04 PM, Chuck Reynolds <creyno...@ancestry.com> wrote:
> Yes it’s the total size.
>
> Could it be that tombstones or data that nodes no longer own is not being
> copied/streamed to the data center in AWS?
>
> *From: *Jeff Jirsa <jji...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, August 7, 2017 at 2:51 PM
> *To: *cassandra <user@cassandra.apache.org>
> *Subject: *Re: Different data size between datacenters
>
> And when you say the data size is smaller, you mean per node? Or sum of
> all nodes in the datacenter?
>
> With 185 hosts in AWS vs 135 in your DC, I would expect your DC hosts to
> have 30% less data per host than AWS.
>
> If instead they have twice as much, it sounds like it's balancing by # of
> tokens instead, which may be an indication that you're somehow using
> SimpleStrategy, or your NetworkTopologyStrategy is somehow misconfigured
> for one or more keyspaces.
>
> Can you paste your keyspace replication strategy lines, anonymized as
> needed?
>
> On Mon, Aug 7, 2017 at 1:46 PM, Chuck Reynolds <creyno...@ancestry.com>
> wrote:
>
> Yes to the NetworkTopologyStrategy.
>
> *From: *Jeff Jirsa <jji...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, August 7, 2017 at 2:39 PM
> *To: *cassandra <user@cassandra.apache.org>
> *Subject: *Re: Different data size between datacenters
>
> You're using NetworkTopologyStrategy and not SimpleStrategy, correct?
>
> On Mon, Aug 7, 2017 at 11:50 AM, Chuck Reynolds <creyno...@ancestry.com>
> wrote:
>
> I have a cluster that spans two datacenters running Cassandra 2.1.12. 135
> nodes in my data center and about 185 in AWS.
>
> The size of the second data center (AWS) is quite a bit smaller.
> Replication is the same in both datacenters. Is there a logical
> explanation for this?
>
> thanks
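The replication lines Jeff asks for can be checked mechanically for the misconfigurations he mentions. Below is a minimal Python sketch. It assumes each keyspace's settings have already been copied into a dict shaped like the `replication = {...}` map that `DESCRIBE KEYSPACE` prints, and that only user keyspaces are included (system keyspaces use other strategies). The datacenter names passed in should match the `Datacenter:` headings in `nodetool status`. The function name and the example DC names are hypothetical.

```python
from typing import Dict, List


def check_replication(
    keyspaces: Dict[str, Dict[str, str]], datacenters: List[str]
) -> List[str]:
    """Flag keyspaces whose replication would not place data evenly per DC."""
    problems: List[str] = []
    for name, repl in sorted(keyspaces.items()):
        strategy = repl.get("class", "")
        if strategy.endswith("SimpleStrategy"):
            # SimpleStrategy places replicas around the ring ignoring datacenters.
            problems.append(f"{name}: SimpleStrategy is not datacenter-aware")
            continue
        if not strategy.endswith("NetworkTopologyStrategy"):
            problems.append(f"{name}: unexpected strategy {strategy!r}")
            continue
        # Every key other than "class" is a datacenter name mapped to its RF.
        options = {dc: int(rf) for dc, rf in repl.items() if dc != "class"}
        for dc in options:
            if dc not in datacenters:
                # DC names must match what the snitch reports, case included.
                problems.append(f"{name}: unknown datacenter {dc!r}")
        rfs = {dc: options.get(dc, 0) for dc in datacenters}
        for dc, rf in rfs.items():
            if rf == 0:
                problems.append(f"{name}: no replicas in {dc}")
        if len(set(rfs.values())) > 1:
            problems.append(f"{name}: replication factor differs by datacenter: {rfs}")
    return problems
```

An empty result means every listed keyspace uses NetworkTopologyStrategy with the same replication factor in each named datacenter. In that case, a per-DC size difference is more likely explained by something other than replication settings.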