Are you running repairs?
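
If in doubt, repair activity usually shows up in the standard tools; a rough check could be:

$ nodetool compactionstats   # repair validation appears as "Validation" tasks
$ nodetool tpstats | grep -E 'AntiEntropy|Validation'   # non-zero Active/Pending here suggests repair/validation work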

You may try:
- increase concurrent_compactors to 8 (the max in 2.1.x)
- increase compaction_throughput_mb_per_sec to more than 16 MB/s (48 may be a good start)
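
For example, roughly (these are the standard 2.1 settings; treat the values as a
starting point rather than a recommendation for your hardware):

# cassandra.yaml (needs a restart, or JMX, to take effect)
concurrent_compactors: 8
compaction_throughput_mb_per_sec: 48

# or raise the throughput live on a running node:
$ nodetool setcompactionthroughput 48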


What kind of data are you storing in these tables? Time series?



2016-03-21 23:37 GMT+01:00 Gianluca Borello <gianl...@sysdig.com>:
> Thank you for your reply, I definitely appreciate the tip on the compressed
> size.
>
> I understand your point; in fact, whenever we bootstrap a new node we see a
> huge number of pending compactions (on the order of thousands), and they
> usually decrease steadily until they reach 0 in just a few hours. With this
> node, however, we are way beyond that point: it has been 3 days since the
> number of pending compactions started fluctuating around ~150 without any
> sign of going down (I can see from OpsCenter that it's almost a straight line
> starting a few hours after the bootstrap). In particular, to reply to your
> points:
>
> - The number of sstables for this CF on this node is around 250, which is in
> the same range as all the other nodes in the cluster (I counted the number
> on each one of them, and every node is in the 200-400 range)
>
> - This theory doesn't seem to explain why, when doing "nodetool drain", the
> compactions completely stop after a few minutes and I get something such as:
>
> $ nodetool compactionstats
> pending tasks: 128
>
> So no compactions being executed (since there is no more write activity),
> but the pending number is still high.
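> 
> For reference, one rough way to look at the per-table sstable count and
> pending estimate directly (assuming the 2.1 metric names) is something like:
> 
> $ nodetool cfstats draios.message_data1 | grep 'SSTable count'
> # pending-compaction estimate for the same table over JMX, e.g. via jmxterm:
> #   get -b org.apache.cassandra.metrics:type=ColumnFamily,keyspace=draios,scope=message_data1,name=PendingCompactions Value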
>
> Thanks again
>
>
> On Mon, Mar 21, 2016 at 3:19 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>>
>> > We added a bunch of new nodes to a cluster (2.1.13) and everything went
>> > fine, except for the number of pending compactions that is staying quite
>> > high on a subset of the new nodes. Over the past 3 days, the pending
>> > compactions have never been less than ~130 on such nodes, with peaks of
>> > ~200.
>>
>> When you bootstrap with vnodes, you end up with thousands (or tens of
>> thousands) of sstables – with 256 vnodes (the default) * 20 sstables per
>> vnode, your resulting node will have ~5k sstables. It takes quite a while for
>> compaction to chew through that. If you added a bunch of nodes in sequence,
>> you’d have 5k on the first node, then potentially 10k on the next, and it could
>> potentially keep increasing as you start streaming from nodes that have way
>> too many sstables. This is one of the reasons that many people who have to
>> grow their clusters frequently try not to use vnodes.
>>
>> From your other email:
>>
>> > Also related to this point, now I'm seeing something even more odd: some
>> > compactions are way bigger than the size of the column family itself, such
>> > as:
>>
>> The size reported by compactionstats is the uncompressed size – if you’re
>> using compression, it’s perfectly reasonable for 30G of data to show up as
>> 118G of data during compaction.
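>>
>> (Back of the envelope: uncompressed ≈ on-disk size / compression ratio, so a
>> ratio of roughly 0.25 turns ~30G on disk into the ~118G reported by
>> compactionstats.)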
>>
>> - Jeff
>>
>> From: Gianluca Borello
>> Reply-To: "user@cassandra.apache.org"
>> Date: Monday, March 21, 2016 at 12:50 PM
>> To: "user@cassandra.apache.org"
>> Subject: Pending compactions not going down on some nodes of the cluster
>>
>> Hi,
>>
>> We added a bunch of new nodes to a cluster (2.1.13) and everything went
>> fine, except for the number of pending compactions that is staying quite
>> high on a subset of the new nodes. Over the past 3 days, the pending
>> compactions have never been less than ~130 on such nodes, with peaks of
>> ~200. On the other nodes, they correctly fluctuate between 0 and ~20, which
>> has been our norm for a long time.
>>
>> We are quite paranoid about pending compactions because in the past such a
>> high number caused a lot of data to be brought into memory during some reads,
>> and that triggered a chain reaction of full GCs that brought down our
>> cluster, so we try to monitor them closely.
>>
>> Some data points that should let the situation speak for itself:
>>
>> - We use LCS for all our column families
>>
>> - The cluster is operating absolutely fine and seems healthy, and every
>> node is handling pretty much the same load in terms of reads and writes.
>> Also, the nodes with higher pending compactions don't seem to be performing
>> any worse than the others
>>
>> - The pending compactions don't go down even when setting the compaction
>> throughput to unlimited for a very long time
>>
>> - This is the typical output of compactionstats and tpstats:
>>
>> $ nodetool compactionstats
>> pending tasks: 137
>>    compaction type   keyspace            table      completed          total   unit   progress
>>         Compaction     draios   message_data60     6111208394     6939536890  bytes     88.06%
>>         Compaction     draios    message_data1    26473390790    37243294809  bytes     71.08%
>> Active compaction remaining time :        n/a
>>
>> $ nodetool tpstats
>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>> CounterMutationStage              0         0              0         0                 0
>> ReadStage                         1         0      111766844         0                 0
>> RequestResponseStage              0         0      244259493         0                 0
>> MutationStage                     0         0      163268653         0                 0
>> ReadRepairStage                   0         0        8933323         0                 0
>> GossipStage                       0         0         363003         0                 0
>> CacheCleanupExecutor              0         0              0         0                 0
>> AntiEntropyStage                  0         0              0         0                 0
>> MigrationStage                    0         0              2         0                 0
>> Sampler                           0         0              0         0                 0
>> ValidationExecutor                0         0              0         0                 0
>> CommitLogArchiver                 0         0              0         0                 0
>> MiscStage                         0         0              0         0                 0
>> MemtableFlushWriter               0         0          32644         0                 0
>> MemtableReclaimMemory             0         0          32644         0                 0
>> PendingRangeCalculator            0         0            527         0                 0
>> MemtablePostFlush                 0         0          36565         0                 0
>> CompactionExecutor                2        70         108621         0                 0
>> InternalResponseStage             0         0              0         0                 0
>> HintedHandoff                     0         0             10         0                 0
>> Native-Transport-Requests         6         0      188996929         0             79122
>>
>> Message type           Dropped
>> RANGE_SLICE                  0
>> READ_REPAIR                  0
>> PAGED_RANGE                  0
>> BINARY                       0
>> READ                         0
>> MUTATION                     0
>> _TRACE                       0
>> REQUEST_RESPONSE             0
>> COUNTER_MUTATION             0
>>
>> - If I do a nodetool drain on such nodes, and then wait for a while, the
>> number of pending compactions stays high even if there are no compactions
>> being executed anymore and the node is completely idle:
>>
>> $ nodetool compactionstats
>> pending tasks: 128
>>
>> - It's also interesting to notice how the compaction in the previous
>> example is trying to compact ~37 GB, which is essentially the whole size of
>> the column family message_data1 as reported by cfstats:
>>
>> $ nodetool cfstats -H draios.message_data1
>> Keyspace: draios
>> Read Count: 208168
>> Read Latency: 2.4791508685292647 ms.
>> Write Count: 502529
>> Write Latency: 0.20701542000561163 ms.
>> Pending Flushes: 0
>> Table: message_data1
>> SSTable count: 261
>> SSTables in each level: [43/4, 92/10, 125/100, 0, 0, 0, 0, 0, 0]
>> Space used (live): 36.98 GB
>> Space used (total): 36.98 GB
>> Space used by snapshots (total): 0 bytes
>> Off heap memory used (total): 36.21 MB
>> SSTable Compression Ratio: 0.15461126176169512
>> Number of keys (estimate): 101025
>> Memtable cell count: 229344
>> Memtable data size: 82.4 MB
>> Memtable off heap memory used: 0 bytes
>> Memtable switch count: 83
>> Local read count: 208225
>> Local read latency: 2.479 ms
>> Local write count: 502581
>> Local write latency: 0.208 ms
>> Pending flushes: 0
>> Bloom filter false positives: 11497
>> Bloom filter false ratio: 0.04307
>> Bloom filter space used: 94.97 KB
>> Bloom filter off heap memory used: 92.93 KB
>> Index summary off heap memory used: 57.88 KB
>> Compression metadata off heap memory used: 36.06 MB
>> Compacted partition minimum bytes: 447 bytes
>> Compacted partition maximum bytes: 34.48 MB
>> Compacted partition mean bytes: 1.51 MB
>> Average live cells per slice (last five minutes): 26.269698643294515
>> Maximum live cells per slice (last five minutes): 100.0
>> Average tombstones per slice (last five minutes): 0.0
>> Maximum tombstones per slice (last five minutes): 0.0
>>
>> - There are no warnings or errors in the log, even after a clean restart
>>
>> - Restarting the node doesn't seem to have any effect on the number of
>> pending compactions
>>
>> Any help would be very appreciated.
>>
>> Thank you for reading
>>
>



-- 
Close the World, Open the Net
http://www.linux-wizard.net
