Your compaction time won't improve immediately simply by adding nodes: the new nodes take over token ranges, but the existing SSTables still have to be compacted, and the data streamed away only disappears from the old nodes after cleanup runs.
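As a rough back-of-the-envelope check of why raising the throughput cap helps (all numbers below are illustrative, taken from the figures mentioned later in this thread, not a measurement): the time to drain a compaction backlog scales with the bytes left to rewrite divided by the configured throughput. Note this is a lower bound — compaction rewrites data multiple times across tiers, so the real wall-clock time is larger.

```python
# Rough lower-bound estimate of how long a compaction backlog takes to
# drain at a given throttle. Inputs are illustrative assumptions.

def drain_time_hours(pending_bytes: float, throughput_mb_per_s: float) -> float:
    """Hours needed to rewrite `pending_bytes` once at `throughput_mb_per_s` MB/s."""
    seconds = pending_bytes / (throughput_mb_per_s * 1024 ** 2)
    return seconds / 3600

# ~1.2 TB of SSTable data still to compact on one node
pending = 1.2 * 1024 ** 4

print(round(drain_time_hours(pending, 16), 1))   # throttled at the default 16 MB/s -> 21.8
print(round(drain_time_hours(pending, 200), 1))  # after raising it to 200 MB/s -> 1.7
```

The two knobs this models are `compaction_throughput_mb_per_sec` in cassandra.yaml (changeable at runtime with `nodetool setcompactionthroughput`; 0 disables throttling) and `concurrent_compactors`, which controls how many compaction threads run in parallel.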
What's your end goal? Why is a spike in pending compaction tasks following a massive write an issue? Are you seeing a dip in performance, violating an SLA, or do you just not like it?

On Fri, Apr 27, 2018 at 10:54 AM Mikhail Tsaplin <tsmis...@gmail.com> wrote:

> The cluster has 5 nodes of d2.xlarge AWS type (32GB RAM, attached SSD
> disks), Cassandra 3.0.9.
>
> Increased compaction throughput from 16 to 200 - active compaction
> remaining time decreased.
>
> What will happen if another node joins the cluster? Will the former
> nodes move part of their SSTables to the new node unchanged, and will
> compaction time be reduced?
>
> $ nodetool cfstats -H dump_es
>
> Keyspace: table_b
>     Read Count: 0
>     Read Latency: NaN ms.
>     Write Count: 0
>     Write Latency: NaN ms.
>     Pending Flushes: 0
>         Table: table_b
>         SSTable count: 18155
>         Space used (live): 1.2 TB
>         Space used (total): 1.2 TB
>         Space used by snapshots (total): 0 bytes
>         Off heap memory used (total): 3.62 GB
>         SSTable Compression Ratio: 0.20371982719658258
>         Number of keys (estimate): 712032622
>         Memtable cell count: 0
>         Memtable data size: 0 bytes
>         Memtable off heap memory used: 0 bytes
>         Memtable switch count: 0
>         Local read count: 0
>         Local read latency: NaN ms
>         Local write count: 0
>         Local write latency: NaN ms
>         Pending flushes: 0
>         Bloom filter false positives: 0
>         Bloom filter false ratio: 0.00000
>         Bloom filter space used: 2.22 GB
>         Bloom filter off heap memory used: 2.56 GB
>         Index summary off heap memory used: 357.51 MB
>         Compression metadata off heap memory used: 724.97 MB
>         Compacted partition minimum bytes: 771 bytes
>         Compacted partition maximum bytes: 1.55 MB
>         Compacted partition mean bytes: 3.47 KB
>         Average live cells per slice (last five minutes): NaN
>         Maximum live cells per slice (last five minutes): 0
>         Average tombstones per slice (last five minutes): NaN
>         Maximum tombstones per slice (last five minutes): 0
>
> 2018-04-27 22:21 GMT+07:00 Nicolas Guyomar 
<nicolas.guyo...@gmail.com>:

>> Hi Mikhail,
>>
>> Could you please provide:
>> - your cluster version/topology (number of nodes, CPU, RAM available, etc.)
>> - what kind of underlying storage you are using
>> - cfstats using the -H option, because I'm never sure I'm converting bytes => GB
>>
>> You are storing 1 TB per node, so long-running compaction is not really a
>> surprise; you can play with the concurrent compaction thread count and the
>> compaction throughput to begin with.
>>
>> On 27 April 2018 at 16:59, Mikhail Tsaplin <tsmis...@gmail.com> wrote:
>>
>>> Hi,
>>> I have a five-node C* cluster suffering from a big number of pending
>>> compaction tasks: 1) 571; 2) 91; 3) 367; 4) 22; 5) 232
>>>
>>> Initially, it held one big table (table_a). With Spark, I read that
>>> table, extended its data, and stored it in a second table, table_b. After
>>> this copying/extending process, the number of compaction tasks in the
>>> cluster grew. From nodetool cfstats (see output at the bottom): table_a
>>> has 20 SSTables and table_b has 18219.
>>>
>>> As I understand it, table_b has a big SSTable count because data was
>>> transferred from one table to another within a short time, and eventually
>>> these tables will be compacted. But now I have to read all the data from
>>> table_b and send it to Elasticsearch. When Spark reads this table, some
>>> Cassandra nodes die because of OOM.
>>>
>>> I think that when compaction is complete, the Spark reading job will
>>> work fine.
>>>
>>> The question is: how can I speed up the compaction process? What if I
>>> add another two nodes to the cluster - will compaction finish faster? Or
>>> will data be copied to the new nodes while compaction continues on the
>>> original set of SSTables? 
>>> Nodetool cfstats output:
>>>
>>>         Table: table_a
>>>         SSTable count: 20
>>>         Space used (live): 1064889308052
>>>         Space used (total): 1064889308052
>>>         Space used by snapshots (total): 0
>>>         Off heap memory used (total): 1118106937
>>>         SSTable Compression Ratio: 0.12564594959566894
>>>         Number of keys (estimate): 56238959
>>>         Memtable cell count: 76824
>>>         Memtable data size: 115531402
>>>         Memtable off heap memory used: 0
>>>         Memtable switch count: 17
>>>         Local read count: 0
>>>         Local read latency: NaN ms
>>>         Local write count: 77308
>>>         Local write latency: 0.045 ms
>>>         Pending flushes: 0
>>>         Bloom filter false positives: 0
>>>         Bloom filter false ratio: 0.00000
>>>         Bloom filter space used: 120230328
>>>         Bloom filter off heap memory used: 120230168
>>>         Index summary off heap memory used: 2837249
>>>         Compression metadata off heap memory used: 995039520
>>>         Compacted partition minimum bytes: 1110
>>>         Compacted partition maximum bytes: 52066354
>>>         Compacted partition mean bytes: 133152
>>>         Average live cells per slice (last five minutes): NaN
>>>         Maximum live cells per slice (last five minutes): 0
>>>         Average tombstones per slice (last five minutes): NaN
>>>         Maximum tombstones per slice (last five minutes): 0
>>>
>>> nodetool cfstats table_b
>>> Keyspace: dump_es
>>>     Read Count: 0
>>>     Read Latency: NaN ms.
>>>     Write Count: 0
>>>     Write Latency: NaN ms. 
>>>     Pending Flushes: 0
>>>         Table: table_b
>>>         SSTable count: 18219
>>>         Space used (live): 1316641151665
>>>         Space used (total): 1316641151665
>>>         Space used by snapshots (total): 0
>>>         Off heap memory used (total): 3863604976
>>>         SSTable Compression Ratio: 0.20387645535477916
>>>         Number of keys (estimate): 712032622
>>>         Memtable cell count: 0
>>>         Memtable data size: 0
>>>         Memtable off heap memory used: 0
>>>         Memtable switch count: 0
>>>         Local read count: 0
>>>         Local read latency: NaN ms
>>>         Local write count: 0
>>>         Local write latency: NaN ms
>>>         Pending flushes: 0
>>>         Bloom filter false positives: 0
>>>         Bloom filter false ratio: 0.00000
>>>         Bloom filter space used: 2382971488
>>>         Bloom filter off heap memory used: 2742320056
>>>         Index summary off heap memory used: 371500752
>>>         Compression metadata off heap memory used: 749784168
>>>         Compacted partition minimum bytes: 771
>>>         Compacted partition maximum bytes: 1629722
>>>         Compacted partition mean bytes: 3555
>>>         Average live cells per slice (last five minutes): 132.375
>>>         Maximum live cells per slice (last five minutes): 149
>>>         Average tombstones per slice (last five minutes): 1.0
>>>         Maximum tombstones per slice (last five minutes): 1
>>>
>>> ------------------
>>>
>>> I logged the CQL requests coming from Spark and checked how one such
>>> request performs - it fetches 8075 rows, 59 MB of data, in 155 s (see the
>>> check output below):
>>>
>>> $ date; echo 'SELECT "scan_id", "snapshot_id", "scan_doc",
>>> "snapshot_doc" FROM "dump_es"."table_b" WHERE token("scan_id") >
>>> 946122293981930504 AND token("scan_id") <= 946132293981930504
>>> ALLOW FILTERING;' | cqlsh --request-timeout=3600 | wc ; date
>>>
>>> Fri Apr 27 13:32:55 UTC 2018
>>>    8076   61191 59009831
>>> Fri Apr 27 13:35:30 UTC 2018
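The cqlsh command quoted above reads one slice of the Murmur3 token ring (`token("scan_id") > a AND token("scan_id") <= b`); a full-table export is just many such slices, and this is how the Spark connector pages through the table. Splitting the ring into smaller sub-ranges reduces how many rows any single query materialises, which is one way to ease the read-side memory pressure while the backlog compacts. A minimal sketch of the splitting (the function name is hypothetical; real connectors also handle the edge cases, such as the row that sits exactly on the minimum token):

```python
# Split the Murmur3 token ring into equal sub-ranges for token-based
# full-table scans. Ranges are half-open: (start, end].
MIN_TOKEN = -2**63       # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1    # Murmur3Partitioner maximum token

def token_ranges(n_splits: int):
    """Return n_splits contiguous (start, end] ranges covering the ring."""
    span = (MAX_TOKEN - MIN_TOKEN) // n_splits
    edges = [MIN_TOKEN + i * span for i in range(n_splits)] + [MAX_TOKEN]
    return [(edges[i], edges[i + 1]) for i in range(n_splits)]

for start, end in token_ranges(4):
    print(f'SELECT "scan_id", "scan_doc" FROM "dump_es"."table_b" '
          f'WHERE token("scan_id") > {start} AND token("scan_id") <= {end};')
```

Raising the split count makes each query cheaper at the cost of more round trips; with 18k+ SSTables behind each partition lookup, smaller slices are the safer trade until compaction catches up.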