Re: [EXTERNAL] fine tuning for wide rows and mixed workload system

Marco Gasparini Fri, 11 Jan 2019 07:51:19 -0800

Hi Sean,

> I will start – knowing that others will have additional help/questions
I hope so; I really need help with this :)


> What heap size are you using? Sounds like you are using the CMS garbage
> collector.

Yes, I'm using the CMS garbage collector. I haven't used G1 because I read
that it isn't recommended, but if you say it will help with my use case I
have no objection to using it. I will try it.
I have 3 nodes: node1 has 32 GB of RAM, while node2 and node3 have 16 GB
each. I'm currently giving 50% of each node's RAM to the heap.
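For reference, here is a minimal sketch of how HotSpot's -XX:SurvivorRatio divides a fixed young generation, which is relevant to the SurvivorRatio change in the configuration quoted below. SurvivorRatio is the size of eden divided by the size of one survivor space, and the young generation is eden plus two survivor spaces. The 2000 MB young size and the helper name young_gen_split are hypothetical, not values from my setup.

```python
# Sketch: how -XX:SurvivorRatio splits a FIXED young generation in HotSpot.
# SurvivorRatio = eden size / size of ONE survivor space,
# young generation = eden + 2 survivor spaces.
# YOUNG_MB is a hypothetical example, not a value from this thread.

def young_gen_split(young_mb: float, survivor_ratio: int) -> tuple[float, float]:
    """Return (eden_mb, each_survivor_mb) for a given young-generation size."""
    survivor = young_mb / (survivor_ratio + 2)
    eden = survivor * survivor_ratio
    return eden, survivor

YOUNG_MB = 2000  # hypothetical young generation, e.g. -Xmn2000M
for ratio in (8, 4, 2):  # 8 is the HotSpot default
    eden, surv = young_gen_split(YOUNG_MB, ratio)
    print(f"SurvivorRatio={ratio}: eden={eden:.0f} MB, each survivor={surv:.0f} MB")
```

Under these assumptions, lowering SurvivorRatio moves space from eden into the survivor spaces without enlarging the young generation. Only the young-generation size itself (-Xmn, or HEAP_NEWSIZE in cassandra-env.sh) changes that.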


> Spinning disks are a problem, too. Can you tell if the IO is getting
> overwhelmed? SSDs are much preferred.

I'm not sure. 'dstat' and 'iostat' tell me that rMB/s is constantly above
100 MB/s and %util is close to 100%, and under these conditions the node
freezes.
The HDD specifications give a maximum transfer rate of 175 MB/s for node1
and 155 MB/s for node2 and node3.
Unfortunately, switching from spinning disks to SSDs is not an option.
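Here is a rough lower bound on peak disk demand per node, using the peak figures from my original message quoted below: 500K reads and 500K inserts in under 2 hours, 2 text columns of at least 1 MB each, and LIMIT 3. It is only a back-of-envelope sketch, and it rests on these assumptions:
- the peak spans the full 2 hours
- every column is at its 1 MB minimum
- LZ4 compression and the page cache save nothing
- each read returns 3 rows
- reads are spread evenly over the replicas at consistency ONE (the consistency level is an assumption)

Compaction I/O is ignored.

```python
# Back-of-envelope peak disk demand per node (lower bound).
# Assumptions, not measurements: 2-hour peak, every text column at its
# 1 MB minimum, no savings from compression or page cache, 3 rows per
# read, reads served by one replica each (consistency ONE, assumed).
# Compaction I/O is ignored.

PEAK_SECONDS = 2 * 3600
PEAK_READS = 500_000
PEAK_INSERTS = 500_000   # 1M peak writes = 500K deletes + 500K inserts
MIN_ROW_MB = 2 * 1.0     # 2 dynamic text columns, >= 1 MB each
ROWS_PER_READ = 3        # select ... limit 3
NODES, RF = 3, 3

def peak_mb_per_s_per_node() -> tuple[float, float]:
    """Return (write_mb_s, read_mb_s) lower bounds for one node."""
    inserts_per_s = PEAK_INSERTS / PEAK_SECONDS
    reads_per_s = PEAK_READS / PEAK_SECONDS
    # Each insert is stored RF times, spread over NODES nodes;
    # with RF == NODES every node stores every insert.
    write_mb_s = inserts_per_s * MIN_ROW_MB * RF / NODES
    # At consistency ONE a single replica serves each read.
    read_mb_s = reads_per_s * ROWS_PER_READ * MIN_ROW_MB / NODES
    return write_mb_s, read_mb_s

w, r = peak_mb_per_s_per_node()
print(f"writes >= {w:.0f} MB/s, reads >= {r:.0f} MB/s, total >= {w + r:.0f} MB/s per node")
# prints: writes >= 139 MB/s, reads >= 139 MB/s, total >= 278 MB/s per node
```

Even this optimistic floor is well above the 155-175 MB/s nominal transfer rate, and spinning disks reach that rate only for sequential I/O. A sustained %util close to 100% during the peak is therefore consistent with what the arithmetic suggests.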



> Read before write is usually an anti-pattern for Cassandra. From your
> queries, it seems you have a partition key and clustering key.
> Can you give us the table schema? I’m also concerned about the IF EXISTS in
> your delete.
> I think that invokes a light weight transaction – costly for performance.
> Is it really required for your use case?

I don't need the 'IF EXISTS' clause. It is actually a leftover from an old
query, and I can try removing it.

Here is the schema:

CREATE KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = false;
CREATE TABLE my_keyspace.my_table (
    pkey text,
    event_datetime timestamp,
    f1 text,
    f2 text,
    f3 text,
    f4 text,
    f5 int,
    f6 bigint,
    f7 bigint,
    f8 text,
    f9 text,
    PRIMARY KEY (pkey, event_datetime)
) WITH CLUSTERING ORDER BY (event_datetime DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 90000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
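Here is a rough estimate of the live size of one partition in this table. The figure of about 3 live rows per partition is an assumption: each request deletes one row and inserts one, and the read uses LIMIT 3. Old row versions and tombstones still waiting for compaction are not counted, and the helper name partition_mb is hypothetical.

```python
# Rough live-partition size for my_keyspace.my_table.
# Assumption: about 3 live rows per partition (one delete + one insert
# per request, reads use LIMIT 3). Old row versions and tombstones that
# compaction has not yet removed are NOT counted.

MB_PER_COLUMN = (1, 50)    # the 2 dynamic text columns: min and max size
TEXT_COLUMNS = 2
LIVE_ROWS = 3              # assumed, see above
GC_GRACE_SECONDS = 90000   # from the table schema

def partition_mb(mb_per_column: int) -> int:
    """Live data in one partition if every text column has this size."""
    return LIVE_ROWS * TEXT_COLUMNS * mb_per_column

low, high = (partition_mb(mb) for mb in MB_PER_COLUMN)
print(f"live partition size: {low}-{high} MB")
print(f"gc_grace_seconds = {GC_GRACE_SECONDS / 3600:g} hours")
```

At the upper end, that is several times the roughly 100 MB partition size commonly recommended for Cassandra. In addition, tombstones from the deletes cannot be purged until gc_grace_seconds (25 hours here) has elapsed, and the rows they shadow stay on disk until a compaction merges them with the tombstone.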


Thank you very much
Marco

On Fri, 11 Jan 2019 at 16:14, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:

> I will start – knowing that others will have additional help/questions.
>
> What heap size are you using? Sounds like you are using the CMS garbage
> collector. That takes some arcane knowledge and lots of testing to tune. I
> would start with G1 and using ½ the available RAM as the heap size. I would
> want 32 GB RAM as a minimum on the hosts.
>
> Spinning disks are a problem, too. Can you tell if the IO is getting
> overwhelmed? SSDs are much preferred.
>
> Read before write is usually an anti-pattern for Cassandra. From your
> queries, it seems you have a partition key and clustering key. Can you give
> us the table schema? I’m also concerned about the IF EXISTS in your delete.
> I think that invokes a light weight transaction – costly for performance.
> Is it really required for your use case?
>
> Sean Durity
>
> *From:* Marco Gasparini <marco.gaspar...@competitoor.com>
> *Sent:* Friday, January 11, 2019 8:20 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] fine tuning for wide rows and mixed workload system
>
> Hello everyone,
>
> I need some advice to solve a problem with my use case. I have already
> tried some solutions, but they didn't work out.
> Can you help me with the following configuration, please? Any help is very
> much appreciated.
>
> I'm using:
> - Cassandra 3.11.3
> - java version "1.8.0_191"
>
> My use case has the following constraints:
> - about 1M reads per day (and rising)
> - about 2M writes per day (and rising)
> - a sharp peak of requests lasting less than 2 hours, during which the
>   system receives half of the whole day's traffic (500K reads, 1M writes)
> - each request consists of 1 read and 2 writes (1 delete + 1 insert):
>     * the read query selects at most 3 records for a given partition key
>       (select * from my_keyspace.my_table where pkey = ? limit 3)
>     * then one record is deleted (delete from my_keyspace.my_table where
>       pkey = ? and event_datetime = ? IF EXISTS)
>     * finally the new data is stored (insert into my_keyspace.my_table
>       (event_datetime, pkey, agent, some_id, ft, ftt..) values
>       (?,?,?,?,?,?...))
> - each row is pretty wide. I don't know the exact size, because there
>   are 2 dynamic text columns that each store between 1 MB and 50 MB of
>   data.
>   So reads are going to be huge, because I read 3 records of that size
>   every time. Writes are heavy as well, because each row is that wide.
>
> Currently, I have 3 nodes with the following specs:
>
> - node1:
>     * Intel Core i7-3770
>     * 2x SATA HDD, 3.0 TB
>     * 4x 8192 MB DDR3 RAM
>     * nominal transfer rate 175 MB/s
>     # blockdev --report /dev/sd[ab]
>     RO    RA   SSZ   BSZ   StartSec            Size   Device
>     rw   256   512  4096          0   3000592982016   /dev/sda
>     rw   256   512  4096          0   3000592982016   /dev/sdb
>
> - node2,3:
>     * Intel Core i7-2600
>     * 2x SATA HDD, 3.0 TB
>     * 4x 4096 MB DDR3 RAM
>     * nominal transfer rate 155 MB/s
>     # blockdev --report /dev/sd[ab]
>     RO    RA   SSZ   BSZ   StartSec            Size   Device
>     rw   256   512  4096          0   3000592982016   /dev/sda
>     rw   256   512  4096          0   3000592982016   /dev/sdb
>
> Each node has 2 disks, but I have disabled RAID and created a single
> virtual disk in order to get more free space.
> Can this configuration cause issues?
>
> I have already tried some configurations to make it work, such as:
>
> 1) straightforward attempt
>     - default Cassandra configuration (cassandra.yaml)
>     - RF=1
>     - SizeTieredCompactionStrategy (write-oriented strategy)
>     - no row cache (with rows this wide it is better to have no row
>       cache)
>     - gc_grace_seconds = 1 day (unfortunately, I had no repair schedule
>       at all)
>     results:
>         too many timeouts, losing data
>
> 2)
>     - added repair schedules
>     - RF=3 (in order to increase read speed)
>     results:
>         - too many timeouts, losing data
>         - high I/O consumption on every node (iostat shows 100% %util on
>           every node, dstat shows hundreds of MB read per iteration)
>         - node2 frozen until I stopped data writes
>         - node3 almost frozen
>         - many pending MutationStage tasks in tpstats on node2
>         - many full GCs
>         - many HintsDispatchExecutor events in system.log
>
> 3) current configuration
>     - added repair schedules
>     - RF=3
>     - set durable_writes = false in order to speed up writes
>     - increased the young generation size
>     - decreased SurvivorRatio in order to make more young-generation space
>       available, because of the wide rows
>     - increased MaxTenuringThreshold from 1 to 3 in order to decrease read
>       latency
>     - increased Cassandra's on-heap and off-heap memtable sizes because of
>       the wide rows
>     - changed memtable_allocation_type to offheap_objects because of the
>       wide rows
>     results:
>         - better GC performance on node1 and node3
>         - still high I/O consumption on every node (iostat shows 100% %util
>           on every node, dstat shows hundreds of MB read per iteration)
>         - node2 still completely frozen
>         - many pending MutationStage tasks in tpstats on node2
>         - many HintsDispatchExecutor events in system.log on every node
>
> I cannot move to AWS; I can only get dedicated servers.
> Do you have any suggestions for fine-tuning the system for this use case?
>
> Thank you
> Marco
>
