If you use TWCS with TTL, the old SSTables won't be compacted; the entire SSTable file gets dropped once everything in it has expired. I don't think you will need to manage the compaction or cleanup at all, as they are automatic. There's no space limit on the table holding the near-term data other than the overall free disk space; there's only a time limit on that table.
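For illustration, a minimal sketch of what that near-term table could look like (the table name, columns and values below are assumptions, not your actual schema):

    CREATE TABLE sensor_data_recent (
        source_id uuid,
        metric    text,
        ts        timestamp,
        value     double,
        PRIMARY KEY ((source_id, metric), ts)
    ) WITH compaction = {
          'class': 'TimeWindowCompactionStrategy',
          'compaction_window_unit': 'DAYS',
          'compaction_window_size': '7'
      }
      AND default_time_to_live = 2592000;  -- 30 days; fully expired SSTables are dropped whole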

On 17/09/2021 16:51, Isaeed Mohanna wrote:

Thanks for the help,

How does the compaction run? Does it clean up old files while it is running or only at the end? I want to manage the free space so we do not run out while it is running.

*From:* Jim Shaw <jxys...@gmail.com>
*Sent:* Wednesday, September 15, 2021 3:49 PM
*To:* user@cassandra.apache.org
*Subject:* Re: TWCS on Non TTL Data

You may try rolling up the data, i.e. keep a table with only 1 month of data and roll old data up into a table that keeps a year of data.
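As a rough sketch (table names, column types and retention values below are only illustrative, and the roll-up itself would be done by an application-side job):

    -- near-term table: raw points, kept ~1 month
    CREATE TABLE metrics_recent (
        source_id uuid, metric text, ts timestamp, value double,
        PRIMARY KEY ((source_id, metric), ts)
    ) WITH default_time_to_live = 2678400;   -- ~31 days

    -- long-term table: rolled-up (e.g. hourly) aggregates, kept a year
    CREATE TABLE metrics_rollup_hourly (
        source_id uuid, metric text, hour timestamp,
        avg_value double, max_value double,
        PRIMARY KEY ((source_id, metric), hour)
    ) WITH default_time_to_live = 31536000;  -- ~365 days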

Thanks,

Jim

On Wed, Sep 15, 2021 at 1:26 AM Isaeed Mohanna <isa...@xsense.co <mailto:isa...@xsense.co>> wrote:

    My clustering column is the time series timestamp, so basically
    sourceId and metric type form the partition key and the timestamp
    is the clustering key; the rest of the fields are just values
    outside of the primary key. Our read requests are simply "give me
    the values for a time range of a specific sourceId, metric
    combination". So I am guessing that during a read the sstables
    that contain the partition key will be found, and out of those the
    ones that are outside the range will be excluded, correct?
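    To make that concrete, a hypothetical reconstruction of the schema
    and of the typical query described above (names are made up):

        CREATE TABLE readings (
            source_id uuid, metric text, ts timestamp, value double,
            PRIMARY KEY ((source_id, metric), ts)
        );

        -- typical read: one partition, bounded by a time range
        -- (? are bind markers for a prepared statement)
        SELECT ts, value FROM readings
        WHERE source_id = ? AND metric = ? AND ts >= ? AND ts < ?;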

    In practice our queries cover up to a month by default; only
    rarely do we fetch more, e.g. when someone is exporting the data.

    In reality we also get old data, that is, a source will send its
    information late: instead of sending it in real time it will send
    all of the last month's/week's/day's data at once. In that case I
    guess the data will end up in the current bucket; will that affect
    performance?

    Assuming I start with a 1 week bucket, I could later change the
    time window, right?

    Thanks

    *From:* Jeff Jirsa <jji...@gmail.com <mailto:jji...@gmail.com>>
    *Sent:* Tuesday, September 14, 2021 10:35 PM
    *To:* cassandra <user@cassandra.apache.org
    <mailto:user@cassandra.apache.org>>
    *Subject:* Re: TWCS on Non TTL Data

    Inline

    On Tue, Sep 14, 2021 at 11:47 AM Isaeed Mohanna <isa...@xsense.co
    <mailto:isa...@xsense.co>> wrote:

        Hi Jeff

        My data is partitioned by a sourceId and metric. A source is
        usually active for up to a year, after which there are no
        additional writes for the partition and reads become scarce.
        So although there is no explicit time component, the data is
        time based; will that suffice?

    I guess it means that a single read may touch a year of sstables.
    Not great, but perhaps not fatal. Hopefully your reads avoid that
    in practice. We'd need the full schema to be very sure (does the
    clustering column include month/day? If so, there are cases where
    that can help exclude sstables).

        If I use a week bucket we will be able to serve the last few
        days' reads from one file and the last month from ~5, which
        covers the most common queries. Do you think a month bucket is
        a good idea? That would allow reading from one file most of
        the time, but the size of each SSTable will be ~5 times bigger.

    It'll be 1-4 for the most common reads (up to 4 for same-bucket
    reads because STCS in the first bucket is triggered at
    min_threshold=4), and 5 max, which seems reasonable. Way better
    than the 200 or so you're doing now.

        When changing the compaction strategy via JMX, do I need to
        issue the ALTER TABLE command at the end so it will be
        reflected in the schema, or is that taken care of
        automatically? (I am using Cassandra 3.11.11.)

    At the end, yes.
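    For example, the final schema-level statement could look roughly
    like this (table name and window size are just placeholders):

        ALTER TABLE readings
        WITH compaction = {
            'class': 'TimeWindowCompactionStrategy',
            'compaction_window_unit': 'DAYS',
            'compaction_window_size': '7'
        };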

        Thanks a lot for your help.

        *From:* Jeff Jirsa <jji...@gmail.com <mailto:jji...@gmail.com>>
        *Sent:* Tuesday, September 14, 2021 4:51 PM
        *To:* cassandra <user@cassandra.apache.org
        <mailto:user@cassandra.apache.org>>
        *Subject:* Re: TWCS on Non TTL Data

        On Tue, Sep 14, 2021 at 5:42 AM Isaeed Mohanna
        <isa...@xsense.co <mailto:isa...@xsense.co>> wrote:

            Hi

            I have a table that stores time series data. The data is
            not TTLed since we want to retain it for the foreseeable
            future, and there are no updates or deletes (a delete can
            happen if some scrambled data reaches the table, but that
            is extremely rare).

            We write incoming data to the table constantly, ~5 million
            rows a day, mostly data generated within the past week,
            though we also occasionally get old data that got stuck
            somewhere. Our reads are usually for the most recent one
            to three months of data, but we do fetch older data for
            specific time periods in the past as well.

            Lately we have been facing performance trouble with this
            table, see the histogram below. When compaction is running
            on the table, read latency even climbs to 10-20 seconds!!

            Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                                       (micros)      (micros)         (bytes)
            50%           215.00          17.08      89970.66            1916         149
            75%           446.00          24.60     223875.79            2759         215
            95%           535.00          35.43     464228.84            8239         642
            98%           642.00          51.01     668489.53           24601        1916
            99%           642.00          73.46     962624.93           42510        3311
            Min             0.00           2.30      10090.81              43           0
            Max           770.00        1358.10    2395318.86         5839588      454826

            As you can see we are scanning hundreds of sstables. It
            turns out we are using DTCS (min: 4, max: 32); the table
            folder contains ~33K files, ~130GB per node (cleanup
            pending after increasing the cluster), and compaction
            takes a very long time to complete.

            As I understand it, DTCS is deprecated, so my questions
            are:

             1. Should we switch to TWCS even though our data is not
                TTLed? Since we do not delete at all, can we still use
                it? Will it improve performance?

        It will probably be better than DTCS here, but you'll still
        have potentially lots of sstables over time.

        Lots of sstables isn't a big deal in itself; the problem comes
        from scanning more than a handful on each read. Does your
        table have some form of date bucketing to avoid touching old
        data files?

             2. If we should switch, I am thinking of using a time
                window of a week; this way reads will scan tens of
                sstables instead of hundreds as today. Does that sound
                reasonable?

        10s is better than hundreds, but it's still a lot.

             3. Is there a recommended size of a window bucket in
                terms of disk space?

        When I wrote it, I wrote it for a use case that had 30 windows
        over the whole set of data. Since then, I've seen it used with
        anywhere from 5 to 60 buckets.

        With no TTL, you're effectively doing infinite buckets. So the
        only way to ensure you're not touching too many sstables is to
        put the date (in some form) into the partition key and let the
        database use that (+bloom filters) to avoid reading too many
        sstables.
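        A rough sketch of that kind of bucketing (the month column,
        its format, and the window size are only examples, not a
        prescription):

            CREATE TABLE readings_bucketed (
                source_id uuid, metric text,
                month     text,      -- e.g. '2021-09', part of the partition key
                ts        timestamp, value double,
                PRIMARY KEY ((source_id, metric, month), ts)
            ) WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                                 'compaction_window_unit': 'DAYS',
                                 'compaction_window_size': '7'};

            -- a month-long read then touches exactly one partition:
            SELECT ts, value FROM readings_bucketed
            WHERE source_id = ? AND metric = ? AND month = '2021-09'
              AND ts >= '2021-09-01' AND ts < '2021-10-01';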

             4. If TWCS is not a good idea, should I switch to STCS
                instead? Could that yield better performance than the
                current situation?

        LCS will give you better read performance. STCS will probably
        be better than DTCS given the 215-sstable p50 you're seeing
        (which is crazy, by the way; I'm surprised you're not just
        OOMing).
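        If you go the LCS route, the change could look roughly like
        this (table name assumed; 160 MB is just the default target
        size):

            ALTER TABLE readings
            WITH compaction = {'class': 'LeveledCompactionStrategy',
                               'sstable_size_in_mb': '160'};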

             5. What are the risks of changing the compaction strategy
                on a production system? Can it be done on the fly, or
                is it better to go through a full test and backup
                cycle?

        The risk is you trigger a ton of compactions which drops the
        performance of the whole system all at once and your front
        door queries all time out.

        You can approach this a few ways:

        - Use the JMX endpoint to change compaction on one instance at
        a time (rather than doing it in the schema), which lets you
        control how many nodes are re-writing all their data at any
        given point in time

        - You can make an entirely new table, and then populate it by
        reading from the old one and writing to the new one, and then
        you don't have the massive compaction kick off (see the sketch
        after this list)

        - You can use user defined compaction to force compact some of
        those 33k sstables into fewer sstables in advance, hopefully
        taking away some of the pain you're seeing, before you fire
        off the big compaction
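        Sketching the new-table approach (names are assumed; the copy
        itself has to be done by an external job such as Spark or a
        small application, since CQL has no INSERT ... SELECT):

            CREATE TABLE readings_v2 (
                source_id uuid, metric text, ts timestamp, value double,
                PRIMARY KEY ((source_id, metric), ts)
            ) WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                                 'compaction_window_unit': 'DAYS',
                                 'compaction_window_size': '7'};
            -- backfill readings_v2 from the old table with an external
            -- copy job, then switch the application over and drop the
            -- old table.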

        The 3rd hint above - user defined compaction - will make TWCS
        less effective, because TWCS uses the max timestamp per
        sstable for bucketing, and you'd be merging sstables and
        losing granularity.

        Really though, the main thing you need to do is get a time
        component in your partition key so you avoid scanning every
        sstable looking for data, either that or bite the bullet and
        use LCS so the compaction system keeps it at a manageable
        level for reads.


            All input will be appreciated,

            Thank you
