[jira] [Issue Comment Deleted] (CASSANDRA-10496) Make DTCS/TWCS split partitions based on time during compaction

mck (JIRA) Fri, 08 Sep 2017 18:24:52 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


mck updated CASSANDRA-10496:
----------------------------
    Comment: was deleted

(was: Github user michaelsembwever commented on the issue:

    https://github.com/apache/cassandra/pull/147
  
    @iksaif,
     continuing the conversation from 
[CASSANDRA-10496](https://issues.apache.org/jira/browse/CASSANDRA-10496).
    
    > What do you mean by "changing locations isn't supported" ?
    
    `switchCompactionLocation(..)` is a public method and can be called from 
other places, when rows now need to be written to a new sstable in a new 
location. That's why 
[here](https://github.com/thelastpickle/cassandra/commit/a34a72391bb2847b3bd6ed93b4306199ddf3a991#diff-19359d40a9c932efdebc62a067ed4390R40)
 I paired writers by their location, so if a location changes the writer can 
also change.
    
    > Currently it will create up to "minThreshold" sstables
    
    I'm not too sure I get that. The code you referenced is about which bucket 
is up for compaction, not how many writers to use during a bucket's compaction.
    Marcus' idea was to only create two sstables per bucket: one that contains 
all the rows that belong in the bucket, and another for old data that's been 
streamed in late. Therefore 
`SplittingTimeWindowCompactionWriter.writersByBounds` should be a fixed array 
of size 2. The splitting approach described in 
`SplittingTimeWindowCompactionWriter`'s class apidoc (splitting in half, then 
in half again, down to 50Mb) is quite different from the original idea.
    
    > getBuckets() currently uses maxTimestamp, which isn't available 
(currently) in the compaction task
    
    @krummas ?
    
    > Are you talking about sstables generated before this patch ?
    
    Yes.
)

> Make DTCS/TWCS split partitions based on time during compaction
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-10496
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10496
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>              Labels: dtcs
>             Fix For: 4.x
>
>
> To avoid getting old data in new time windows with DTCS (or related, like 
> [TWCS|CASSANDRA-9666]), we need to split out old data into its own sstable 
> during compaction.
> My initial idea is to just create two sstables: when we create the compaction 
> task we state the start and end times for the window, and any data older than 
> the window will be put in its own sstable.
> By creating a single sstable with old data, we will incrementally get the 
> windows correct - say we have an sstable with these timestamps:
> {{[100, 99, 98, 97, 75, 50, 10]}}
> and we are compacting in window {{[100, 80]}} - we would create two sstables:
> {{[100, 99, 98, 97]}}, {{[75, 50, 10]}}, and the first window is now 
> 'correct'. The next compaction would compact in window {{[80, 60]}} and 
> create sstables {{[75]}}, {{[50, 10]}} etc.
> We will probably also want to base the windows on the newest data in the 
> sstables so that we actually have older data than the window.
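
The incremental split described above can be sketched in a few lines of Java. This is only an illustration of the idea, not code from the patch: the class `WindowSplitSketch`, its `Split` holder and the `split` method are hypothetical names, plain timestamps stand in for rows, and the lower window bound is assumed to be inclusive (the description does not say which way the boundary goes).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: partitions timestamps the way the issue describes,
// one output for data inside the window and one for anything older.
final class WindowSplitSketch {

    // Holder for the two outputs of a single compaction.
    static final class Split {
        final List<Long> inWindow = new ArrayList<>();
        final List<Long> older = new ArrayList<>();
    }

    // windowStart is the oldest timestamp still belonging to the window
    // (assumed inclusive). Everything below it goes to the "older" output.
    static Split split(List<Long> timestamps, long windowStart) {
        Split result = new Split();
        for (long ts : timestamps) {
            if (ts >= windowStart) {
                result.inWindow.add(ts);
            } else {
                result.older.add(ts);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // First compaction, window [100, 80].
        Split first = split(List.of(100L, 99L, 98L, 97L, 75L, 50L, 10L), 80L);
        System.out.println(first.inWindow + " " + first.older);

        // Next compaction, window [80, 60], run over the older data only.
        Split second = split(first.older, 60L);
        System.out.println(second.inWindow + " " + second.older);
    }
}
```

Run against the timestamps from the description, the first call yields `[100, 99, 98, 97]` and `[75, 50, 10]`, and the second yields `[75]` and `[50, 10]`, matching the example in the issue.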



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

