I think that if SSTables are partitioned within the node using RP, so that
each partition is small and can be compacted independently of all the
others, you can implement an algorithm that spreads the work of compaction
out over time, so that compaction never takes a node out of commission, as
it does now.

I have left a comment to that effect here:

https://issues.apache.org/jira/browse/CASSANDRA-1608?focusedCommentId=12980654&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12980654
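
To make the idea concrete, here is a minimal sketch of the scheduling part
only. It is not Cassandra code: every name in it (IncrementalCompactionSketch,
Partition, tick, maxPerTick) is hypothetical, and the merge itself is replaced
by resetting a counter. It assumes each partition tracks how many SSTables it
holds and that a background task calls tick() periodically.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch; none of these names exist in Cassandra. */
class IncrementalCompactionSketch {

    /** A small slice of the node's data that can be compacted on its own. */
    static final class Partition {
        final int id;
        int sstableCount; // SSTables currently holding data for this slice

        Partition(int id, int sstableCount) {
            this.id = id;
            this.sstableCount = sstableCount;
        }
    }

    private final Deque<Partition> queue = new ArrayDeque<>();
    private final int minThreshold;

    IncrementalCompactionSketch(List<Partition> partitions, int minThreshold) {
        this.queue.addAll(partitions);
        this.minThreshold = minThreshold;
    }

    /**
     * One scheduling tick: visit partitions round-robin and compact at most
     * maxPerTick of them, so the work per tick is bounded by the partition
     * size rather than by the node's total data size.
     * Returns the ids of the partitions compacted in this tick.
     */
    List<Integer> tick(int maxPerTick) {
        List<Integer> compacted = new ArrayList<>();
        int remaining = queue.size(); // look at each partition at most once
        while (remaining-- > 0 && compacted.size() < maxPerTick) {
            Partition p = queue.pollFirst();
            // Require more than one SSTable even if minThreshold is 1, so a
            // partition that is already compacted is not compacted again.
            if (p.sstableCount > 1 && p.sstableCount >= minThreshold) {
                p.sstableCount = 1; // stand-in for merging into one SSTable
                compacted.add(p.id);
            }
            queue.addLast(p); // back of the line, compacted or not
        }
        return compacted;
    }
}
```

With four partitions holding 4, 1, 5 and 4 SSTables and a threshold of 4,
successive calls to tick(1) would compact partitions 0, 2 and 3 one at a time
and then find nothing left to do.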

On Mon, Jan 10, 2011 at 10:56 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> I'd suggest describing your approach on
> https://issues.apache.org/jira/browse/CASSANDRA-1608, and if it's
> attractive, porting it to 0.8.  It's too late for us to make deep
> changes in 0.6 and probably even 0.7 for the sake of stability.
>
> On Mon, Jan 10, 2011 at 8:00 AM, shimi <shim...@gmail.com> wrote:
> > I modified the code to limit the size of the SSTables.
> > I will be glad if someone can take a look at it
> > https://github.com/Shimi/cassandra/tree/cassandra-0.6
> > Shimi
> >
> > On Fri, Jan 7, 2011 at 2:04 AM, Jonathan Shook <jsh...@gmail.com> wrote:
> >>
> >> I believe the following condition within submitMinorIfNeeded(...)
> >> determines whether to continue, so it's not a hard loop.
> >>
> >> // if (sstables.size() >= minThreshold) ...
> >>
> >>
> >>
> >> On Thu, Jan 6, 2011 at 2:51 AM, shimi <shim...@gmail.com> wrote:
> >> > According to the code, it makes sense.
> >> > submitMinorIfNeeded() calls doCompaction(), which calls
> >> > submitMinorIfNeeded().
> >> > With minimumCompactionThreshold = 1, submitMinorIfNeeded() will always
> >> > run compaction.
> >> >
> >> > Shimi
> >> > On Thu, Jan 6, 2011 at 10:26 AM, shimi <shim...@gmail.com> wrote:
> >> >>
> >> >>
> >> >> On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis <jbel...@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Pretty sure there's logic in there that says "don't bother
> >> >>> compacting a single sstable."
> >> >>
> >> >> No. You can do it.
> >> >> Based on the log I have a feeling that it triggers an infinite
> >> >> compaction loop.
> >> >>
> >> >>>
> >> >>> On Wed, Jan 5, 2011 at 2:26 PM, shimi <shim...@gmail.com> wrote:
> >> >>> > How is minor compaction triggered? Is it triggered only when a
> >> >>> > new SSTable is added?
> >> >>> >
> >> >>> > I was wondering if triggering a compaction with
> >> >>> > minimumCompactionThreshold set to 1 would be useful. If this can
> >> >>> > happen, I assume it will compact files of similar size and remove
> >> >>> > deleted rows from the rest.
> >> >>> > Shimi
> >> >>> > On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller
> >> >>> > <peter.schul...@infidyne.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> > I don't have a problem with disk space. I have a problem with
> >> >>> >> > the data size.
> >> >>> >>
> >> >>> >> [snip]
> >> >>> >>
> >> >>> >> > Bottom line is that I want to reduce the number of requests
> >> >>> >> > that go to disk. Since there is enough data that is no longer
> >> >>> >> > valid, I can do it by reclaiming the space. The only way to do
> >> >>> >> > it is by running a major compaction. I can wait and let
> >> >>> >> > Cassandra do it for me, but then the data size will get even
> >> >>> >> > bigger and the response time will be worse. I can do it
> >> >>> >> > manually, but I prefer it to happen in the background with
> >> >>> >> > less impact on the system.
> >> >>> >>
> >> >>> >> Ok - that makes perfect sense then. Sorry for misunderstanding :)
> >> >>> >>
> >> >>> >> So essentially, for workloads that are teetering on the edge of
> >> >>> >> cache warmness and are subject to significant overwrites or
> >> >>> >> removals, it may be beneficial to perform much more aggressive
> >> >>> >> background compaction, even though it might waste lots of CPU,
> >> >>> >> to keep the in-memory working set down.
> >> >>> >>
> >> >>> >> There was talk (I think in the compaction redesign ticket) about
> >> >>> >> potentially improving the use of bloom filters such that obsolete
> >> >>> >> data in sstables could be eliminated from the read set without
> >> >>> >> necessitating actual compaction; that might help address cases
> >> >>> >> like these too.
> >> >>> >>
> >> >>> >> I don't think there's a pre-existing silver bullet in a current
> >> >>> >> release; you probably have to live with the need for
> >> >>> >> greater-than-theoretically-optimal memory requirements to keep
> >> >>> >> the working set in memory.
> >> >>> >>
> >> >>> >> --
> >> >>> >> / Peter Schuller
> >> >>> >
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Jonathan Ellis
> >> >>> Project Chair, Apache Cassandra
> >> >>> co-founder of Riptano, the source for professional Cassandra support
> >> >>> http://riptano.com
> >> >>
> >> >
> >> >
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
