[jira] [Updated] (CASSANDRA-9146) Ever Growing sstables after every Repair

Anuj Wadehra (JIRA) Sat, 06 Jun 2015 12:42:40 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Anuj Wadehra updated CASSANDRA-9146:
------------------------------------
    Description: 
The cluster has reached a state where every "repair -pr" operation on a CF 
results in numerous tiny sstables being flushed to disk. With thousands of 
sstables, reads have started timing out. Even though compaction begins for one 
of the secondary indexes, the sstable count after repair remains very high 
(thousands). Every repair adds thousands of sstables.

Problems:
1. Why are bursts of tiny sstables flushed during repair? What is triggering 
the frequent/premature flushing of sstables (more than a hundred in every 
burst)? At most we see one ParNew GC pause >200ms.

2. Why is auto-compaction not compacting all sstables? Is it related to the 
coldness issue (CASSANDRA-8885), where compaction doesn't work even when 
cold_reads_to_omit=0 by default? 
   If coldness is the issue, we are stuck in an infinite loop: reads would 
trigger compaction, but reads time out because the sstable count is in the 
thousands.
3. What is the way out if we face this issue in production?

Is this issue fixed in the latest production release, 2.0.13? The issue looks 
similar to CASSANDRA-8641, but that issue is fixed only in 2.1.3. I think it 
should be fixed in the 2.0 branch too. 

Configuration:
Compaction Strategy: STCS
memtable_flush_writers=4
memtable_flush_queue_size=4
in_memory_compaction_limit_in_mb=32
concurrent_compactors=12

  was:
The cluster has reached a state where every "repair -pr" operation on a CF 
results in numerous tiny sstables being flushed to disk. Most sstables are 
related to secondary indexes. With thousands of sstables, reads have started 
timing out. Even though compaction begins for one of the secondary indexes, 
the sstable count after repair remains very high (thousands). Every repair 
adds thousands of sstables.

Problems:
1. Why are bursts of tiny secondary index sstables flushed during repair? What 
is triggering the frequent/premature flushing of secondary index sstables 
(more than a hundred in every burst)? At most we see one ParNew GC pause 
>200ms.

2. Why is auto-compaction not compacting all sstables? Is it related to the 
coldness issue (CASSANDRA-8885), where compaction doesn't work even when 
cold_reads_to_omit=0 by default? 
   If coldness is the issue, we are stuck in an infinite loop: reads would 
trigger compaction, but reads time out because the sstable count is in the 
thousands.
3. What is the way out if we face this issue in production?

Is this issue fixed in the latest production release, 2.0.13? The issue looks 
similar to CASSANDRA-8641, but that issue is fixed only in 2.1.3. I think it 
should be fixed in the 2.0 branch too. 

Configuration:
Compaction Strategy: STCS
memtable_flush_writers=4
memtable_flush_queue_size=4
in_memory_compaction_limit_in_mb=32
concurrent_compactors=12

        Summary: Ever Growing sstables after every Repair  (was: Ever Growing 
Secondary Index sstables after every Repair)

> Ever Growing sstables after every Repair
> ----------------------------------------
>
>                 Key: CASSANDRA-9146
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Anuj Wadehra
>         Attachments: sstables.txt, system-modified.log
>
>
> The cluster has reached a state where every "repair -pr" operation on a CF 
> results in numerous tiny sstables being flushed to disk. With thousands of 
> sstables, reads have started timing out. Even though compaction begins for 
> one of the secondary indexes, the sstable count after repair remains very 
> high (thousands). Every repair adds thousands of sstables.
> Problems:
> 1. Why are bursts of tiny sstables flushed during repair? What is triggering 
> the frequent/premature flushing of sstables (more than a hundred in every 
> burst)? At most we see one ParNew GC pause >200ms.
> 2. Why is auto-compaction not compacting all sstables? Is it related to the 
> coldness issue (CASSANDRA-8885), where compaction doesn't work even when 
> cold_reads_to_omit=0 by default? 
>    If coldness is the issue, we are stuck in an infinite loop: reads would 
> trigger compaction, but reads time out because the sstable count is in the 
> thousands.
> 3. What is the way out if we face this issue in production?
> Is this issue fixed in the latest production release, 2.0.13? The issue 
> looks similar to CASSANDRA-8641, but that issue is fixed only in 2.1.3. I 
> think it should be fixed in the 2.0 branch too. 
> Configuration:
> Compaction Strategy: STCS
> memtable_flush_writers=4
> memtable_flush_queue_size=4
> in_memory_compaction_limit_in_mb=32
> concurrent_compactors=12
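
To put a number on the "numerous tiny sstables" described above, one option is to count the `-Data.db` files under a node's data directory before and after a repair and split them by size. The following is a minimal sketch, not part of the original report: the function name `count_sstables`, the 1 MB `tiny_bytes` threshold, and the directory passed in are all assumptions chosen for illustration.

```python
import os
from collections import Counter


def count_sstables(data_dir, tiny_bytes=1024 * 1024):
    """Count sstable data files under data_dir, split into 'tiny' and 'other'.

    Hypothetical helper: a file is treated as an sstable data component if
    its name ends in '-Data.db'; 'tiny' means smaller than tiny_bytes
    (an arbitrary threshold, 1 MB by default).
    """
    counts = Counter()
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            # Skip other sstable components (Index.db, Filter.db, ...).
            if not name.endswith("-Data.db"):
                continue
            size = os.path.getsize(os.path.join(root, name))
            counts["tiny" if size < tiny_bytes else "other"] += 1
    return counts
```

Calling `count_sstables(path)` on the node's data directory (whatever `data_file_directories` points to) before and after a "repair -pr" run would show how many small files each repair adds. This only measures the symptom; it says nothing about what triggers the flushes.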



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
