I have similar problem with cassandra 0.8.10. After digging a while
I've found that my problem is somehow related to system page scanning
activities.
After turning on -Xloggc I've found high system cpu usage during
ParNew. After looking into sar -B I've found this:

08:00:08 AM   pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s
pgscank/s pgscand/s pgsteal/s    %vmeff
08:00:08 AM   3916.08    962.47   1737.93     31.00   1556.87
0.00   6514.35    789.03     12.11
08:10:06 AM   4753.89   1654.53    502.34     37.57   1296.41
0.00   7074.27    942.17     13.32
08:20:09 AM   4961.67   1062.75    508.87     40.01   1194.93
0.00   7474.82    915.63     12.25
08:30:03 AM   5006.94   1397.18    500.92     39.95   1394.10
0.00   9155.64   1124.52     12.28
08:40:09 AM   5796.97   3312.25    507.73     40.44   1462.88
0.00  59539.87   1173.68      1.97
08:50:08 AM   2354.89    602.84    281.43     19.32    636.23
0.00 2126568.76    331.31      0.02
09:00:08 AM      4.83     73.72     90.16      0.06     61.83
0.00 3626444.63     10.90      0.00
09:10:08 AM      0.87      2.82     81.70      0.01     45.39
0.00 3563230.69      0.03      0.00
09:20:09 AM      0.01      4.12     82.28      0.00     46.45
0.00 3566248.02      0.02      0.00
09:30:07 AM      8.62      3.84    122.97      0.11     65.31
0.00 3569377.35      0.59      0.00
09:40:08 AM      0.49    154.61     84.49      0.01     53.06
0.00 3580618.13      4.20      0.00
09:50:08 AM      0.07      5.25     82.58      0.00     47.76
0.00 3564969.22      0.04      0.00
10:00:08 AM     38.25    205.50    200.00      0.02    161.27
0.00 3580132.24      0.25      0.00
10:10:08 AM      0.59      3.43     97.67      0.02     52.99
0.00 3562920.83      0.04      0.00
10:20:08 AM      1.61      5.74     83.32      0.03     48.05
0.00 3587643.92      0.07      0.00
10:30:08 AM      0.47      3.67     77.19      0.00     44.04
0.00 3568829.92      0.11      0.00
10:40:08 AM      0.26     34.01     77.61      0.01     65.30
0.00 3564740.06     20.00      0.00
10:50:06 AM      0.25      4.76     82.60      0.00     49.48
0.00 3562681.08      2.56      0.00
11:00:08 AM      0.90      3.18     85.22      0.02     46.19
0.00 3559971.42      0.26      0.00
11:10:08 AM      0.05      3.20     81.81      0.00     45.50
0.00 3571435.22      0.03      0.00
11:20:07 AM      0.06      5.09     82.39      0.00     46.57
0.00 3571160.15      0.05      0.00
11:30:08 AM      0.42      3.38     76.75      0.00     43.54
0.00 3550153.01      0.07      0.00
11:40:08 AM      2.25     34.13     80.73      0.02     48.81
0.00 3556773.99      0.16      0.00
11:50:08 AM    143.70    145.63    181.79      0.00    146.17
0.00 3559806.31      0.04      0.00
12:00:08 PM      8.93      4.34   1428.64      0.14    551.37
0.00 3571540.52      0.82      0.00
12:10:08 PM  29038.30  32157.76   2301.36    163.57  17789.40
12155.79 1555422.56  14316.18      0.91
12:20:08 PM  40005.26  33453.59   1175.56    119.07  19402.11
14067.52  27828.66  18016.59     43.00
12:30:08 PM  28625.65  19189.01   1197.00    124.72  12740.11
7576.56  11155.77  11423.98     60.99
12:40:08 PM  14569.77   8156.18   1197.94    115.99  11249.82
1115.44   1493.11   1704.10     65.33
12:50:09 PM  12663.07  12462.00   1159.15     91.80   2798.45
0.00   3701.85   1508.08     40.74
01:00:05 PM  12587.33  10603.46   1117.98    100.40   6279.41
3533.36   3904.89   5156.94     69.33

It's looks like system is busy for 3  hours looking for pages to
free... There was no heavy compaction, unexpected high load - nothing
special but strange kernel behaviour, long ParNew and message drops.
Does your system behave that way?

Regards,
Wojtek

On Tue, Jul 17, 2012 at 10:14 AM, Nikolay Kоvshov <nkovs...@yandex.ru> wrote:
>
> This is a cluster of 2 nodes, each having 8G of operating memory, 
> replicationfactor=2
> Write/read pressure is quite low and almost never exceeds 10/second
>
> From time to time (2-3 times in a month) I see GC activity in logs and for 
> this time cassandra stops responding to requests which results in a timeout 
> in upper-layer application. Total time of unavailability can be over 5 minues 
> (like in the following case)
>
> What can I do with that? Wiil it become much worse when my cluster grows up?
>
> INFO [GossipTasks:1] 2012-07-16 13:10:50,055 Gossiper.java (line 736) 
> InetAddress /10.220.50.9 is now dead.
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,056 GCInspector.java (line 123) 
> GC for ParNew: 391383 ms for 1 collections, 2025808488 used; max is 8464105472
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,067 StatusLogger.java (line 50) 
> Pool Name                    Active   Pending   Blocked
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,071 StatusLogger.java (line 65) 
> ReadStage                         0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,071 StatusLogger.java (line 65) 
> RequestResponseStage              0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,072 StatusLogger.java (line 65) 
> ReadRepairStage                   0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,072 StatusLogger.java (line 65) 
> MutationStage                     0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,073 StatusLogger.java (line 65) 
> ReplicateOnWriteStage             0         0         0
>  INFO [GossipStage:1] 2012-07-16 13:10:50,074 Gossiper.java (line 722) 
> InetAddress /10.220.50.9 is now UP
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,075 StatusLogger.java (line 65) 
> GossipStage                       1        59         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,075 StatusLogger.java (line 65) 
> AntiEntropyStage                  0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,076 StatusLogger.java (line 65) 
> MigrationStage                    0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,076 StatusLogger.java (line 65) 
> StreamStage                       0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java (line 65) 
> MemtablePostFlusher               0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java (line 65) 
> FlushWriter                       0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java (line 65) 
> MiscStage                         0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,078 StatusLogger.java (line 65) 
> InternalResponseStage             0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,078 StatusLogger.java (line 65) 
> HintedHandoff                     0         0         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,079 StatusLogger.java (line 69) 
> CompactionManager               n/a         0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,079 StatusLogger.java (line 81) 
> MessagingService                n/a       0,0
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,080 StatusLogger.java (line 85) 
> ColumnFamily                Memtable ops,data  Row cache size/cap  Key cache 
> size/cap
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,080 StatusLogger.java (line 88) 
> Keyspace1.PSS                 36712,343842617                 0/0       
> 97995/1000000
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,085 StatusLogger.java (line 88) 
> Keyspace1.Standard1          128679,162567721                 0/0           
> 0/1000000
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,085 StatusLogger.java (line 88) 
> system.NodeIdInfo                         0,0                 0/0             
>     0/1
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,086 StatusLogger.java (line 88) 
> system.IndexInfo                          0,0                 0/0             
>     0/1
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,086 StatusLogger.java (line 88) 
> system.LocationInfo                       0,0                 0/0             
>     1/2
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,086 StatusLogger.java (line 88) 
> system.Migrations                         0,0                 0/0             
>     3/3
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,087 StatusLogger.java (line 88) 
> system.HintsColumnFamily                1,528                 0/0             
>     0/1
>  INFO [ScheduledTasks:1] 2012-07-16 13:10:50,087 StatusLogger.java (line 88) 
> system.Schema                             0,0                 0/0             
>     2/3
> -------- Завершение пересылаемого сообщения --------

Reply via email to