Thanks Ed! I was thinking about surrendering more memory to mmap operations. I'm going to try bringing the Xmx down to 4G
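For reference, the change would look roughly like this in conf/cassandra-env.sh (variable names as shipped with the 0.7-era env script; the sizes are just the values discussed in this thread, not a recommendation):

```shell
# conf/cassandra-env.sh -- cap the heap at 4G so the remaining ~4G of
# the 8G box is left to the OS page cache / mmapped SSTable reads.
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="400M"   # young generation; ~100M per core is the usual rule of thumb
```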
On Fri, May 27, 2011 at 5:19 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
> On Fri, May 27, 2011 at 9:08 AM, Jonathan Colby <jonathan.co...@gmail.com> wrote:
>>
>> Hi -
>>
>> Operations like repair and bootstrap on nodes in our cluster (average
>> load 150GB each) take a very long time.
>>
>> By long I mean 1-2 days. With nodetool "netstats" I can see the
>> progress % advancing very slowly.
>>
>> I guess there are some throttling mechanisms built into Cassandra.
>> And yes, there is also production load on these nodes, so it is
>> somewhat understandable. Also, some of our compacted data files are as
>> large as 50-60 GB each.
>>
>> I was just wondering if these times are similar to what other people
>> are experiencing, or if there is a serious configuration problem with
>> our setup.
>>
>> So what have you seen with operations like loadbalance, repair,
>> cleanup, and bootstrap on nodes with large amounts of data?
>>
>> I'm not seeing too many full garbage collections. Minor GCs are
>> well under a second.
>>
>> Setup info:
>> 0.7.4
>> 5 GB heap
>> 8 GB RAM
>> 64-bit Linux OS
>> AMD quad-core HP blades
>> CMS garbage collector with default Cassandra settings
>> 1 TB RAID 0 SATA disks
>> across 2 datacenters, but operations within the same DC take very
>> long too.
>>
>> This is a netstats output of a bootstrap that has been going on for 3+
>> hours:
>>
>> Mode: Normal
>> Streaming to: /10.47.108.103
>>   /var/lib/cassandra/data/DFS/main-f-1541-Data.db/(0,32842490722),(32842490722,139556639427),(139556639427,161075890783)
>>      progress=94624588642/161075890783 - 58%
>>   /var/lib/cassandra/data/DFS/main-f-1455-Data.db/(0,660743002)
>>      progress=0/660743002 - 0%
>>   /var/lib/cassandra/data/DFS/main-f-1444-Data.db/(0,32816130132),(32816130132,71465138397),(71465138397,90968640033)
>>      progress=0/90968640033 - 0%
>>   /var/lib/cassandra/data/DFS/main-f-1540-Data.db/(0,931632934),(931632934,2621052149),(2621052149,3236107041)
>>      progress=0/3236107041 - 0%
>>   /var/lib/cassandra/data/DFS/main-f-1488-Data.db/(0,33428780851),(33428780851,110546591227),(110546591227,110851587206)
>>      progress=0/110851587206 - 0%
>>   /var/lib/cassandra/data/DFS/main-f-1542-Data.db/(0,24091168),(24091168,97485080),(97485080,108233211)
>>      progress=0/108233211 - 0%
>>   /var/lib/cassandra/data/DFS/main-f-1544-Data.db/(0,3646406),(3646406,18065308),(18065308,25776551)
>>      progress=0/25776551 - 0%
>>   /var/lib/cassandra/data/DFS/main-f-1452-Data.db/(0,676616940)
>>      progress=0/676616940 - 0%
>>   /var/lib/cassandra/data/DFS/main-f-1548-Data.db/(0,6957269),(6957269,48966550),(48966550,51499779)
>>      progress=0/51499779 - 0%
>>   /var/lib/cassandra/data/DFS/main-f-1552-Data.db/(0,237153399),(237153399,750466875),(750466875,898056853)
>>      progress=0/898056853 - 0%
>>   /var/lib/cassandra/data/DFS/main-f-1554-Data.db/(0,45155582),(45155582,195640768),(195640768,247592141)
>>      progress=0/247592141 - 0%
>>   /var/lib/cassandra/data/DFS/main-f-1449-Data.db/(0,2812483216)
>>      progress=0/2812483216 - 0%
>>   /var/lib/cassandra/data/DFS/main-f-1545-Data.db/(0,107648943),(107648943,434575065),(434575065,436667186)
>>      progress=0/436667186 - 0%
>> Not receiving any streams.
>> Pool Name                    Active   Pending      Completed
>> Commands                        n/a         0         134283
>> Responses                       n/a         0         192438
>
> That is a little long, but every case is different. With low request
> load and some heavy server iron (RAID, RAM) you can see a compaction
> move really fast: 300 GB in 4-6 hours. With enough load, one of these
> operations (compact, cleanup, join) can get really bogged down to the
> point where it almost does not move. Sometimes that is just the way it
> is, based on how fragmented your rows are and how fast your gear is.
> Not pushing your Cassandra caches up to your JVM limit can help. If
> your heap is often near full, you can have JVM memory fragmentation,
> which slows things down.
>
> 0.8 has some more tuning options for compaction: multi-threading, and
> knobs for the effective rate.
>
> I notice you are using:
> 5 GB heap
> 8 GB RAM
>
> So your RAM/data ratio is on the lower side. I think unless you have a
> good use case for row cache, less Xmx is more, but that is a minor
> tweak.
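For anyone watching a crawl like the netstats dump above, a quick way to get a single aggregate figure is to tally the progress=done/total pairs. A sketch: `netstats.txt` is a hypothetical saved copy of the `nodetool netstats` output, and the two sample lines here stand in for a real capture.

```shell
# Sum streamed vs. total bytes across all progress= lines in a saved
# `nodetool netstats` dump (netstats.txt is a hypothetical file name).
cat > netstats.txt <<'EOF'
progress=50/100 - 50%
progress=0/100 - 0%
EOF
grep -o 'progress=[0-9]*/[0-9]*' netstats.txt |
  awk -F'[=/]' '{done += $2; total += $3}
    END { printf "%d/%d bytes (%.0f%%)\n", done, total, 100*done/total }'
# prints: 50/200 bytes (25%)
```

If the aggregate rate really is crawling, the 0.8 cassandra.yaml exposes a compaction throttle (compaction_throughput_mb_per_sec; 0 disables throttling) alongside the multi-threaded compaction Ed mentions; check the knob names against your version's shipped yaml.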