On Thu, 5 Feb 2009 13:49:58 +0000, Matthew Toseland wrote:
> On Thursday 05 February 2009 00:43, Dennis Nezic wrote:
> > On Mon, 2 Feb 2009 17:26:40 -0500, Dennis Nezic wrote:
> > > On Tue, 27 Jan 2009 20:13:59 +0000, Matthew Toseland wrote:
> > > > On Tuesday 27 January 2009 20:03, Dennis Nezic wrote:
> > > > > On Tue, 27 Jan 2009 12:44:59 -0500, Dennis Nezic wrote:
> > > > > > On Wed, 21 Jan 2009 17:28:47 +0000, Matthew Toseland wrote:
> > > > > > > Give it more memory. If you can't give it more memory,
> > > > > > > throw the box out the window and buy a new one. If you
> > > > > > > can't do that, wait for the db4o branch.
> > > > > >
> > > > > > Or, more likely, throw freenet out the window :|.
> > > > > >
> > > > > > > Seriously, EVERY time I have investigated these sorts of
> > > > > > > issues the answer has been either that it is showing
> > > > > > > constant Full GC's because it has slightly too little
> > > > > > > memory, or that there is external CPU load. Are you
> > > > > > > absolutely completely totally
> > > > > > > 100000000000000000000000000% sure that that is not the
> > > > > > > problem? AFAICS there are two posters here, and just
> > > > > > > because one of them is sure that the problem isn't
> > > > > > > memory doesn't necessarily mean that the other one's
> > > > > > > problems are not due to memory?
> > > > > >
> > > > > > My node crashed/restarted again due to
> > > > > > MessageCore/PacketSender freezing for 3 minutes. The
> > > > > > problem appears to be with cpu usage, since my memory
> > > > > > usage is basically plateauing when the crash occurs,
> > > > > > though I suppose the two factors may not be entirely
> > > > > > unrelated. My cpu load (ie. as reported by uptime) would
> > > > > > sometimes rise pretty dramatically, with the 15-min load
> > > > > > number hovering between 3 and 4, which brings my system to
> > > > > > a crawl. I guess this eventually "freezes" some threads in
> > > > > > freenet, which then triggers the shutdown.
> > > > >
> > > > > Restarting the node "fixes" the cpu-load problem, even
> > > > > though the node is doing exactly the same stuff as before,
> > > > > at least from the user's perspective. So, clearly, the
> > > > > problem is not just "slow and obsolete" hardware as you
> > > > > suggest, but something else internal to the code that grows
> > > > > out of control over time--over the course of dozens of
> > > > > hours.
> > > >
> > > > I.e. memory usage. QED!
> > > >
> > > > Memory usage was plateauing = memory usage was constantly at
> > > > the (low) maximum, and it was using 100% CPU in a vain attempt
> > > > to reclaim the last byte of memory. This is the most likely
> > > > explanation by far. Can you categorically rule it out by
> > > > checking freenet.loggc? You did add the wrapper.conf line I
> > > > mentioned?
> > >
> > > Hrm. Upon closer inspection of my latest loggc,
> > > http://dennisn.dyndns.org/guest/pubstuff/loggc-freezes.log.bz2
> > >
> > > It appears that memory may in fact be an issue. But I don't
> > > think it's the memory limit itself. For this last test I set my
> > > java memory limit to 250MB, and the logs show it never went much
> > > above 200MB. BUT, looking at the last few Full GC's, the time
> > > they took to complete increased rapidly near the end, and the
> > > last Full GC took over 3min!, which probably triggered the
> > > "freeze".
> > >
> > > My system only has 384MB of physical ram and 400MB of swap in a
> > > swapfile (all of which is on Raid5/LVM :b).
> > > My current theory is that maybe the terribly long Full GCs are
> > > due to long disk-io times resulting from accessing the
> > > Raid5/LVM/swapfile. "man java" shows an interesting option,
> > > "-Xincgc", which seems to avoid Full GC's:
> > >
> > >   "Enable the incremental garbage collector. The incremental
> > >   garbage collector, which is off by default, will reduce the
> > >   occasional long garbage-collection pauses during program
> > >   execution. The incremental garbage collector will at times
> > >   execute concurrently with the program and during such times
> > >   will reduce the processor capacity available to the program."
> > >
> > > I'll see if that has any effect. (Is there any way to make the
> > > jvm more forgiving, so that it tolerates longer-than-3min
> > > garbage collections?)
> > >
> > > Here is my vmstat, in 60s samples, without freenet running. So,
> > > clearly, with < 10M of physical memory free, the swapfile will
> > > be used heavily :o.
> > >
> > > # vmstat -S M 60
> > > ----------memory--------- ---swap-- -----io---- --system-- ----cpu----
> > >  swpd  free  buff  cache   si   so    bi    bo   in   cs  us  sy  id  wa
> > >   101     9    37    179    0    0    33   155  471  192  19   2  74   5
> > >   101     5    37    179    0    0    79    21  442  179  20   4  66  11
> > >   101     8    37    179    0    0    71     5  441  114  68   3  25   3
> > >   101     8    37    180    0    0    59    32  465  168  19   2  73   5
> > >
> > > Here is the same vmstat with freenet running:
> > >
> > >   196     4     5     45    0    0   267   137  518  540  39   3  38  20
> > >   196     4     6     49    0    0    80   184  486  371  36   2  54   8
> > >   196     7     6     46    0    0    18    39  486  303  30   1  63   6
> > >   196    11     7     41    0    0    88   109  472  341  31   2  62   4
> > >
> > > More swap space is used, and there is more disk io: bi and bo
> > > (blocks read from and written to disk) have almost doubled,
> > > cpu us (time spent running non-kernel code) has more than
> > > doubled, and cpu wa (time spent waiting for io, the last
> > > column) is somewhat higher.
> > >
> > > My fingers are crossed with this -Xincgc option.
> >
> > It didn't appear to have much effect. It still did a few Full
> > GC's (3 in 1.8 days, so far more rarely), but there was no
> > significant improvement in load or memory management.
> >
> > http://dennisn.dyndns.org/guest/pubstuff/loggc-freezes-xinc.log.bz2
> >
> > As before, the GC's become increasingly longer and more erratic
> > near the end. This time, the last Full GC took 99s, with a bunch
> > of long 11s-40s GCs around the same time, which almost certainly
> > contributed the most to the dreaded "3 minute freeze". As usual,
> > in the hour before the freeze, CPU load is higher than normal (as
> > are GC timings), then it spikes even higher, and then comes the
> > freeze and node shutdown.
> >
> > I'll try lowering the memory I allocate to freenet, and try to
> > free some more of my precious RAM on my system. (My datastore is
> > currently only 5G, so the bloom filters shouldn't be a problem.)
> > I'll also try to monitor my swapfile activity to doubly confirm
> > that this is the issue here :|.
> >
> > (Is there no way to get rid of constant stupid java GC? Perhaps
> > we should move to C? :P)
>
> If GC is taking more than a fraction of a second then there is
> something seriously wrong on your system; most likely part of the
> VM has been swapped out.
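For anyone following along at home: the freenet.loggc files above
come from extra GC-logging flags passed to the JVM through the
wrapper. A rough sketch of what the wrapper.conf additions might look
like -- the .5 to .8 indices here are placeholders, so use the next
free wrapper.java.additional.N slots in your own file:

    # Hypothetical wrapper.conf additions for GC logging. The .N
    # indices are examples; they must not collide with entries
    # already present in your wrapper.conf.
    wrapper.java.additional.5=-verbose:gc
    wrapper.java.additional.6=-Xloggc:freenet.loggc
    wrapper.java.additional.7=-XX:+PrintGCDetails
    wrapper.java.additional.8=-XX:+PrintGCTimeStamps

-Xloggc writes each collection and its pause time to freenet.loggc,
and the two -XX flags add per-generation detail and timestamps, which
makes it possible to line up long Full GC pauses with the node's
"freeze" shutdowns.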
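Eyeballing the long pauses in a big loggc is tedious, so here is a
short script that pulls them out. A sketch only, assuming the classic
HotSpot log format with lines like
"[Full GC 123456K->98765K(253952K), 3.1234567 secs]"; the exact
format varies with JVM version and flags, so the regex may need
adjusting:

    #!/usr/bin/env python
    # gc_pauses.py: list Full GC pause times from a verbose-GC log.
    # Assumes classic HotSpot lines ending in ", N.NNNNNNN secs]";
    # adjust FULL_GC for your JVM's actual output.
    import re
    import sys

    FULL_GC = re.compile(r'\[Full GC .*?([0-9]+\.[0-9]+) secs\]')

    def main(path):
        pauses = []
        with open(path) as log:
            for line in log:
                match = FULL_GC.search(line)
                if match:
                    pauses.append(float(match.group(1)))
        for i, secs in enumerate(pauses, 1):
            flag = '  <-- suspiciously long' if secs > 1.0 else ''
            print('Full GC #%3d: %8.2f secs%s' % (i, secs, flag))
        if pauses:
            print('%d Full GCs, max pause %.2f secs'
                  % (len(pauses), max(pauses)))

    if __name__ == '__main__':
        main(sys.argv[1])

Run it as "python gc_pauses.py freenet.loggc" (after bunzip2'ing). On
the logs above, the pauses climb from fractions of a second into the
tens of seconds right before each freeze, which is the swap-thrashing
signature Matthew describes.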
Reducing the memory I allocate to freenet (to 128MB) and freeing up
some more RAM from my other running processes, all to avoid using
swap space, seems to have worked :P. It's running pretty smoothly
now! Case closed.
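(For the archives, since the next person with a swapping node will
probably find this thread: the memory limit lives in wrapper.conf. A
minimal sketch of the change, assuming the standard Java Service
Wrapper memory properties -- verify the exact names against your own
file:

    # Cap the JVM heap well below physical RAM so the GC working set
    # stays resident. Values are in MB. Standard Java Service Wrapper
    # properties, but check them against your wrapper.conf.
    wrapper.java.initmemory=64
    wrapper.java.maxmemory=128

The point is to keep the whole heap resident: once part of it lands
in swap, every Full GC has to page it back in, which is exactly the
multi-minute pause pattern above.)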