Charles Hache created LOG4J2-1784:
-------------------------------------

             Summary: Large values for max in DefaultRolloverStrategy block 
application
                 Key: LOG4J2-1784
                 URL: https://issues.apache.org/jira/browse/LOG4J2-1784
             Project: Log4j 2
          Issue Type: Bug
          Components: Appenders
    Affects Versions: 2.7
         Environment: Observed in 2.5; once I realized the issue was related to 
log4j, I upgraded to 2.7 and confirmed it still happens.  Running on a 
Raspberry Pi.
            Reporter: Charles Hache


Hi all,
I'm trying to debug an application which is using log4j2 for logging.  After 
discovering that by default the RollingRandomAccessFile only kept the last 7 
logs using the counter, I found the DefaultRolloverStrategy option and set the 
max to something like 99999, so in theory I wouldn't have to worry about it.
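For reference, my setup looks roughly like the following (file names, sizes, 
and patterns here are placeholders, not my exact values):

```xml
<RollingRandomAccessFile name="File" fileName="logs/app.log"
                         filePattern="logs/app-%i.log">
  <PatternLayout pattern="%d %-5level %msg%n"/>
  <Policies>
    <SizeBasedTriggeringPolicy size="10 MB"/>
  </Policies>
  <DefaultRolloverStrategy max="99999"/>
</RollingRandomAccessFile>
```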

A couple of days later I started looking at everything again (log files from 
log4j, timestamped CSV files from the application, etc.) and I could see 
something was going terribly wrong.  What I'm finding is that when it comes 
time to rotate the logs, the whole application appears to hang.  I see the 
following:
- All logging to the log file stops;
- All my data going to the CSV files stops;
- CPU usage goes to 100%;
- I cannot stop my application cleanly;

After a while (~4-5 minutes) everything goes back to normal (until the next log 
rotation), but then the first lines of the logs have timestamps like this:
2017-01-14 07:43:00,001 INFO ....
2017-01-14 07:42:11,329 DEBUG ....
2017-01-14 07:47:26,648 TRACE ....
2017-01-14 07:47:26,695 INFO ....
2017-01-14 07:47:26,697 TRACE ....
2017-01-14 07:47:26,737 INFO ....
2017-01-14 07:47:26,739 TRACE ....
2017-01-14 07:47:26,741 TRACE ....
2017-01-14 07:47:26,744 TRACE ....
2017-01-14 07:47:26,768 TRACE ....

Of note is that there is a gap of a few minutes in the first couple lines of 
the file (I'm not too concerned about the out-of-order timestamps as lots of 
different threads are logging).  The application always writes many (dozens, 
maybe hundreds) of lines per second, so a gap of a couple minutes is 
uncharacteristic.  The rest of the log file looks normal (has no gaps, has log 
lines every few milliseconds).  From my specific log lines and where they are 
relative to each other, it looks to me like calls to the logger are blocking 
(e.g. a pair of lines that usually log 1-2 milliseconds apart will be minutes 
apart when they occur at rollover time).

I shrunk the file-size rollover trigger to something small (256K), which I hit 
in maybe 15 seconds or so, and the hang is clearly reproducible.  When I 
reduced the DefaultRolloverStrategy max to 40, the problem went away and the 
file rolls over practically instantly.

The documentation for the DefaultRolloverStrategy says "When the ascending 
attribute is set to true (the default) then the counter will be incremented and 
the current log file will be renamed to include the counter value. If the 
counter hits the maximum value then the oldest file, which will have the 
smallest counter, will be deleted, all other files will be renamed to have 
their counter decremented and then the current file will be renamed to have the 
maximum counter value. Note that with this counting strategy specifying a large 
maximum value may entirely avoid renaming files."
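To make sure I'm reading that right, the documented behaviour once the counter 
hits max would amount to something like this (just my own sketch in plain 
java.nio to check my understanding, not Log4j's actual implementation; names 
are hypothetical):

```java
import java.io.IOException;
import java.nio.file.*;

public class AscendingRolloverSketch {
    // Sketch of the documented ascending-counter behaviour at max: drop the
    // oldest file, shift every remaining file down by one, then move the
    // active file into the freed top slot.
    static void rolloverAtMax(Path dir, String base, int max) throws IOException {
        Files.deleteIfExists(dir.resolve(base + "-1.log"));      // oldest file
        for (int i = 2; i <= max; i++) {                         // shift down
            Path from = dir.resolve(base + "-" + i + ".log");
            if (Files.exists(from)) {
                Files.move(from, dir.resolve(base + "-" + (i - 1) + ".log"),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
        // current file takes the maximum counter value
        Files.move(dir.resolve(base + ".log"),
                dir.resolve(base + "-" + max + ".log"));
    }
}
```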

I took this to mean that having a value so large that it would never be hit 
would be more efficient, since it doesn't have to do anything, just keep 
counting up.  I'm wondering now if it's doing something like "check if 
log-1.log exists... check if log-2.log exist ....... check if log-99999.log 
exists" but I don't see any I/O-bound activity in iotop or anything (although I 
can't say for sure whether filesystem existence checks would show up there, for 
example).
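Purely to illustrate what I'm guessing might be happening (again, this is my 
own sketch of the suspected O(max) probe, not Log4j's actual code; the method 
name is made up):

```java
import java.io.File;

public class MaxScanSketch {
    // Hypothetical illustration of the suspected behaviour: probing every
    // candidate index up to max to find the first unused file name.  With
    // max=99999 this is tens of thousands of File.exists() calls per rollover.
    static int firstFreeIndex(File dir, String base, int max) {
        for (int i = 1; i <= max; i++) {
            if (!new File(dir, base + "-" + i + ".log").exists()) {
                return i; // first free slot
            }
        }
        return -1; // every slot up to max is taken
    }
}
```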

This might not necessarily be a bug in the logger per se, but it would at least 
be a documentation issue if there is some O(n) operation based on the max value 
for the DefaultRolloverStrategy.  

I'd usually dig deeper, but I still have to debug the original issues in the 
actual application.  Let me know if there is any more info I should provide 
or if there is anything else I can do!
Thanks,
Charles


