Re: [Simple-evcorr-users] Q - Post-hoc, non-realtime logfile processing

Jeroen Scheerder Tue, 31 Mar 2009 05:18:38 -0700

Thanks for the quick replies.  I think I have some relevant  
observations, and am interested in your thoughts.

Conway Allen (31 Mar 2009, 12:12) wrote:

> It seems to me a very good idea indeed! Has it been done already? No  
> idea!
>
> How could you do it in a useful, general way? You'd need a
> user-specifiable way of extracting the time from the data - the data
> might not all come from the same source and so might not be
> uniformly formatted. Also, if you take a look at the source code
> there are quite a few calls to time()... Don't know what they're all
> there for. Anyway, someone more knowledgeable than I is sure to get
> back to you.

I have some idea what the timestamping-related functionality is.  A  
lot of it has to do with polling pipes and open files to see if  
anything has been appended since sec last looked.  All this is  
irrelevant to *my* goal, since doing a non-real time analysis of log  
files basically assumes using the "-notail" option anyway.  In that  
particular mode of operation, the following comment from sec itself  
covers it, I think:

       # if there were no new bytes in the file and -notail mode is active,
       # don't set the check timer and skip shuffle and timeout checks (i.e.,
       # -input_timeout, -timeout_script, -reopen_timeout, and -check_timeout
       # options are ignored when -notail is set)

However, there are a lot of details to pay attention to...

Brown, James (31 Mar 2009, 12:06) wrote:

> This has been kicking around for a while.  Let me see if I can  
> remember...
>
> Since SEC uses time(), you would have to write a perl function that
> replaces that function locally.

Actually, that seems to be relevant in a few locations only: when  
reading a logline and, as I understand things, creating a context (in  
a context list).

So I wouldn't redefine/change time(), I'd try a "timefromlogline()" in  
a few specific places and leave other time()s alone.  This would of  
course mean that only syslog logfiles (i.e. files containing lines with  
a timestamp prefix) can be digested, but that's a limitation I can  
live with.
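
A minimal sketch of what such a "timefromlogline()" could look like, written in Python for illustration rather than in sec's own Perl. It assumes the classic syslog prefix ("Mar 31 05:18:38 host ..."), which carries no year, so the caller has to supply one. The function name and the `year` parameter are hypothetical and not part of sec.

```python
import re
from datetime import datetime

# Classic syslog prefix: "Mar 31 05:18:38 " (month name, day, hh:mm:ss).
_SYSLOG_TS = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s")
_MONTHS = {
    name: number
    for number, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1
    )
}


def time_from_logline(line, year):
    """Return the timestamp at the start of a syslog line, or None.

    Hypothetical helper, not part of sec. The year is an assumption
    passed in by the caller because the syslog prefix does not contain it.
    """
    match = _SYSLOG_TS.match(line)
    if match is None:
        return None
    month_name, day, hour, minute, second = match.groups()
    month = _MONTHS.get(month_name)
    if month is None:
        return None
    try:
        return datetime(year, month, int(day), int(hour), int(minute), int(second))
    except ValueError:
        # Well-formed prefix but impossible date or time, e.g. "Feb 31".
        return None
```

Lines without a recognisable prefix yield None, so a caller can decide whether to skip them or fall back to the previous line's time.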

> Once you do this, you have some ability to read pre-existing logs.
> However there are several things to consider:
>
>  - Time lapses.  Let's say that you read a timestamp of 03:15:10 and on the
>    next line, the time stamp is 03:15:55.  You've just crossed over 45 seconds
>    of time.  If there are SEC rules that perform window calculations, such as
>    PairWithWindow, you must somehow account for the fact that you've skipped
>    over those seconds- perhaps the time window expired within that 45 seconds,
>    and there may be a resulting action that needs to happen.

That could be problematic; I can't foresee the consequences yet.  However, I'd already  
be happy with a solution that works OK for contiguous logfiles.
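
One way the time-lapse problem might be handled is a virtual clock that is advanced from the log timestamps and fires every pending window expiry that falls inside a skipped interval before the next line is processed. The sketch below is in Python and is not how sec works internally; the class and method names are made up for illustration, and times are plain seconds.

```python
import heapq


class ReplayClock:
    """Virtual clock driven by log timestamps (illustrative, not sec code)."""

    def __init__(self):
        self.now = None
        self._timers = []  # heap of (deadline, sequence, callback)
        self._sequence = 0  # tie-breaker so callbacks are never compared

    def schedule(self, deadline, callback):
        """Register callback(deadline) to run once the clock reaches deadline."""
        heapq.heappush(self._timers, (deadline, self._sequence, callback))
        self._sequence += 1

    def advance(self, timestamp):
        """Move the clock to timestamp, firing overdue timers in deadline order."""
        while self._timers and self._timers[0][0] <= timestamp:
            deadline, _, callback = heapq.heappop(self._timers)
            self.now = deadline  # the callback sees the expiry time, not the jump target
            callback(deadline)
        self.now = timestamp
```

With the timestamps from the quoted example (03:15:10 is 11710 seconds into the day, 03:15:55 is 11755), a 30-second window opened at the first line would be expired while jumping to the second:

```python
expired = []
clock = ReplayClock()
clock.advance(11710)                      # first line read
clock.schedule(11710 + 30, expired.append)
clock.advance(11755)                      # next line, 45 seconds later
# expired now holds the single deadline 11740
```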

>  - External interaction.  In the example above, if the action called a separate
>    script, that called the Unix date function (to get a time stamp inside a script
>    for example) that time would be real world clock time- not your time() function
>    sped-up time.
>
>  - Internals.  SEC uses internal timestamps for certain actions.  If your time()
>    function gives SEC the wrong time, it may cause SEC problems.

I think these are mostly related to "tail" mode, and furthermore using  
time from the logfiles only in the necessary places might let me get  
away with it.

> The nicest way to fix this would be to speed up the operation of the entire
> system- i.e. make time run faster for everything on the SEC host.  I don't
> know if anyone has done that, but it would be interesting.

Are you suggesting actually making OS time match, i.e. writing a SEC  
wrapper:

   #!/my/bin/pseudocode
   # poor man's time machine for logfile replay

   while not end of file
     read a line
     extract the timestamp
     set the system time to that timestamp
     write the line
   end
   reset the system clock to something sane

and then use it like

   poor-mans-timemachine < /my/log/messages | sec

or even, if there are multiple logfiles to digest simultaneously,  
something like (caution, GNU sort required)

   sort -k4n -k1M -k2n -k3n /my/log/messages* | poor-mans-timemachine | sec

I guess this would work, but the penalty seems high.  To set the  
system clock, superuser privileges are required, and of course,  
messing with the global clock effectively implies only a single  
processing pipeline can be executed.  I don't think I'll be able to  
bring sufficient processing power to bear this way.
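
For the multi-file case, the chronological merge itself does not need the system clock or a full sort if each logfile is already in time order. A rough Python sketch using the standard library's heapq.merge follows; the function names and the `year` parameter are assumptions for illustration (the syslog prefix has no year), and month names are parsed according to the current locale.

```python
import heapq
import time


def syslog_key(line, year):
    """Sort key from the 15-character syslog prefix, e.g. 'Mar 31 05:18:38'.

    The year is supplied by the caller; it is an assumption, since the
    log line itself does not carry one.
    """
    parsed = time.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    return tuple(parsed[:6])  # (year, month, day, hour, minute, second)


def merge_logs(streams, year):
    """Lazily merge several individually time-ordered line streams."""
    return heapq.merge(*streams, key=lambda line: syslog_key(line, year))
```

Open file objects are iterables of lines, so they can be passed in directly and the merged lines written to standard output for a downstream reader. Because the merge is lazy and touches no global state, several such pipelines can run side by side.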

> Bottom line- I don't think it's possible.   But others may have a  
> better idea...

Thanks again.  I think it's tricky, but am not yet convinced it's  
impossible and will try to experiment a little.  If I do, I just might  
be able to add a "use timestamp from logfiles" option that entails  
"notail". :-)

Regards, Jeroen.
-- 
Jeroen Scheerder
ON2IT B.V.
Steenweg 17 B
4181 AJ WAARDENBURG
T: +31 418-653818 | F: +31 418-653716
W: www.on2it.nl   | E: [email protected]
