Risto, thank you for your preliminary analysis of multi-line matching with
regular expressions, and also for suggesting an even more sophisticated
solution for handling multiple files.

My comments are also inline:

On Wed, 27 Nov 2019 at 15:07, Risto Vaarandi <risto.vaara...@gmail.com>
wrote:

> hi Richard,
>
...

> In the current code base, identifying the end of each line is done with a
> simple search for newline character. The newline is searched not with a
> regular expression, but rather with index() function which is much faster.
> It is of course possible to change the code, so that a regular expression
> pattern is utilized instead, but that would introduce a noticeable
> performance penalty. For example, I made a couple of quick tests with
> replacing the index() function with a regular expression that identifies
> the newline separator, and when testing modified sec code against log files
> of 4-5 million events, cpu time consumption increased by 25%.
>

Hmm, this is interesting. A question of principle came to my mind: could
this penalty be reduced if the regular newline characters ("\n") were
replaced, and the ends of "lines" matched with a regexp, through rules (or
in some other, more external way) before further processing by subsequent
rules, instead of through a potential built-in feature (enabled optionally
for particular logfiles)?
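
To make the comparison concrete, here is a minimal sketch in Python of the
two ways of finding line ends discussed above. It is not SEC's Perl code, and
the function names are made up for illustration. It only shows the
technique: a plain substring search versus a regular expression used as the
separator.

```python
import re

NEWLINE_RE = re.compile(r"\n")


def split_with_find(buf, sep="\n"):
    """Split buf into complete lines using a plain substring search."""
    lines, start = [], 0
    while True:
        end = buf.find(sep, start)
        if end == -1:
            break
        lines.append(buf[start:end])
        start = end + len(sep)
    # Complete lines, plus the unterminated remainder still waiting for input.
    return lines, buf[start:]


def split_with_regex(buf, sep_re=NEWLINE_RE):
    """Same split, but the separator is a compiled regular expression."""
    lines, start = [], 0
    for match in sep_re.finditer(buf):
        lines.append(buf[start:match.start()])
        start = match.end()
    return lines, buf[start:]
```

With a separator such as re.compile(r"\n(?=\d{4}-)") (a hypothetical
pattern: a newline counts only when a timestamp-like prefix follows), the
regexp variant keeps continuation lines inside one event, which the plain
search cannot do. Timing both functions on a real logfile would show how
large the penalty is in a given environment.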

> Introducing a custom delimiter also raises another important question --
> should it also be used for actions that utilize newline as a natural
> separator? For example, 'add' and 'event' actions split data into events by
> newline (if present in the data), 'spawn' action assumes that events from a
> child process are separated by newlines, etc. Should all these actions be
> changed to use the new separator?
>

Just thinking: if I understand correctly, this is about the delimiter in
event stores. Maybe a "special character" other than the regular newline
could be used internally, one that is very unlikely to occur in logfiles
(at least in plain-text logfiles; I don't know whether SEC is sometimes
also used on binary files, where any combination of bits can occur). I have
not analyzed the current code base, but maybe some performance could be
saved at least here, by separating "lines" in event stores with just a
single character rather than with regular expression matching, as when
reading logfiles. Maybe one of these:

   - NUL (ASCII 0, null character)
   - ETX (ASCII 3, end of text)
   - RS (ASCII 30, record separator)
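
As a rough illustration of this idea (a Python sketch under my own
assumptions, not how SEC stores events; pack_events and unpack_events are
hypothetical names), events joined with RS can contain newlines and are
still split back with a cheap single-character search:

```python
RS = "\x1e"  # ASCII 30, record separator


def pack_events(events, sep=RS):
    """Join events into one store string, refusing events that contain sep."""
    for event in events:
        if sep in event:
            raise ValueError("event contains the separator character")
    return sep.join(events)


def unpack_events(store, sep=RS):
    """Split a store string back into events; an empty store holds no events."""
    return store.split(sep) if store else []
```

The check in pack_events reflects the open question about binary input: the
scheme only works as long as the chosen character never appears inside an
event.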

> Given the performance penalty and other delimiter related questions, this
> idea needs careful thinking before implementing it in the code. (Before
> moving forward, it would be also interesting to know how many users would
> see this idea worth implementing.)
>

Let's see whether anyone else is interested in this topic.

>
>
>> - accepting wildcard pattern as specification of input log file, to
>> "monitor them all" (also dynamically adding newly created files matching
>> wildcard and removing disappeared)
>>
>
> It would be easier to implement that functionality, since input file
> patterns are re-evaluated on some signals, and in principle it is possible
> to invoke similar code after short time periods (e.g., once in 5 seconds).
> However, sec-2.8.X has 'addinput' and 'dropinput' actions which offer a more
> general interface for dynamically adding and dropping inputs. For example,
> it is possible to start an external script with 'spawn' action which can
> detect input files not just with wildcard match but also more advanced
> criteria, and generate synthetic events like NEWFILE_<file> for input files
> that need opening. These synthetic events can be easily captured by a rule
> which invokes 'addinput' action for relevant files. I acknowledge that this
> functionality is somewhat different from providing wildcards in command
> line and requires writing your own script, but you can actually do more
> advanced things here.
>

Yes, this sounds even more advanced.
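
If I understand the approach correctly, the external script started by
'spawn' could look roughly like the Python sketch below. The NEWFILE_<file>
event name comes from your description; the DELFILE_<file> counterpart, the
function names and the 5-second polling interval are my own assumptions,
and a rule calling 'addinput' (or 'dropinput') would still be needed on the
SEC side.

```python
import glob
import time


def scan(pattern):
    """Return the set of paths currently matching the wildcard pattern."""
    return set(glob.glob(pattern))


def diff_events(previous, current):
    """Build synthetic events for files that appeared or disappeared."""
    events = [f"NEWFILE_{path}" for path in sorted(current - previous)]
    events += [f"DELFILE_{path}" for path in sorted(previous - current)]
    return events


def watch(pattern, interval=5.0):
    """Poll forever, printing one synthetic event per line on stdout."""
    known = set()
    while True:
        current = scan(pattern)
        for event in diff_events(known, current):
            print(event, flush=True)  # flush so the parent process sees it at once
        known = current
        time.sleep(interval)
```

Calling watch("/var/log/myapp/*.log") (a made-up path) would then emit one
line per newly matching or vanished file. The matching criteria in scan
could be replaced by anything more advanced than a wildcard.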


> kind regards,
> risto
>

Thank you

Richard
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users