Re: [Zeek-Dev] Log archival (Re: Zeek Supervisor: designing client and log archival) behavior

2020-07-02 Thread Robin Sommer


On Wed, Jul 01, 2020 at 14:03 -0700, Jon Siwek wrote:

> What if an open() rarely or never happens again for a given log?

Ah, right, forgot about that case. So yeah, agree, the shadow files
are useful for this and to retain whatever information we need.

> * Changed: running it through a function of the same name that happened
> to change between restarts is probably still going to be closer to
> what the user expects than running it through the default
> post-processor, which is completely different?

I was thinking not the default post-processor, but whatever is
configured for the log file we are just opening (if we did it at
open() time). But yeah, that won't work when the cleanup already
happens before the new open.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
Zeek-Dev mailing list
Zeek-Dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] Log archival (Re: Zeek Supervisor: designing client and log archival) behavior

2020-07-01 Thread Jon Siwek
On Wed, Jul 1, 2020 at 1:59 AM Robin Sommer wrote:
>
> > Log::default_rotation_dir
>
> Seems we should then set this to "." by default, and have the cluster
> framework override it.

Yes, exactly.

> Once moved, I suppose we would continue to optionally run a
> post-processor, right? For a supervised cluster, we wouldn't use that
> and suggest that people go with "zeek-archive" instead; but with
> ZeekControl we'd keep the current gzipping behavior so
> that we don't break any setups.

Yes, with the proposed changes, custom postprocessors still work the
same as before and everything is backwards compatible / equivalent in
non-supervised-mode.

Supervised-mode is just picking some different default settings from
non-supervised-mode:

* don't use a postprocessing script (archive-log)
* rotate into a `Log::default_rotation_dir` of "log-queue" instead of "."

> Not sure it's worth retaining the information about the post-processor
> function, and it could potentially lead to trouble if the function
> changed somehow in between (or disappeared). We could instead just run
> the leftovers through whatever the restarted config says to do with
> files.

* Disappeared: easy to notice the function no longer exists and
fall back to the default post-processor

* Changed: running it through a function of the same name that happened
to change between restarts is probably still going to be closer to
what the user expects than running it through the default
post-processor, which is completely different?
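
To make the two cases concrete, here is a minimal Python sketch of
the lookup I have in mind. It is not Zeek code: `registry`,
`resolve_postprocessor` and `default_postprocessor` are hypothetical
names, and the real logic would live in the logging framework.

```python
from typing import Callable, Dict

# A post-processor takes the path of a leftover log and returns the
# path it was rotated to. (Hypothetical signature for illustration.)
PostProcessor = Callable[[str], str]


def default_postprocessor(path: str) -> str:
    """Stand-in for the default post-processor (hypothetical)."""
    return path + ".rotated"


def resolve_postprocessor(
    saved_name: str,
    registry: Dict[str, PostProcessor],
    default: PostProcessor = default_postprocessor,
) -> PostProcessor:
    """Pick the function to run for a leftover log.

    saved_name is the function name recorded before the restart;
    registry holds the functions known to the restarted config.
    """
    func = registry.get(saved_name)
    if func is None:
        # "Disappeared": the name no longer exists, so fall back.
        return default
    # "Changed" or unchanged: trust whatever now carries that name.
    return func
```

The "changed" case needs no special handling here: the lookup is by
name only, so a redefined function is picked up automatically.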

> Do we even need any other meta data at all in the new scheme? I'm
> wondering if we could simplify this all to: "If at open() time, X.log
> exists, first rotate it away through the currently configured
> postprocessor function".

What if an open() rarely or never happens again for a given log?

I'm thinking the rotation of leftover logs needs to happen once at
startup rather than lazily.
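
Roughly, the startup pass could look like the Python sketch below.
It assumes every open log X.log has a companion `.shadow.X.log` in
the same directory; `find_leftover_logs` is a hypothetical helper
name, not an existing Zeek function.

```python
import os
from typing import List

# Prefix of the companion file, following the `.shadow.X.log` naming
# used in this thread.
SHADOW_PREFIX = ".shadow."


def find_leftover_logs(log_dir: str) -> List[str]:
    """Return logs in log_dir that still have a shadow file.

    Meant to be called once at startup, before any log is opened,
    so leftovers get rotated even if that log is never opened again.
    """
    leftovers = []
    for entry in sorted(os.listdir(log_dir)):
        if not entry.startswith(SHADOW_PREFIX):
            continue
        log_path = os.path.join(log_dir, entry[len(SHADOW_PREFIX):])
        # Skip shadow files whose log is already gone.
        if os.path.isfile(log_path):
            leftovers.append(log_path)
    return leftovers
```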

> Hmm, actually, there's a piece of meta that we'll need: the opening
> timestamp, so that one can incorporate that into the name of the
> rotated file (assuming we want to retain that capability). Unless we
> parsed that out of the X.log itself ...

Don't think we'd have the opening timestamp to parse from the log when
LogAscii::use_json=T.

So I still think it's necessary to obtain open-time meta from a
`.shadow.X.log`: either it's stored explicitly in there, or we use the
file's modified time (essentially its creation time).

The close-time of X.log is just taken as last-modified time of X.log.
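
As a sketch of how both timestamps could be recovered, in Python
rather than Zeek code: the shadow file format is an assumption here
(first line optionally holds the open time as epoch seconds), and
`shadow_path` and `rotation_times` are hypothetical names.

```python
import os
from typing import Tuple


def shadow_path(log_path: str) -> str:
    """Map dir/X.log to dir/.shadow.X.log."""
    directory, name = os.path.split(log_path)
    return os.path.join(directory, ".shadow." + name)


def rotation_times(log_path: str) -> Tuple[float, float]:
    """Return (open_time, close_time) in epoch seconds for a log."""
    shadow = shadow_path(log_path)
    with open(shadow, "r", encoding="utf-8") as handle:
        first_line = handle.readline().strip()
    try:
        # Assumed format: open time stored explicitly on line one.
        open_time = float(first_line)
    except ValueError:
        # Otherwise use the shadow file's modified time, which is
        # essentially its creation time if it is never rewritten.
        open_time = os.path.getmtime(shadow)
    # Close time is the last-modified time of the log itself.
    close_time = os.path.getmtime(log_path)
    return open_time, close_time
```

Neither value depends on the log's contents, so this works the same
with LogAscii::use_json=T.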

- Jon