Re: [Zeek-Dev] Log archival (Re: Zeek Supervisor: designing client and log archival) behavior

2020-07-01 Thread Jon Siwek
On Wed, Jul 1, 2020 at 1:59 AM Robin Sommer  wrote:
>
> > Log::default_rotation_dir
>
> Seems we should then set this to "." by default, and have the cluster
> framework override it.

Yes, exactly.

> Once moved, I suppose we would continue to optionally run a
> post-processor, right? For a supervised cluster, we wouldn't use that
> and suggest that people go with "zeek-archive" instead; but with
> ZeekControl we'd keep the current gzipping behavior so
> that we don't break any setups.

Yes, with the proposed changes, custom postprocessors still work the
same as before and everything is backwards compatible / equivalent in
non-supervised-mode.

Supervised-mode is just picking some different default settings from
non-supervised-mode:

* don't use a postprocessing script (archive-log)
* rotate into a `Log::default_rotation_dir` of "log-queue" instead of "."
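
To make the two differences concrete, here is a minimal Python sketch of the per-mode defaults. `RotationDefaults` and `rotation_defaults()` are hypothetical names, not Zeek script API. The non-supervised values assume a ZeekControl-managed setup, which is the one that runs archive-log today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RotationDefaults:
    # Postprocessing script run on each rotated log, or None for no script.
    postprocessor: Optional[str]
    # Where rotated logs are moved (what Log::default_rotation_dir would hold).
    rotation_dir: str


def rotation_defaults(supervised: bool) -> RotationDefaults:
    """Return the per-mode defaults (hypothetical helper, for illustration)."""
    if supervised:
        # No archive-log; rotated files queue up in "log-queue" for a
        # separate archiver to pick up.
        return RotationDefaults(postprocessor=None, rotation_dir="log-queue")
    # Non-supervised (assumed ZeekControl) mode keeps today's behavior.
    return RotationDefaults(postprocessor="archive-log", rotation_dir=".")
```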

> Not sure it's worth retaining the information about the post-processor
> function, and it could potentially lead to trouble if the function
> changed somehow in between (or disappeared). We could instead just run
> the leftovers through whatever the restarted config says to do with
> files.

* Disappeared: easy to notice that the function no longer exists and
fall back to the default post-processor.

* Changed: running the leftover through a function of the same name,
even if that function changed between restarts, is probably still
closer to what the user expects than running it through the default
post-processor, which does something completely different?
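
A rough Python sketch of that resolution order, assuming the leftover's metadata records the post-processor by name. The registry dict and all function names are hypothetical, not an existing Zeek interface:

```python
from typing import Callable, Dict, Optional

# Hypothetical signature: takes the rotated file's path, returns success.
PostProcessor = Callable[[str], bool]


def default_postprocessor(path: str) -> bool:
    # Stand-in for the default post-processor; does nothing here.
    return True


def resolve_postprocessor(recorded_name: Optional[str],
                          registry: Dict[str, PostProcessor]) -> PostProcessor:
    """Pick the post-processor for a leftover log after a restart.

    Changed: if the recorded name still exists, use its *current* definition.
    Disappeared (or nothing recorded): fall back to the default.
    """
    if recorded_name is not None and recorded_name in registry:
        return registry[recorded_name]
    return default_postprocessor
```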

> Do we even need any other meta data at all in the new scheme? I'm
> wondering if we could simplify this all to: "If at open() time, X.log
> exists, first rotate it away through the currently configured
> postprocessor function".

What if an open() rarely or never happens again for a given log?

I'm thinking the rotation of leftover logs needs to happen once at
startup rather than lazily.

> Hmm, actually, there's a piece of meta that we'll need: the opening
> timestamp, so that one can incorporate that into the name of the
> rotated file (assuming we want to retain that capability). Unless we
> parsed that out of the X.log itself ...

Don't think we'd have the opening timestamp to parse from the log when
LogAscii::use_json=T.

So I still think it's necessary to obtain open-time metadata from a
`.shadow.X.log`: either it's stored explicitly in there, or we use the
file's modified time (essentially its creation time).

The close-time of X.log is just taken as last-modified time of X.log.
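
A rough Python sketch of the one-time startup pass, assuming (hypothetically) that `.shadow.X.log` sits next to `X.log` and may carry an explicit open timestamp on its first line. The shadow-file layout and all names are illustrative assumptions, not the actual implementation:

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leftover:
    log_path: str
    open_ts: float   # when the log was opened
    close_ts: float  # when the log was last written


def _read_open_ts(shadow_path: str) -> Optional[float]:
    """Hypothetical shadow format: explicit open timestamp on line one."""
    try:
        with open(shadow_path) as f:
            return float(f.readline().strip())
    except (OSError, ValueError):
        return None


def find_leftovers(log_dir: str) -> List[Leftover]:
    """Run once at startup: pair each X.log with its .shadow.X.log."""
    result = []
    for name in sorted(os.listdir(log_dir)):
        if not name.startswith(".shadow."):
            continue
        shadow_path = os.path.join(log_dir, name)
        log_path = os.path.join(log_dir, name[len(".shadow."):])
        if not os.path.isfile(log_path):
            continue  # stale shadow file without a log
        # Open time: explicit value if present, else the shadow file's mtime
        # (close to creation time, assuming it is only written at open()).
        open_ts = _read_open_ts(shadow_path)
        if open_ts is None:
            open_ts = os.path.getmtime(shadow_path)
        # Close time: last-modified time of X.log itself.
        close_ts = os.path.getmtime(log_path)
        result.append(Leftover(log_path, open_ts, close_ts))
    return result
```

Each `Leftover` would then be rotated once, before normal logging starts, rather than waiting for a later open() that may never come.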

- Jon
___
Zeek-Dev mailing list
Zeek-Dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-07-01 Thread Robin Sommer
On Tue, Jun 30, 2020 at 14:29 -0700, Jon Siwek wrote:

> Maybe the important observation is that the logic can be performed
> anywhere that has access to the Zeek-Supervisor process.

Agree.

> So where we put the logic at this point may not be important.  If we
> can find a single-best-place for the logic to live, that's great

I believe that's what Seth is arguing for: have a Zeek-side script be
the single point of that logic, rather than implement it multiple
times and/or outside of Zeek.

I can see doing that in Zeek but I think there's a trade-off here: if
we want to do the single-place approach with a multi-system setup, we'd
need an authoritative place to run this logic and hence depend on
*that* Zeek supervisor being up and running for performing the
operation. That may be a reasonable assumption (say if we dedicated
the supervisor running the manager to also be the cluster
coordinator), but it's different from a world where the client can
execute higher-level operations on its own.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


[Zeek-Dev] Supervisor client (Re: Zeek Supervisor: designing client and log archival behavior)

2020-07-01 Thread Robin Sommer


> * https://github.com/zeek/zeek/wiki/Zeek-Supervisor-Client

Some thoughts on the commands:

> $ zeekc status [all | ]

> Do we need to include any other metrics in the returned status?

That information is mostly static; it would be nice to get some dynamic
information in there as well, like uptime and CPU/memory/traffic stats.
No need to have that right away, but worth keeping in mind.

> # Do we need more categories to filter by (e.g. node type) ?

I'd skip for now.

> # If there are downed nodes at this point, what do we expect users to do?
> # Check the standard services logs for stderr/stdout info?  Check
> # reporter.log?

Yeah, would be cool if zeekc had access to the stderr/stdout from the
nodes through their supervisors. The supervisors could buffer that for
a while and return on request. More generally, the supervisor could
get a "diagnostics buffer" that, over time, we could use for more
stuff like store backtraces etc.
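
A minimal Python sketch of such a per-node buffer, using `collections.deque` with `maxlen` so the oldest lines fall off automatically. `DiagnosticsBuffer` and its methods are hypothetical; nothing like it exists in the supervisor yet:

```python
from collections import deque
from typing import Deque, Dict, List


class DiagnosticsBuffer:
    """Keeps the last `max_lines` lines of each node's stdout/stderr."""

    def __init__(self, max_lines: int = 1000) -> None:
        self.max_lines = max_lines
        self._buffers: Dict[str, Deque[str]] = {}

    def append(self, node: str, line: str) -> None:
        # deque(maxlen=...) drops the oldest line once the buffer is full.
        buf = self._buffers.setdefault(node, deque(maxlen=self.max_lines))
        buf.append(line)

    def tail(self, node: str, n: int) -> List[str]:
        """Return up to the last n buffered lines for a node, oldest first."""
        buf = self._buffers.get(node)
        if buf is None or n <= 0:
            return []
        return list(buf)[-n:]
```

A `zeekc` request for a node's recent output would then just be a `tail()` call on the supervisor hosting that node.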

"reporter.log" is out I'd say, that will go through the normal log
rotation & archival, and be accessible that way.

> # A `zeekc diag` command could help gather information, like ask Zeek
> # supervisor to find core dumps and extract stack trace.  Would it do more
> # than that, like show last N lines of downed nodes' stderr, or last N lines
> # of reporter.log?

> $ zeekc check

I'm wondering which supervisor that would be talking to in a
multi-system setup? All?

> $ zeekc terminate
>  ...

> # Normally wouldn't terminate the supervisor if a service-manager is handling
> # the Zeek supervisor process itself and will just restart it, but `terminate`
> # would be helpful for anyone running a supervised Zeek cluster "manually".

Another use case: If for some reason one wants to restart the
supervisor itself, "terminate" would kill it and the service
manager would then restart it.

Robin



[Zeek-Dev] Log archival (Re: Zeek Supervisor: designing client and log archival) behavior

2020-07-01 Thread Robin Sommer



On Tue, Jun 30, 2020 at 01:39 -0700, Jon Siwek wrote:

> * https://github.com/zeek/zeek/wiki/Zeek-Supervisor-Log-Handling

This overall sounds good to me. Some notes & questions:

> Log Rotation

> To help bridge/replace Step (4) and (5), suggest adding a new option:
> Log::default_rotation_dir. The Log::rotation_format_func() will use
> this as part of its default return value.

Seems we should then set this to "." by default, and have the cluster
framework override it.

> The log_mgr will attempt to create necessary dirs just-in-time,
> failing to do so emits an error, but otherwise continues with rotation
> using working directory instead.

I'd extend this to any error case: if moving from the current location to
Log::default_rotation_dir fails (e.g., because the latter is on a
different file system), continue with the new name inside the current
working directory (and report the error).
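
A Python sketch of that fallback, assuming the rotated name has already been computed. The `rotate()` helper and its `fallback_dir` parameter are hypothetical; they only illustrate the "report the error and keep going" behavior:

```python
import os
import sys


def rotate(log_path: str, rotated_name: str, rotation_dir: str,
           fallback_dir: str = ".") -> str:
    """Move log_path to rotation_dir/rotated_name, or to fallback_dir on error.

    Returns the path the rotated log ended up at.
    """
    try:
        # Create the rotation directory just in time.
        os.makedirs(rotation_dir, exist_ok=True)
        target = os.path.join(rotation_dir, rotated_name)
        # os.rename() raises OSError (EXDEV) across file systems, so that
        # case takes the same fallback path as a failed mkdir.
        os.rename(log_path, target)
        return target
    except OSError as err:
        # Report the error, but keep rotating.
        print(f"rotation into {rotation_dir!r} failed: {err}", file=sys.stderr)
    # Fallback: keep the new name, but stay in the working directory.
    target = os.path.join(fallback_dir, rotated_name)
    os.rename(log_path, target)
    return target
```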

Once moved, I suppose we would continue to optionally run a
post-processor, right? For a supervised cluster, we wouldn't use that
and suggest that people go with "zeek-archive" instead; but with
ZeekControl we'd keep the current gzipping behavior so
that we don't break any setups.

We can implement that distinction through the post-processor function:
the new default function would just do the rename according to the new
scheme, and a separate legacy function for ZeekControl spawns the
"archive-log" script.

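A Python sketch of the two post-processor variants. The function names are hypothetical, and the argument list handed to the external script is an assumption, not archive-log's real interface:

```python
import os
import subprocess
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RotationInfo:
    old_path: str  # where the log lived when rotation started
    new_path: str  # name according to the new rotation scheme


def default_postprocessor(info: RotationInfo) -> bool:
    """New default: just rename according to the new scheme."""
    os.rename(info.old_path, info.new_path)
    return True


def zeekctl_postprocessor(info: RotationInfo,
                          command: Sequence[str] = ("archive-log",)) -> bool:
    """Legacy variant: rename, then hand the file to an external script."""
    default_postprocessor(info)
    # Hypothetical argument list, not archive-log's actual interface.
    try:
        return subprocess.run([*command, info.new_path]).returncode == 0
    except OSError:
        # Script missing or not executable.
        return False
```
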
> zeek-archiver

I like making this a standard tool, but seems like something we could
postpone doing right now and prioritize getting the Zeek-side
infrastructure in place.

> We can potentially have the Zeek Supervisor process configurable to
> auto-start and keep a zeek-archiver child alive. 

I'd say that's a job for systemd (or whatever service manager). I know
Seth disagrees. :-)

> Leftover Log Rotation

> The rotation for such a leftover log file uses the metadata in the
> shadowfile to help try to go through the exact rotation that should
> have occurred, including running the postprocessor function.

Not sure it's worth retaining the information about the post-processor
function, and it could potentially lead to trouble if the function
changed somehow in between (or disappeared). We could instead just run
the leftovers through whatever the restarted config says to do with
files.

Do we even need any other meta data at all in the new scheme? I'm
wondering if we could simplify this all to: "If at open() time, X.log
exists, first rotate it away through the currently configured
postprocessor function". If we did that, we should probably have a
global boolean that allows choosing between that and just overwriting
existing files. The latter would be the default to retain current
command-line behavior, and the cluster framework would enable leftover
recovery.
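
A Python sketch of that open()-time variant, with the proposed global boolean modeled as a parameter. `open_log()`, `rotate_away`, and `recover_leftovers` are hypothetical names:

```python
import os
from typing import Callable, TextIO


def open_log(path: str, rotate_away: Callable[[str], None],
             recover_leftovers: bool = False) -> TextIO:
    """Open a log for writing, optionally rotating away a leftover first.

    recover_leftovers stands in for the proposed global boolean: False keeps
    the current command-line behavior (overwrite); the cluster framework
    would set it to True.
    """
    if recover_leftovers and os.path.exists(path):
        # Rotate the leftover through the currently configured postprocessor.
        rotate_away(path)
    # Mode "w" truncates anything still at this path.
    return open(path, "w")
```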

Hmm, actually, there's a piece of meta that we'll need: the opening
timestamp, so that one can incorporate that into the name of the
rotated file (assuming we want to retain that capability). Unless we
parsed that out of the X.log itself ...

Robin
