Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-07-01 Thread Robin Sommer
On Tue, Jun 30, 2020 at 14:29 -0700, Jon Siwek wrote:

> Maybe the important observation is that the logic can be performed
> anywhere that has access to the Zeek-Supervisor process.

Agree.

> So where we put the logic at this point may not be important.  If we
> can find a single-best-place for the logic to live, that's great

I believe that's what Seth is arguing for: have a Zeek-side script be
the single point of that logic, rather than implement it multiple
times and/or outside of Zeek.

I can see doing that in Zeek but I think there's a trade-off here: if
we want to do the single-place approach with a multi-system setup, we'd
need an authoritative place to run this logic and hence depend on
*that* Zeek supervisor being up and running for performing the
operation. That may be a reasonable assumption (say if we dedicated
the supervisor running the manager to also be the cluster
coordinator), but it's different from a world where the client can
execute higher-level operations on its own.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
Zeek-Dev mailing list
Zeek-Dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-30 Thread Jon Siwek
On Tue, Jun 30, 2020 at 6:35 AM Seth Hall  wrote:

> I'm really starting to think that the business logic for
> correctly starting and stopping a cluster should be fully implemented in
> the supervisor script.  The zeekc tool could then just be a dumb tool
> that says to start and stop and doesn't end up causing us to spread our
> logic around to other tooling.

Maybe the important observation is that the logic can be performed
anywhere that has access to the Zeek-Supervisor process.

* The Supervisor process itself would be able to perform the logic via
direct BIF access.

* External processes, like zeekc, have access to a Zeek-event
interface to indirectly access those same BIFs, so they can also
execute equivalent logic (either via multiple events, or a single
"convenience" event that implements a sequence of BIF calls on remote)

When we bring multi-hosting into the mix, it's still a similar
situation, just with beefed up logic for orchestrating
node-type-specific steps across many peers: anyone with access to the
Zeek-event interface could implement this logic.  You could pick zeekc
to orchestrate, or you could pick a single Zeek-Supervisor process to
orchestrate between other Supervisors, or you could pick a regular
Zeek process, or you could write a Python script just using Broker
Python bindings, etc.
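
To illustrate the point that the orchestration logic does not care where it runs, here is a minimal sketch in Python. It assumes a hypothetical `send(host, action, node)` callable standing in for whatever delivers an event to a Supervisor (zeekc, another Supervisor, or a script using the Broker Python bindings); neither the function names nor the node layout come from the actual design.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical transport: anything that can deliver (host, action, node_name)
# to the Supervisor running on `host`.
Transport = Callable[[str, str, str], None]

# Hypothetical node description: (host, node_name, node_type).
Node = Tuple[str, str, str]


def orchestrate(action: str,
                nodes: Sequence[Node],
                type_order: Sequence[str],
                send: Transport) -> List[Tuple[str, str]]:
    """Apply `action` to all nodes, one node type at a time.

    Returns the (host, node_name) pairs in the order they were contacted.
    """
    contacted = []
    for node_type in type_order:
        for host, name, kind in nodes:
            if kind == node_type:
                send(host, action, name)
                contacted.append((host, name))
    return contacted


if __name__ == "__main__":
    cluster = [
        ("host-a", "manager", "manager"),
        ("host-b", "worker-1", "worker"),
        ("host-c", "worker-2", "worker"),
    ]
    # A print-based transport; a real one would publish an event instead.
    orchestrate("stop", cluster, ["worker", "manager"],
                lambda host, action, node: print(host, action, node))
```

Because the transport is passed in, the same function could be driven from a command-line client or from a coordinating process without changing the ordering logic.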

So where we put the logic at this point may not be important.  If we
can find a single-best-place for the logic to live, that's great, but
if there's utility for others to have their own
independent-yet-equivalent logic, I don't see a problem with that.

- Jon


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-30 Thread Robin Sommer



On Tue, Jun 30, 2020 at 09:35 -0400, Seth Hall wrote:

> I think that the script we ship with zeek that effectively implements the
> supervisor behavior should understand the business logic of shutting down a
> cluster in the correct order.

How would that then work across multiple systems?

Robin


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-30 Thread Seth Hall
Sorry for chiming in late on this...

On 19 Jun 2020, at 14:46, Jon Siwek wrote:

> Ack, got it and agree that the distinction is likely helpful: the
> supervisor node implements the low-level "dirty work" of stopping
> processes and can ensure shutdown of its entire process tree if it
> really has to, but the client can carry out shutdown logic with a
> higher-level of insight into directing a shutdown process (possibly
> across many hosts) in orderly fashion.

I think that the script we ship with zeek that effectively implements 
the supervisor behavior should understand the business logic of shutting 
down a cluster in the correct order.  One way to think about it is that 
the supervisor script will presumably understand the business logic for 
starting a cluster in the right order so consequently it would seem that 
it should understand how to shut down the cluster as well.
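
A rough sketch of that idea in Python (the real logic would live in a Zeek script): if the start order is known, the stop order can be derived from it rather than maintained separately. The start order below is an assumption for illustration, not something settled in this thread; the only constraint mentioned so far is that workers should stop first.

```python
# Assumed start order, for illustration only.
START_ORDER = ("logger", "manager", "proxy", "worker")


def stop_order(start_order=START_ORDER):
    """Stop node types in the reverse of the order they were started."""
    return tuple(reversed(start_order))


def plan(action):
    """Return the node-type order to use for a 'start' or 'stop' request."""
    if action == "start":
        return START_ORDER
    if action == "stop":
        return stop_order()
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    print("start:", plan("start"))
    print("stop: ", plan("stop"))
```

Keeping a single ordered list means a change to the start sequence automatically carries over to shutdown.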

We talked about it recently and now that I've had some more time to 
think about it I'm really starting to think that the business logic for 
correctly starting and stopping a cluster should be fully implemented in 
the supervisor script.  The zeekc tool could then just be a dumb tool 
that says to start and stop and doesn't end up causing us to spread our 
logic around to other tooling.

   .Seth

--
Seth Hall * Corelight, Inc * www.corelight.com


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-19 Thread Jon Siwek
On Fri, Jun 19, 2020 at 1:38 AM Robin Sommer  wrote:

> think we also want their state controllable from the client as well,
> so that one can have an orderly shutdown of a multi-system cluster
> without loss of data (e.g., one probably wants to shutdown workers
> first to collect remaining log data). This is what I meant above by
> "shutdown the cluster processes": "zeek-client stop" would tell the
> supervisors to shutdown their node processes (or rather: "zeek-client
> stop workers", or maybe "zeek-client" would know the order in which to
> stop nodes or systems).

Ack, got it and agree that the distinction is likely helpful: the
supervisor node implements the low-level "dirty work" of stopping
processes and can ensure shutdown of its entire process tree if it
really has to, but the client can carry out shutdown logic with a
higher-level of insight into directing a shutdown process (possibly
across many hosts) in orderly fashion.

Also, based on "naming" feedback: plan to use `zeekc`.

- Jon


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-19 Thread Robin Sommer



On Thu, Jun 18, 2020 at 13:00 -0700, Jon Siwek wrote:

> > For (1), the above applies: we'll rely on standard sysadmin processes
> > for updating. That means you'd use "zeekcl" to shutdown the cluster
> > processes, then run "yum update" (or whatever), then use "zeekcl"
> > again to start things up again.

> I have a slightly different take: isn't it more common to expect
> "start" and "stop" operations here to be done by the service-manager
> rather than Zeek client?

I believe we're pretty close to saying the same thing. I'm making a
distinction between the supervisor Zeek process (which the service
manager starts & stops), and the cluster's node processes (manager,
workers, etc). The supervisor manages the latter and will by default
shut them down when it gets the "stop" from its service-manager. But I
think we also want their state controllable from the client as well,
so that one can have an orderly shutdown of a multi-system cluster
without loss of data (e.g., one probably wants to shutdown workers
first to collect remaining log data). This is what I meant above by
"shutdown the cluster processes": "zeek-client stop" would tell the
supervisors to shutdown their node processes (or rather: "zeek-client
stop workers", or maybe "zeek-client" would know the order in which to
stop nodes or systems). And I imagine one would do that before
starting a cluster-wide upgrade to the next Zeek version.

That said, your note on Slack sounds right: let's figure out the
single-system operation first and get that usable. I'm pretty
confident that we will then be able to build the multi-system model on
top of that without too much trouble, and it'll be easier to collect
requirements for administration/management of multi-system setups once
we've got some experience with single-system setups.

Robin


-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-18 Thread Johanna Amann
>> Suggestion: `zeekcl`, Zeek (Command-Line) Client.
>
> "zeekcl" is very close to "zeekctl", which could lead to confusion.
> "zcl" maybe?
>
>> Is use of Python still desirable for other reasons?  Otherwise, I 
>> lean
>> towards `zeekcl` being C++.
>
> No particular preference from my side, I can see either. Effort is
> probably about the same in this model, and C++ does have the advantage
> of less dependency issues.

I agree - I actually kind of like the idea that zeekcl does not have 
python as a dependency.

>> I plan to have `zeekcl` code/tests live inside the main Zeek repo.
>
> Makes sense to me as well.

Agreed here too.

Johanna


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-18 Thread Jon Siwek
On Thu, Jun 18, 2020 at 7:45 AM Vlad Grigorescu  wrote:

> My main concern was Broker version incompatibilities between the 
> newly-installed zcl, and the running cluster, which I think would be 
> addressed by that (i.e. to stop a cluster, you stop the supervisor service on 
> the manager, and then the other services will lose their connection and also 
> stop).

A clarification that may help you: the "orphaning" behavior isn't
related to Broker connections, it's related to the parent-child
relationship between processes.  So there's a process tree here with
`zeek` in supervisor-mode at the root and child processes that are
individual cluster nodes (worker, manager, logger, proxy).

The normal termination behavior for the supervisor process is to
gracefully kill and wait for all children to exit.  In the very
exceptional case of the supervisor exiting/crashing without having
cleaned up all children, those children will self-terminate upon
noticing they are no longer parented to the supervisor.
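
For readers unfamiliar with the technique, a generic sketch in Python of how a child process can notice it has been orphaned: on POSIX systems a child whose parent exits is re-parented, so its parent PID changes. This is only an illustration of the general approach, not a description of how Zeek implements it; the function names and the polling interval are made up.

```python
import os
import time


def is_orphaned(original_ppid, current_ppid):
    """The child is orphaned once its parent PID differs from the one
    recorded at startup (the process was re-parented)."""
    return current_ppid != original_ppid


def run_until_orphaned(original_ppid, get_ppid=os.getppid,
                       poll_interval=1.0, do_work=lambda: None,
                       sleep=time.sleep):
    """Do work in a loop and return once the parent has gone away.

    Returns the number of completed work iterations. The caller is
    expected to exit after this function returns.
    """
    iterations = 0
    while not is_orphaned(original_ppid, get_ppid()):
        do_work()
        sleep(poll_interval)
        iterations += 1
    return iterations


if __name__ == "__main__":
    # Simulate a parent (PID 100) that disappears after two polls.
    fake_ppids = iter([100, 100, 1])
    n = run_until_orphaned(100, get_ppid=lambda: next(fake_ppids),
                           poll_interval=0, sleep=lambda s: None)
    print("iterations before noticing orphaning:", n)
```

A real child would record `os.getppid()` immediately after being spawned and pass that value as `original_ppid`.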

- Jon



Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-18 Thread Jon Siwek
On Thu, Jun 18, 2020 at 12:11 AM Robin Sommer  wrote:

> For (1), the above applies: we'll rely on standard sysadmin processes
> for updating. That means you'd use "zeekcl" to shutdown the cluster
> processes, then run "yum update" (or whatever), then use "zeekcl"
> again to start things up again. (The Zeek supervisor will be running
> already at that point, managed through systemd or whatever you're
> using).

I have a slightly different take: isn't it more common to expect
"start" and "stop" operations here to be done by the service-manager
rather than Zeek client?  I'm assuming "update/deploy Zeek
installation" could involve a change in the `zeek` binary and that
implements the supervisor process itself, so you'd want, at the level
of system services, to stop the entire Zeek process tree, including
the root supervisor.

That doesn't exclude the possibility of the client having operations
like  "start" (spawn `zeek -j `), "stop" (kill the root `zeek`
supervisor process), or even others that dynamically add/remove
cluster nodes from the tree, but that's probably not the
common/expected usage to prioritize since it's again back to model of
the process tree being managed manually by the user, independent from
a system's service-manager.

- Jon


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-18 Thread Vlad Grigorescu
Thanks Robin, that helps.

On Thu, Jun 18, 2020 at 2:11 AM Robin Sommer  wrote:

>
> There are two parts here: (1) deploying the Zeek installation itself,
> and (2) deploying any configuration changes (incl. new Zeek scripts).
>
> For (1), the above applies: we'll rely on standard sysadmin processes
> for updating. That means you'd use "zeekcl" to shutdown the cluster
> processes, then run "yum update" (or whatever), then use "zeekcl"
> again to start things up again. (The Zeek supervisor will be running
> already at that point, managed through systemd or whatever you're
> using).
>
> (2) is still a bit up in the air. With 3.2, there won't be any support
> for distributing configurations automatically, but we could add that
> so that config files/scripts/packages do get copied around over
> Broker. Feedback would be appreciated here: What's better, having
> zeekcl manage that, or leave it to standard sysadmin process as well?
>

I re-read the design doc, and I think that the part I missed the first time
through was suicide on orphaning. (Side-note: Given the much-needed trend
towards bias-free terminology in technology, perhaps there's a better term
here). My main concern was Broker version incompatibilities between the
newly-installed zcl, and the running cluster, which I think would be
addressed by that (i.e. to stop a cluster, you stop the supervisor service
on the manager, and then the other services will lose their connection and
also stop).

I'm still a bit unclear on how to start a cluster. In my mind, where simply
using the standard process/job control falls short is the need to operate
across multiple physical systems. So, would that be a job for zcl? Or would
the desired goal be that I have my, say, systemd unit set to constantly be
restarting Zeek on my worker systems? If it can't connect to the manager,
it would presumably immediately die given the orphaned state.

The more tightly we couple the nodes together, the more quickly it'll
detect failures, but the more sensitive it will be to flapping and
unnecessary restarts. The cluster is relatively fragile right now (e.g. a
manager node going away even for a brief period of time tends to lead to a
crash on even a relatively busy system, since the backlog won't clear as
timers and other events stack up). So I think that if we're moving cluster
supervision out of a parallel process in `zeekctl cron` and into Zeek
itself, we'll need to improve error detection and graceful recovery where
possible.

  --Vlad


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-18 Thread Robin Sommer



On Thu, Jun 18, 2020 at 03:32 +, Vlad Grigorescu wrote:

> As a concrete example, what does a cluster upgrade look like?

The idea is to handle this more like other system services: you'll be
in charge of getting the new Zeek version onto all your systems
yourself, using whatever method you use for other software as well.
For example, if you're installing through a package manager, you'd
just run "update" on all systems. If you're installing from source,
you'll either need to compile on each system, or copy the installation
over manually.

The underlying assumption is that people will already have a mechanism
in place for administration of their systems, and we shouldn't be
trying to reinvent the wheel, as ZeekControl oddly does. From a
sysadmin perspective, ZeekControl is really doing a lot more right now
than it should be doing; other tools don't work that way. We don't
want it to look like an APT anymore (https://github.com/zeek/zeek/issues/259). :-)

> Today, that means install the new version on the manager, and then do
> `zeekctl deploy`, which copies the files to the nodes and restarts the
> cluster. All of that is done without Broker.

There are two parts here: (1) deploying the Zeek installation itself,
and (2) deploying any configuration changes (incl. new Zeek scripts).

For (1), the above applies: we'll rely on standard sysadmin processes
for updating. That means you'd use "zeekcl" to shutdown the cluster
processes, then run "yum update" (or whatever), then use "zeekcl"
again to start things up again. (The Zeek supervisor will be running
already at that point, managed through systemd or whatever you're
using).
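
As a sketch of the sequence for (1), written in Python with a dry-run default: the `zeekcl stop` and `zeekcl start` subcommands are hypothetical (the client does not exist yet and its interface is undecided), and `yum update` stands in for whatever update mechanism a site uses.

```python
import subprocess


def upgrade_plan(client="zeekcl", update_cmd=("yum", "update")):
    """Return the three steps as argv lists: stop nodes, update, start nodes.

    `client` and its "stop"/"start" subcommands are assumptions.
    """
    return [[client, "stop"], list(update_cmd), [client, "start"]]


def run_plan(plan, dry_run=True, runner=subprocess.run):
    """Run each step in order; with dry_run=True only print the commands.

    Returns the command lines that were (or would have been) run.
    """
    commands = []
    for argv in plan:
        line = " ".join(argv)
        if dry_run:
            print("would run:", line)
        else:
            # check=True aborts the sequence on the first failing step.
            runner(argv, check=True)
        commands.append(line)
    return commands


if __name__ == "__main__":
    run_plan(upgrade_plan())
```

The Supervisor itself is not part of this plan, since it stays under the control of systemd or an equivalent service manager.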
 
(2) is still a bit up in the air. With 3.2, there won't be any support
for distributing configurations automatically, but we could add that
so that config files/scripts/packages do get copied around over
Broker. Feedback would be appreciated here: What's better, having
zeekcl manage that, or leave it to standard sysadmin process as well?

> Reading the script linked in [2], I notice that zeekcl would not support
> copying files from one node to another?

Correct right now, (2) may or may not change that.

> zeekctl print

"print" will be supported (roadmap says not in 3.2 yet, but it should
be easy to do, maybe we can get it in still).

> zeekctl exec.

"exec" will likely not be supported. We *could* support it, no
technical reason for not doing that over Broker. It just seems like
another thing that's better handled with different tools.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-18 Thread Robin Sommer
> Suggestion: `zeekcl`, Zeek (Command-Line) Client.

"zeekcl" is very close to "zeekctl", which could lead to confusion.
"zcl" maybe?

> Is use of Python still desirable for other reasons?  Otherwise, I lean
> towards `zeekcl` being C++.

No particular preference from my side, I can see either. Effort is
probably about the same in this model, and C++ does have the advantage
of less dependency issues.

> Zeek's scripting language (e.g. `ctl.zeek`), but I don't suggest that

Ack, agree.

> I plan to have `zeekcl` code/tests live inside the main Zeek repo.

Makes sense to me as well.

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com


Re: [Zeek-Dev] Zeek Supervisor Command-Line Client

2020-06-17 Thread Vlad Grigorescu
I'm still fuzzy on the Supervisor framework, as we're still in the process
of upgrading systems to the point of supporting the new C++ requirements.

As a concrete example, what does a cluster upgrade look like? Today, that
means install the new version on the manager, and then do `zeekctl deploy`,
which copies the files to the nodes and restarts the cluster. All of that
is done without Broker.

What does that look like with zeekcl + Broker? Let's say I install the new
version on the manager. If I then tell zeekcl to destroy the running
instance, will that work, or will the newer zeekcl be incompatible with the
Broker version of the running Zeek?

Reading the script linked in [2], I notice that zeekcl would not support
copying files from one node to another? Other features that would be
missing that we routinely use are `zeekctl print` and `zeekctl exec`. I'm
assuming `zeekcl` would be running in some uber-bare mode if it's written
in Zeek?

  --Vlad

On Thu, Jun 18, 2020 at 2:15 AM Jon Siwek  wrote:

> Don't recall any basic "project infrastructure" discussions happening
> yet for the upcoming replacement/alternative for ZeekControl that we
> want to introduce in Zeek 3.2 (roadmap/design links found at [1]), so
> here's starting questions.
>
> # What to Name It ?
>
> Suggestion: `zeekcl`, Zeek (Command-Line) Client.
>
> Open to ideas, but will use `zeekcl` below.
>
> # What Programming Language ?
>
> `zeekcl` has different/narrower scope than ZeekControl.  It's more
> clearly a "client" with sole job of handling requests/responses via
> Broker without many (any?) system-level operations/integrations.
> Meaning there may be less of an approachability/convenience gap
> between C++ versus Python with `zeekcl` than there was with
> ZeekControl.
>
> Also nice if `zeekcl` doesn't require more dependencies beyond what
> `zeek` needs since they're expected to be used together.
>
> Is use of Python still desirable for other reasons?  Otherwise, I lean
> towards `zeekcl` being C++.
>
> For reference/sanity-check in terms of what people expect `zeekcl` to
> be: in my testing of the SupervisorControl framework [2], I had a
> sloppy Zeek script implementing the full "client side" (essentially
> the majority of what `zeekcl` will do) in ~100 LOC.  Most operations
> are that simple: send request and display response.
>
> That does mean the third option to consider besides either Python or
> C++ is Zeek's scripting language (e.g. `ctl.zeek`), but I don't
> suggest that since (1) using a full `zeek` process is way more than we
> need and (2) the command-line interface is awkward (`zeek ctl
> Supervisor::cmd="status"` versus `zeekcl status`)
>
> # Where's the Source Code Live ?
>
> Past experiences with ZeekControl being in a separate repo than Zeek
> are negative in terms of CI/testing: changes in Zeek have broken
> ZeekControl, but go uncaught for a while since it is tested
> independently.
>
> Since common use/maintenance will involve both `zeek` and `zeekcl`,
> and also don't expect the latter to accrue large amounts of code
> deserving of a separate project, I plan to have `zeekcl` code/tests
> live inside the main Zeek repo.
>
> - Jon
>
> [1] https://github.com/zeek/zeek/issues/582
> [2]
> https://github.com/zeek/zeek/blob/689a242836092fba7818ba24724b74a7a7902e48/scripts/base/frameworks/supervisor/control.zeek
>
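
To make the "send request and display response" shape of the client concrete, a minimal sketch in Python: the event names are modeled on the SupervisorControl script linked as [2] and should be treated as assumptions, and `send_and_wait` is a hypothetical stand-in for the Broker round trip (it is not a real Broker API).

```python
import argparse
import uuid

# Subcommand -> request event. Names are assumptions modeled on the
# SupervisorControl script referenced in the thread.
REQUEST_EVENTS = {
    "status": "SupervisorControl::status_request",
    "restart": "SupervisorControl::restart_request",
    "destroy": "SupervisorControl::destroy_request",
}


def build_request(argv):
    """Turn command-line arguments into (event_name, [request_id, node])."""
    parser = argparse.ArgumentParser(prog="zeekcl")
    parser.add_argument("command", choices=sorted(REQUEST_EVENTS))
    # An empty node name is used here to mean "all nodes" (an assumption).
    parser.add_argument("node", nargs="?", default="")
    args = parser.parse_args(argv)
    request_id = uuid.uuid4().hex
    return REQUEST_EVENTS[args.command], [request_id, args.node]


def main(argv, send_and_wait):
    """Send one request and display the response.

    `send_and_wait(event_name, event_args)` is a hypothetical callable
    that publishes the event and returns the matching response.
    """
    event_name, event_args = build_request(argv)
    response = send_and_wait(event_name, event_args)
    print(response)
    return response


if __name__ == "__main__":
    # Fake transport that echoes what it was asked to send.
    main(["status"], lambda name, args: f"sent {name} for node {args[1]!r}")
```

Invoked as `zeekcl status`, this maps a subcommand to one request event, which matches the narrow client scope described above.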