[Zeek-Dev] Re: Cluster Controller Framework Thoughts

2020-09-08 Thread Vlad Grigorescu
Thanks, Robin. Your comments clarified some things, and were overall very
helpful.

The main thing I somehow missed originally is that the plan is to enable
multiple deployment models while at the same time making it as easy as
possible to get up and running. I was concerned that we were straying too
far afield from the supported model, which we try to avoid, especially
since anything we might share with the community would then no longer be
applicable or useful.

There were a couple of places where the Ansible/systemd approach didn't
work well out of the box, due to assumptions built into the Supervisor
framework. For instance, it's assumed that any process running as a
cluster node is supervised, which has some undesirable implications. I
can open issues for those cases to track finding a better way to handle
them.

Otherwise, we're working on spinning up a cluster that we'd like to use to
test and develop some high-availability (HA) capabilities, and will report back.

  --Vlad

On Thu, Sep 3, 2020 at 4:56 PM Robin Sommer  wrote:

[Zeek-Dev] Re: Cluster Controller Framework Thoughts

2020-09-03 Thread Robin Sommer


Hi Vlad,

thanks for the feedback, that's quite helpful. I'll dig a bit into
some of your points below.

As a general point, there's nothing wrong with choosing a different
deployment model than whatever becomes the new default. Quite the
opposite: Part of the thinking here has been that there's no single
approach that'll work for everybody, hence we want to offer multiple
layers that people can hook into depending on their needs and
expertise. The Controller would be the highest-level abstraction that
gives you an experience not too far from current ZeekControl (more on
that below). On the other end of the spectrum, skipping everything
altogether and going with a manual systemd config is the lowest-level
way of doing it. In between those two we have: using the Supervisor
API through a custom Zeek management script (i.e., no Cluster
Agent/Controller), and writing a custom controller to interface with
the Agent API while doing your own state management.

> Ultimately, given the choice between systemd + supervisor versus just
> systemd, for our use case, just systemd gave us some distinct benefits and
> reduced complexity.

Ack, I can see that for you guys, especially with the current state of
things. I'll just add here that (1) not everybody can or wants to use
systemd, so we'll need to have ways to build clusters in other
settings; and (2) some of the technical advantages you mention should
be addressable with the Supervisor/Controller, too; that's just not
there yet (e.g., nicer process visualization).

> it feels like I'm left with a choice between one orchestration tool +
> the cluster controller framework, versus just using a single
> orchestration tool to keep these files in sync and handle cluster
> stop/start/restarts.

I think part of the question here is how much effort one is willing to
invest into installing and maintaining Zeek. If you (1) are very
familiar with Zeek, and (2) have a current orchestration tool in place
that's easy to extend with all the necessary pieces (including managing
restarts, logging, and health monitoring), I agree that one tool sounds
better than two. However, if we look at it from the perspective of a
new user who wants to get Zeek running on their network quickly,
figuring out all those pieces is probably quite a hurdle. That
trade-off seems similar to ZeekControl today: people already have the
option to go through systemd, but ZeekControl remains the standard way
to run Zeek, even with all its quirks.

Re/ putting files in place everywhere: Per the design doc, I
definitely see distribution of packages and site-specific scripts in
scope for future versions of the Controller. That would then leave
people with just the task of installing the same Zeek version
everywhere, which seems a reasonable expectation to me.
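
As a rough illustration of that remaining precondition, here is a
minimal Python sketch of a pre-deployment check that a controller (or a
person) might run before pushing packages and site scripts out. It
assumes each host has already reported its installed Zeek version as a
string; the function name, host names, and version strings are
hypothetical examples, not part of any existing Zeek or Controller API.

```python
from collections import defaultdict


def common_zeek_version(reported: dict[str, str]) -> str:
    """Return the one Zeek version every host reports, or raise if they differ.

    `reported` maps host name -> installed Zeek version string (hypothetical
    input; how those values get collected is left open here).
    """
    if not reported:
        raise ValueError("no hosts reported a Zeek version")
    # Group hosts by the version they report.
    hosts_by_version: dict[str, list[str]] = defaultdict(list)
    for host, version in sorted(reported.items()):
        hosts_by_version[version].append(host)
    if len(hosts_by_version) != 1:
        details = "; ".join(
            f"{version}: {', '.join(hosts)}"
            for version, hosts in sorted(hosts_by_version.items())
        )
        raise ValueError(f"hosts disagree on Zeek version ({details})")
    return next(iter(hosts_by_version))


print(common_zeek_version({"box-a": "3.2.0", "box-b": "3.2.0"}))  # 3.2.0
try:
    common_zeek_version({"box-a": "3.2.0", "box-b": "3.1.5"})
except ValueError as err:
    print(err)  # hosts disagree on Zeek version (3.1.5: box-b; 3.2.0: box-a)
```

Refusing to proceed on any mismatch keeps the check simple; a real tool
might instead report the stragglers and let the operator decide.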

> If I need to reboot a system in a cluster, and it's running the manager and
> logger, I'd like to see another system in the cluster get promoted to being
> the manager and logger, and all the nodes to start talking to that instead.

I would like to see that, too. :-) However, this seems to be quite
different from the systemd approach you are describing. How
would such a dynamic scheme operate without some kind of control layer
in between doing the coordination? In some future version, the Cluster
Controller would be the management component that can initiate changes
like dynamic failover. We can argue about whether that control layer
should be a central component (as the Controller proposes) vs. some
distributed consensus scheme; and also whether we should really
implement this ourselves or rather go with some 3rd party tool for
coordination. But either way, I think something needs to be there.
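
To make the coordination point concrete, here is a minimal Python sketch
of what a central control layer would have to do when the host running
the manager and logger goes away: pick a surviving host for those roles
and tell every node to switch to it. Everything in it (the Host and
Controller classes, mark_down(), the host and node names) is
hypothetical and simulated in memory; it is not the Cluster Controller's
actual API, and it ignores real concerns such as health checking,
split-brain, and nodes that were themselves on the rebooted host.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A machine in the cluster (hypothetical model, not a Zeek API)."""
    name: str
    alive: bool = True


@dataclass
class Controller:
    """Hypothetical central controller that owns the cluster layout."""
    hosts: dict[str, Host]
    manager_host: str  # host currently running the manager and logger
    # Maps each node (e.g., a worker) to the host it currently talks to.
    node_targets: dict[str, str] = field(default_factory=dict)

    def register_node(self, node: str) -> None:
        """Point a new node at the current manager/logger host."""
        self.node_targets[node] = self.manager_host

    def mark_down(self, host_name: str) -> None:
        """Record that a host went away, e.g., for a planned reboot."""
        self.hosts[host_name].alive = False
        if host_name == self.manager_host:
            self._fail_over()

    def _fail_over(self) -> None:
        """Promote a surviving host and redirect all nodes to it."""
        live = sorted(name for name, host in self.hosts.items() if host.alive)
        if not live:
            raise RuntimeError("no live host left for manager/logger")
        self.manager_host = live[0]  # deterministic choice: first by name
        for node in self.node_targets:
            self.node_targets[node] = self.manager_host


# Example: three hosts, manager/logger on box-a, two workers.
hosts = {name: Host(name) for name in ("box-a", "box-b", "box-c")}
ctl = Controller(hosts=hosts, manager_host="box-a")
for worker in ("worker-1", "worker-2"):
    ctl.register_node(worker)

ctl.mark_down("box-a")  # reboot the manager/logger host
print(ctl.manager_host)  # box-b
print(ctl.node_targets)  # {'worker-1': 'box-b', 'worker-2': 'box-b'}
```

Picking the alphabetically first live host keeps the choice
deterministic and trivially central; a distributed-consensus design
would instead have the surviving hosts agree on the new manager among
themselves.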

> it feels a bit like the Cluster Controller framework is trying to take
> the old zeekctl features and get them to fit into a new model.

The proposal for the Supervisor/Controller model has been out for a
while, and the main point of feedback so far has been from folks who
wanted to ensure that we don't lose functionality that ZeekControl
offers today. So yes, that has been a starting point for fleshing out
a bunch of this: Can we retain what people like about ZeekControl, but
move it over into a new architecture that removes what they don't like
(e.g., copying binaries around), and all that while facilitating a
more dynamic future world that increases Zeek's flexibility and
resilience? I'm not saying that the current Controller design achieves
all that already, but it has indeed been designed as an incremental
path forward rather than a let's-redo-it-from-scratch approach. Happy
to discuss if that's the right trade-off.

Robin

--
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
zeek-dev mailing list -- zeek-dev@lists.zeek.org
To unsubscribe send an email to zeek-dev-le...@lists.zeek.org