Re: [Zeek-Dev] Hi + LL Analyzer

2019-02-27 Thread Robin Sommer



On Wed, Feb 27, 2019 at 16:07 +0100, Jan Grashöfer wrote:

> At first glance it looks like IP-layer multiplexing is done in
> NetSessions::{NextPacket, DoNextPacket} and the Transport-layer is tackled
> in Manager::BuildInitialAnalyzerTree in context of initializing a
> connection.

Well, there, too. :) That's indeed doing the packet dispatching, while
DoNextPacket() sets up state mgmt. It's all not quite clear cut, which
is part of the problem.

> That is the central point. So a first step would be to rely on TCP/IP in the
> "middle" of the stack but allow pluggable Link-layer protocols. Those might
> feed their data to the TCP/IP pipeline or handle them on their own. The next
> step would be the IP-layer.

Yeah, that sounds good to me.

> One question here would be whether it makes sense to assume that the set of
> LL-analyzers that should be available is known at compile-time?

The built-in ones can be known, but any added through dynamic plugins
can't really. We'll know only at runtime what the final set is. But we
could precompute a lookup table in advance at startup that maps link
types to analyzers.
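
A minimal sketch of such a precomputed table, just to make the idea
concrete. Everything here is hypothetical (LLAnalyzer, LLDispatcher,
Register, Freeze and Lookup are made-up names, not existing Zeek API),
and it assumes link types are small integers such as pcap's DLT values
(DLT_EN10MB is 1). Plugins register while loading, Freeze() runs once
at startup, and the per-packet path is a single bounds-checked array
index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical interface for illustration; not Zeek's actual plugin API.
struct LLAnalyzer {
    virtual ~LLAnalyzer() = default;
    // Returns the number of link-layer header bytes to skip, or -1 if the
    // packet is too short to contain the header.
    virtual int HeaderLength(const uint8_t* data, size_t len) const = 0;
};

struct EthernetAnalyzer : LLAnalyzer {
    int HeaderLength(const uint8_t*, size_t len) const override {
        return len >= 14 ? 14 : -1;  // untagged Ethernet II header
    }
};

class LLDispatcher {
public:
    // Called by built-in and dynamically loaded plugins during startup.
    void Register(uint32_t link_type, const LLAnalyzer* analyzer) {
        pending_[link_type] = analyzer;
    }

    // Called once after all plugins are loaded: flatten the registrations
    // into a dense table indexed by link type (assumes small type numbers).
    void Freeze() {
        uint32_t max_type = 0;
        for (const auto& entry : pending_)
            max_type = std::max(max_type, entry.first);
        table_.assign(pending_.empty() ? 0 : static_cast<size_t>(max_type) + 1,
                      nullptr);
        for (const auto& entry : pending_)
            table_[entry.first] = entry.second;
    }

    // Per-packet path: returns nullptr for unknown link types.
    const LLAnalyzer* Lookup(uint32_t link_type) const {
        return link_type < table_.size() ? table_[link_type] : nullptr;
    }

private:
    std::unordered_map<uint32_t, const LLAnalyzer*> pending_;
    std::vector<const LLAnalyzer*> table_;
};
```

The hash map is only touched at startup; whether a dense array is the
right final structure depends on how sparse the link-type space turns
out to be.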

> I think this would be part of the larger effort to re-think Zeek's notion of
> connections. This could be addressed together with implementing a flexible
> mechanism to make meta data like LL-addresses available in context of a
> connection.

Yep.

> If we allow plugging in new transport protocols, they might need
> their own PIA to support the analysis of known protocols like HTTP
> etc.

Yeah, or a more generic PIA that provides its own hook for plugins.
The main difference between TCP/UDP PIAs is packet vs stream
semantics, iirc. That might generalize sufficiently, but not sure.
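
To illustrate the packet vs stream distinction, here is a rough sketch
of one buffer that replays its data under either semantics. It is an
assumption about how a generic PIA could look, not how the existing
TCP/UDP PIAs are written; GenericPIA, Semantics, Buffer and Replay are
hypothetical names, and reordering, gaps and direction are ignored.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: how buffered data is handed to a newly attached analyzer.
enum class Semantics { Packet, Stream };

class GenericPIA {
public:
    using Sink = std::function<void(const std::string&)>;

    explicit GenericPIA(Semantics semantics) : semantics_(semantics) {}

    // Store payload seen before the application protocol is identified.
    void Buffer(std::string chunk) { chunks_.push_back(std::move(chunk)); }

    // Replay the buffered payload into an analyzer once one is chosen.
    void Replay(const Sink& sink) const {
        if (semantics_ == Semantics::Packet) {
            // Packet semantics: preserve the original chunk boundaries.
            for (const auto& chunk : chunks_)
                sink(chunk);
        } else {
            // Stream semantics: boundaries carry no meaning, so deliver
            // the concatenated bytes in one piece.
            std::string all;
            for (const auto& chunk : chunks_)
                all += chunk;
            if (!all.empty())
                sink(all);
        }
    }

private:
    Semantics semantics_;
    std::vector<std::string> chunks_;
};
```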

Robin

-- 
Robin Sommer * Corelight, Inc. * ro...@corelight.com * www.corelight.com
___
zeek-dev mailing list
zeek-dev@zeek.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev


Re: [Zeek-Dev] Hi + LL Analyzer

2019-02-27 Thread Jan Grashöfer
On 26/02/2019 02:36, Robin Sommer wrote:
> I see three pieces here overall that I think can be tackled
> independently:
> 
> (1) Link-layer: Currently hardcoded in Packet::ProcessLayer2()
> 
> (2) IP-Layer: Currently hardcoded in NetSessions::NextPacket()
> 
> (3) Transport-layer: Currently hardcoded in NetSessions::DoNextPacket().

At first glance it looks like IP-layer multiplexing is done in 
NetSessions::{NextPacket, DoNextPacket} and the Transport-layer is 
tackled in Manager::BuildInitialAnalyzerTree in context of initializing 
a connection.

> Case (1) is all about skipping the header to get to IP. There's some
> redundancy across cases, though, and MPLS makes it all more messy.

One thing that comes to my mind here is whether it might be possible to 
pass information such as VLAN tags, MPLS labels or link layer addresses 
to upper layers in a generic way without hardcoding. However, that might 
be out of scope for now.

> With (2), a plugin would be able to add support for non-IP protocols.
> However, due to Bro generally assuming that it is analyzing IP, the
> plugin would either need to take care of such packets completely (like
> ARP does), or eventually get to an IP packet that it can then feed
> back for further analysis (like if it is some kind of a tunnel).

The non-IP packet might also contain a Transport-layer PDU. I guess it 
should be possible to pass these on as well.

> There's also a more general version of (2) and (3) where we'd remove
> Bro's assumption of analyzing TCP/IP protocols. But that's a separate,
> large effort by itself.

That is the central point. So a first step would be to rely on TCP/IP in 
the "middle" of the stack but allow pluggable Link-layer protocols. 
Those might feed their data to the TCP/IP pipeline or handle them on 
their own. The next step would be the IP-layer.

> On a technical level, plugging in such low-level analyzers needs to be
> very efficient, in particular if we move the currently hardcoded cases
> into the plugins as well (as I think we should; similar to how
> application-layer analyzers have all moved into internal plugins).
> Then the lookup-the-analyzer-and-dispatch operation will happen
> multiple times for every packet.

One question here would be whether it makes sense to assume that the set 
of LL-analyzers that should be available is known at compile-time?

>> - What about the concept of connections? For some LL protocols the
>> concept might be counterintuitive.
> 
> Couple cases there:
> 
> - If there's really no sense of a connection, then the plugin will
>   need to take complete care of the packets, as the rest of Bro
>   assumes connection-semantics.

Maybe there is another general abstraction that is worth supporting 
as well. I was thinking of request-reply pairs that can be correlated. 
However, I haven't put much thought into this, yet.

> - If it's just the definition of what defines a connection that is
>   different, then I think we could make that more flexible. I've been
>   hoping for a while that we can make Bro's notion of connection IDs
>   dynamic, so that it's not necessarily just the 5-tuple. There are
>   use cases outside of new protocols for this, too. For example, one
>   could include the VLAN ID to deal with overlapping IP ranges in
>   independent VLANs.

I think this would be part of the larger effort to re-think Zeek's 
notion of connections. This could be addressed together with 
implementing a flexible mechanism to make meta data like LL-addresses 
available in context of a connection.
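
A small sketch of what a dynamic connection ID could look like, using
the VLAN example from above. This is only an illustration under
simplifying assumptions (IPv4 addresses as 32-bit integers, no
canonical ordering of endpoints); ConnKey, ConnKeyHash, FiveTuple and
WithVlan are hypothetical names, not existing Zeek types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical key: an ordered list of fields instead of a fixed 5-tuple,
// so that additional components can be appended.
struct ConnKey {
    std::vector<uint64_t> fields;
    bool operator==(const ConnKey& other) const { return fields == other.fields; }
};

struct ConnKeyHash {
    size_t operator()(const ConnKey& key) const {
        size_t h = 0;
        for (uint64_t f : key.fields)
            h = h * 31 + std::hash<uint64_t>{}(f);  // simple hash combine
        return h;
    }
};

// The classic 5-tuple (IPv4 only in this sketch).
ConnKey FiveTuple(uint32_t src_ip, uint16_t src_port,
                  uint32_t dst_ip, uint16_t dst_port, uint8_t proto) {
    ConnKey key;
    key.fields = {src_ip, src_port, dst_ip, dst_port, proto};
    return key;
}

// Extend an existing key with a VLAN ID so that overlapping IP ranges in
// independent VLANs map to distinct connections.
ConnKey WithVlan(ConnKey key, uint16_t vlan_id) {
    key.fields.push_back(vlan_id);
    return key;
}
```

With such a key, two flows that share the same 5-tuple but arrive on
different VLANs end up as separate entries in a
std::unordered_map<ConnKey, ..., ConnKeyHash>.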

>> - The interface should support passing payload to other analyzers. Does
>> it make sense to come up with a generalized DPD-mechanism?
> 
> Not quite sure what you're thinking here, but I believe that fully
> solving this would require addressing Bro's overall assumption of
> analyzing TCP/IP. For now, maybe the best way would be just having the
> analyzer call back into entry points corresponding to the various
> layers where analysis would then proceed as normal. I.e., some
> variation of: ProcessLinkLayer(...), ProcessIP(...),
> ProcessTransport(data), ProcessAppLayer(...). The caller would be
> responsible for providing all the right (meta-)data, like IP headers.
> Were you thinking something different / more general?
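
As a sketch of the entry-point idea quoted above: the method names
ProcessLinkLayer, ProcessIP, ProcessTransport and ProcessAppLayer come
from the quoted text, but their signatures, the Pipeline class,
PacketMeta and the toy tunnel are assumptions made up for
illustration. Only untagged Ethernet, IPv4 and UDP are handled, and
the 4-byte tunnel header is invented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical metadata that the caller is responsible for supplying.
struct PacketMeta {
    uint8_t ip_proto = 0;
};

class Pipeline {
public:
    std::vector<std::string> trace;  // layers visited, for demonstration
    std::string app_payload;         // last payload that reached the top

    void ProcessLinkLayer(const uint8_t* data, size_t len, PacketMeta meta) {
        trace.push_back("link");
        if (len < 14) return;  // untagged Ethernet II header
        uint16_t ethertype = static_cast<uint16_t>((data[12] << 8) | data[13]);
        if (ethertype == 0x0800) ProcessIP(data + 14, len - 14, meta);
    }

    void ProcessIP(const uint8_t* data, size_t len, PacketMeta meta) {
        trace.push_back("ip");
        if (len < 20 || (data[0] >> 4) != 4) return;  // IPv4 only
        size_t ihl = static_cast<size_t>(data[0] & 0x0f) * 4;
        if (ihl < 20 || ihl > len) return;
        meta.ip_proto = data[9];
        ProcessTransport(data + ihl, len - ihl, meta);
    }

    void ProcessTransport(const uint8_t* data, size_t len, PacketMeta meta) {
        trace.push_back("transport");
        if (meta.ip_proto != 17 || len < 8) return;  // UDP only in this sketch
        ProcessAppLayer(data + 8, len - 8, meta);
    }

    void ProcessAppLayer(const uint8_t* data, size_t len, PacketMeta) {
        trace.push_back("app");
        app_payload.assign(reinterpret_cast<const char*>(data), len);
    }
};

// Hypothetical plugin: strips an invented 4-byte tunnel header and calls
// back into the IP entry point, so analysis proceeds as normal from there.
void ProcessToyTunnel(Pipeline& pipeline, const uint8_t* data, size_t len) {
    if (len < 4) return;
    pipeline.ProcessIP(data + 4, len - 4, PacketMeta{});
}
```

A plugin that recovers a transport-layer PDU from a non-IP packet
would call ProcessTransport() instead and fill in the metadata itself.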

While I haven't looked into it, I noticed that there are distinct PIA 
implementations for TCP and UDP. If we allow plugging in new 
transport protocols, they might need their own PIA to support the 
analysis of known protocols like HTTP etc. However, if we keep a focus 
on TCP/IP as suggested that would be out of scope for now.

Jan