Re: [Wireshark-dev] RFD: New language to write dissectors

Guy Harris Sat, 14 Jul 2012 15:32:19 -0700

On Jul 14, 2012, at 8:26 AM, Jakub Zawadzki wrote:

> It'd be great if we had some abstract and pure (no C/assembly inline) 
> language to write dissectors.


Or "to describe protocols and the way packets for those protocols are 
displayed" - the languages in question wouldn't be as procedural as C/Lua/etc, 
they'd be more descriptive.

> We could invent yet another protocol description language,

...but, as you suggest, we probably shouldn't.

> but I was thinking to base grammar on netmon NPL [1] or wsgd [2].

Those are probably the two best choices.

I'm not sure it has to be a choice, though - we could implement both, resources 
permitting, of course.  (And, of course, given that there are many 
already-existing languages that describe protocols - ASN.1, {OSF IDL/MIDL/PIDL} 
for DCE RPC, rpcgen for ONC RPC, CORBA IDL, xcb for X11 - we will probably 
never have the One True Protocol Description Language.)

> I'm a bigger fan of NPL (sorry Olivier), the nmparsers project has got a large 
> collection of dissectors[3] 
> which we could use (LLTD - bug #6071, Windows USB Port packets - bug #6520, 
> netsh - bug #6694)
> but there might exist some legal (patents for grammar/implementation?!) 
> issues.

That would be one concern - even having "our own" language, such as wsgd, runs 
the risk of infringing a patent, but, well, *writing software of just about any 
sort* runs the risk of infringing a patent; however, we're dealing with a large 
corporation in the case of NPL, so there's probably a greater risk that some or 
all of it is covered by patents.  Were Microsoft to explicitly state that there 
are no patents on NPL-the-language or that they're granting a royalty-free 
license for all implementations (perhaps with a "mutual assured destruction" 
clause, so that were we to patent some feature of Wireshark and sue Microsoft 
for violating that patent, our license for their patents would terminate), and 
the same applied to any patents they hold on their implementation of NPL that 
would block independent useful implementations, that might help.

> With wsgd we could reuse some existing code of plugin.

...and we also have more freedom to extend the language, e.g. to support 
preferences for a protocol - Paul Long's blog post says

> A common problem: “No silly, we do HTTP traffic on port 8888, not 80 or 8080!”
>  
> While changing port mappings for protocols could be something revealed in the 
> user interface, we haven’t gotten that far in Network Monitor 3.0 yet.  I 
> expect we should address this specific problem on different fronts, i.e. a UI 
> for each protocol, and some way to handle dynamic port allocations.  And 
> there are also some heuristics we can use to identify protocols as well.  But 
> today, there is a fairly simple way to modify the NPL script for protocols on 
> non-standard ports.

I don't know whether, as of 3.4, they support "a UI for each protocol, and some 
way to handle dynamic port allocations", but we already have the infrastructure 
for that.

NPL also, for strings, offers 3 encodings - to quote the help manual:

> This data type extracts a specified number of characters from a sequence of 
> bytes. The characters can be UTF-16, UTF-8, or ASCII, depending on the 
> encoding specified.

There's no mention of the Extended Binary-Coded Decimal Interchange Code there, 
but we have several dissectors using ENC_EBCDIC, so that would be another place 
where we might want to extend NPL were we to use it.
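
As a rough sketch of what such an extension amounts to, here is a string extractor in Python that takes the encoding as a parameter.  The encoding names, the `extract_string` helper, and the choice of code page 037 to stand in for EBCDIC are all assumptions made for this example; they are not NPL or Wireshark definitions, and the sketch counts bytes rather than characters.

```python
# Hypothetical mapping from description-language encoding names to Python
# codecs.  "ebcdic" is the extension; cp037 is just one common EBCDIC code
# page, picked here as an assumption.
CODECS = {
    "ascii": "ascii",
    "utf8": "utf-8",
    "utf16le": "utf-16-le",
    "ebcdic": "cp037",
}


def extract_string(data, offset, nbytes, encoding):
    """Decode nbytes bytes of data, starting at offset, using encoding."""
    raw = data[offset:offset + nbytes]
    return raw.decode(CODECS[encoding])
```

Adding another encoding is then one more table entry rather than a change to every dissector that needs it.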

Were there an "Open NPL Consortium" of some sort where multiple implementers of 
NPL could propose extensions, and perhaps a way an implementation could offer 
private extensions without worrying about colliding with other implementations 
or future standards, that might help.

Note, by the way, that having a language of this sort could allow the 
dissection engine to optimize dissection for the fields a particular pass 
actually needs.

Consider a protocol with the following description (in a C-like protocol 
description language that I'm making up on the fly):

        enum message_type {
                Login = 0,
                Logout = 1,
                Request = 2,
                Response = 3
        };

        struct login {
                ascii string username[16];
                ascii string password[16];
        };

        struct request {
                uint32 bigendian requested_item;
        };

        struct response {
                uint32 bigendian value_size;
                uint8 value[value_size];
        };

        protocol foo {
                uint32 bigendian enum message_type type;
                switch (type) {

                case Login:
                        struct login login;

                case Logout:
                        /* logout message has only a type */

                case Request:
                        struct request request;

                case Response:
                        struct response response;
                }
                uint32 bigendian message_id;
        };

which might translate to (in a pseudo-machine language I'm also making up on 
the fly):

        uint32 bigendian foo.type saveas x
        switch x:
                0       Login
                1       Logout
                2       Request
                3       Response
        Login:
                ascii string 16 foo.login.username
                ascii string 16 foo.login.password
                goto end
        Logout:
                goto end
        Request:
                uint32 bigendian foo.request.requested_item
                goto end
        Response:
                uint32 bigendian foo.response.value_size saveas y
                uint8 array y foo.response.value
                goto end
        end:
                uint32 bigendian foo.message_id
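
To give a feel for how small an interpreter for "machine code" of that sort could be, here is a sketch in Python.  The tuple encoding of the instructions, the `run` function, and the assumption that the ASCII strings are NUL-padded are all made up for this example; none of it is existing Wireshark code.

```python
import struct

# Hypothetical tuple encoding of the listing above.
PROGRAM = [
    ("uint32be", "foo.type", "x"),          # third element: "saveas" variable
    ("switch", "x", {0: "Login", 1: "Logout", 2: "Request", 3: "Response"}),
    ("label", "Login"),
    ("ascii", 16, "foo.login.username"),
    ("ascii", 16, "foo.login.password"),
    ("goto", "end"),
    ("label", "Logout"),
    ("goto", "end"),
    ("label", "Request"),
    ("uint32be", "foo.request.requested_item", None),
    ("goto", "end"),
    ("label", "Response"),
    ("uint32be", "foo.response.value_size", "y"),
    ("bytes", "y", "foo.response.value"),   # length comes from variable y
    ("goto", "end"),
    ("label", "end"),
    ("uint32be", "foo.message_id", None),
]


def run(program, data):
    """Interpret program over the bytes in data; return {field name: value}."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    fields, saved = {}, {}
    pc = offset = 0
    while pc < len(program):
        ins = program[pc]
        op = ins[0]
        pc += 1
        if op == "uint32be":
            _, name, save = ins
            (value,) = struct.unpack_from(">I", data, offset)
            offset += 4
            fields[name] = value
            if save is not None:
                saved[save] = value
        elif op == "ascii":
            _, length, name = ins
            raw = data[offset:offset + length]
            # Assumption: fixed-width strings are padded with NULs.
            fields[name] = raw.rstrip(b"\0").decode("ascii")
            offset += length
        elif op == "bytes":
            _, var, name = ins
            length = saved[var]
            fields[name] = data[offset:offset + length]
            offset += length
        elif op == "switch":
            _, var, table = ins
            pc = labels[table[saved[var]]]
        elif op == "goto":
            pc = labels[ins[1]]
        elif op != "label":
            raise ValueError(f"unknown opcode {op!r}")
    return fields
```

A real implementation would also have to handle truncated packets and unknown message types, which this sketch simply lets fail with an exception.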

Now consider a dissection pass being done for a display filter "foo.message_id 
== 0x4073".  That full "compiled" program is overkill; that dissection pass 
might optimize it into

        uint32 bigendian foo.type saveas x
        switch x:
                0       Login
                1       Logout
                2       Request
                3       Response
        Login:
                skipbytes 32
                goto end
        Logout:
                goto end
        Request:
                skipbytes 4
                goto end
        Response:
                uint32 bigendian foo.response.value_size saveas y
                skipbytes y
                goto end
        end:
                uint32 bigendian foo.message_id

and, for that dissection pass, run that optimized version of the dissection 
"machine code" for the foo protocol, along with similarly optimized versions of 
the dissection code for other protocols.  The optimized versions of the 
dissection "machine code" might 
be generated as needed (rather than generating optimized versions for every 
protocol, just generate them from the base code the first time we try to run 
the code) and cached with the cache key being the set of fields in which the 
dissection in question was interested (whether because they're being used in a 
filter or for a column or in "-e {field}" in TShark or...).
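
A sketch of that optimize-and-cache step, in Python, might look like the following.  The tuple encoding of the instructions and the names `optimize` and `optimized_for` are invented for this example, and it makes the simplifying assumption that any value stored with "saveas" is needed later (for a switch or a length), so those reads are always kept.

```python
# Hypothetical tuple encoding of the unoptimized foo program.
PROGRAM = [
    ("uint32be", "foo.type", "x"),          # third element: "saveas" variable
    ("switch", "x", {0: "Login", 1: "Logout", 2: "Request", 3: "Response"}),
    ("label", "Login"),
    ("ascii", 16, "foo.login.username"),
    ("ascii", 16, "foo.login.password"),
    ("goto", "end"),
    ("label", "Logout"),
    ("goto", "end"),
    ("label", "Request"),
    ("uint32be", "foo.request.requested_item", None),
    ("goto", "end"),
    ("label", "Response"),
    ("uint32be", "foo.response.value_size", "y"),
    ("bytes", "y", "foo.response.value"),   # length comes from variable y
    ("goto", "end"),
    ("label", "end"),
    ("uint32be", "foo.message_id", None),
]


def optimize(program, wanted):
    """Replace reads of fields not in wanted with skip instructions."""
    out = []
    for ins in program:
        op = ins[0]
        if op == "uint32be":
            _, name, save = ins
            # Keep the read if the field is wanted or its value is saved.
            keep = name in wanted or save is not None
            out.append(ins if keep else ("skip", 4))
        elif op == "ascii":
            _, length, name = ins
            out.append(ins if name in wanted else ("skip", length))
        elif op == "bytes":
            _, var, name = ins
            out.append(ins if name in wanted else ("skip", var))
        else:
            out.append(ins)
    # Merge adjacent fixed-size skips (16 + 16 becomes 32).
    merged = []
    for ins in out:
        if (ins[0] == "skip" and isinstance(ins[1], int) and merged
                and merged[-1][0] == "skip" and isinstance(merged[-1][1], int)):
            merged[-1] = ("skip", merged[-1][1] + ins[1])
        else:
            merged.append(ins)
    return merged


_cache = {}


def optimized_for(protocol, program, wanted):
    """Return the optimized program, generating it on first use only."""
    key = (protocol, frozenset(wanted))
    if key not in _cache:
        _cache[key] = optimize(program, wanted)
    return _cache[key]
```

With `wanted` set to `{"foo.message_id"}`, this turns the two Login string reads into a single 32-byte skip, the Request read into a 4-byte skip, and the Response value read into a skip of `y` bytes, matching the optimized listing above.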

This would allow us to get some of the effect of

        if (tree) {
                ...
        }

without leaving it up to humans to get it right (which humans often don't), and 
allow us to do more such optimization as well (as it's not just "do I need a 
protocol tree?", it's "do I need anything other than these few fields and 
whatever fields are necessary to get at those fields").

(It also raises the question of whether interpreted execution of that "machine 
code" or translation to C or machine language will be faster - interpreted 
execution *could* result in a smaller cache footprint if the interpreter is 
small enough and the code "high-level" enough to be fairly dense, although it 
does involve difficult-at-best-to-predict branches in the interpretive loop.)

Of course, this would allow people to extend Wireshark without needing any C 
developer tools, and would reduce the need for stability in the dissector core 
code.  Translating to a "machine code" of the sort shown above might also 
significantly reduce compile time (maybe with support for the CORBA IDL, 
building Parlay support won't dim the lights :-)), and if those are all loaded 
at startup time, it might make it easier to build configurations of Wireshark 
that don't have Every Single Protocol Known To Man and that thus start up more 
quickly.

On the other hand, it might also allow protocol descriptions to be shipped 
either in source form or binary form with restrictions on redistribution, 
providing a way to "get around the GPL" for protocols.  Some might consider 
that a feature (I seem to remember many years ago Cisco raised this issue about 
some protocols) and others might consider it a bug.  If we end up with a 
consensus of "it's a bug", we might be able to extend the protections of the 
GPL to dissector descriptions fed to the interpreter, so that if you make a 
"compiled" protocol description available, you must also make the source 
available to recipients and must give recipients the right to redistribute the 
source or binaries.
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev@wireshark.org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
