Re: -protocol-03

Devin Kowatch Fri, 20 Feb 2004 16:49:27 -0800

I should start with a disclaimer: I haven't read the latest draft yet
(sorry, Rainer), as I haven't had the time.

On Thu, Feb 19, 2004 at 05:55:46PM +0100, Rainer Gerhards wrote:
[ snip ]
>
> > 2. You talk a lot about server parsing the syslog message.
> > Are we making
> > an assumption that receiver must parse messages (for example 5th
> > paragraph in 4.1.1)?
>
> I think you make a point here that I missed receivers that are not
> interested at all in parsing a message. However, this case should be
> very, very seldom because without parsing, there is nothing else that
> you can do to the message - because you have just a bunch of octets
> without semantics. OK, you can use this to write a raw log, but that's
> it. To assign any meaningfullness to the message, you MUST parse it - no
> way around it.

My experience is on the unix side of things, so the following may not
be true, or as true, for other platforms.

I'm wondering what you mean by "meaningfullness".  Having worked on the
odd syslog analyser or two, I find syslog messages nearly impossible to
extract meaning from.  To clarify: because the actual meaning of a
syslog message (e.g. su to root failed) is in free form text (and there
is nothing that we can do about that), extracting the meaning involves
either matching the text to known patterns or natural language
processing.  Everything other than the message is really meta-data (the
time it was logged, where it came from, path information, etc).  The
syslog server really needs to know very little of the meta-data to do
its job.
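
To make the pattern-matching point concrete, here is a minimal Python
sketch.  The pattern table and the sample message wording are invented
for illustration (real su messages differ between platforms), and
classify() is a hypothetical helper, not part of any syslog library.

```python
import re

# Hypothetical patterns; the real message text varies by platform and version.
PATTERNS = [
    ("su_failed", re.compile(
        r"^su: FAILED SU \(to (?P<target>\S+)\) (?P<user>\S+) on (?P<tty>\S+)$")),
    ("su_ok", re.compile(
        r"^su: \(to (?P<target>\S+)\) (?P<user>\S+) on (?P<tty>\S+)$")),
]


def classify(text):
    """Return (event_name, fields) for the first matching pattern.

    Returns (None, {}) when no pattern matches, which is the common case
    for free form text nobody has written a pattern for yet.
    """
    for name, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return name, match.groupdict()
    return None, {}
```

Every new message wording needs a new pattern, which is why the meaning
is so hard to get at in practice.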

Syslog servers have existed for years with minimal parsing of the
message.  My own syslog server does parse the message, but I think (in
retrospect) that may have been a mistake.  The only place where
parsing must occur is when you are translating from one format to
another (e.g. classic syslog to SYSLOG/COOKED).

BTW, reading section 4.1.1, I think you may be too specific about what a
server MUST/SHOULD (NOT) do with the version string.  I think it might
be enough to state that this document describes version 2, and that an
unknown version should cause the receiver to log a message.  The last 3
paragraphs seem unnecessary.  Also, I think they specify too much about
how the protocol (and any application that uses it) should be written.
>
> I can add a section for non-parsing receivers, but that won't reduce the
> size of the draft ;) I am not sure if it pays to specify this very
> uncommon case.
>
> > Generally, there seems to be a lot of stuff
> > dictating behavior for receiver. Why can't we just say: this
> > is valid on
> > the wire, this is not and leave it up to implementation to decide what
> > it does with it.
>
> See my intro ;)
>
>
[ snip ]
> >
> > 4. Section 4.1.3.  "Any implementation MUST support free configuration
> > of the FACILITY on the sender."  I think by implementation you are
> > always assuming a dedicated sender or receiver library
> > product.  I don't
> > see why I can't just implement sending logic in my app
> > directly and not
> > have a fixed facility.  I think at best, this is a SHOULD.
>
> I agree that a SHOULD is more appropriate here, we cannot actually
> enforce this. It's neither a protocol nor a security issue if it is not
> configurable. It can be a big drawback for the operator, but they can
> choose to dictate this by not purchasing products which don't support
> free configuration. Agree, it MUST be a SHOULD ;)

Also note that on the unix side of the house, facilities are defined in
a header file.  Furthermore, the actual library calls which log
messages will be sending the classic format, probably for years to come,
and that format combines the facility and priority into one value.
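
For reference, a small Python sketch of how the classic format packs
the two values into the number between the angle brackets (facility
times 8 plus severity, as in BSD syslog).  The function names are
made up for illustration.

```python
def encode_pri(facility, severity):
    """Combine a facility (0-23) and severity (0-7) into a classic PRI value."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be in 0..23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in 0..7")
    return facility * 8 + severity


def decode_pri(pri):
    """Split a classic PRI value back into (facility, severity)."""
    if not 0 <= pri <= 191:
        raise ValueError("PRI must be in 0..191")
    return divmod(pri, 8)
```

For example, facility 4 (auth) with severity 3 (err) goes on the wire
as <35>, and decode_pri(35) gives back (4, 3).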

>
> But I think it is a strong SHOULD, not a weak one. Because I have seen
> soooo much troubles in the real world out of the inability of some
> products to provide free configuration.
>
> >
> > 5. Section 4.1.6 - Hostname.  So, we specify FQDN and if to
> > present IP.
> > Not sure if we had a discussion on this, but did we decide to bypass
> > hostname?
>
> Yes, because it is not meaningful in most cases. I think this is the
> best pointer to previous discussion:
> http://www.syslog.cc/ietf/autoarc/msg00715.html.

Yes and no.  Syslog is usually only used within an organization, which
means that in many cases the hostname is enough information to find the
actual hardware that sent the message.  For larger organizations that
use sub-domains and other more complicated setups, the FQDN is more
important.

>
> > I think it will be a common case where hostname is present,
> > but machine does not know its domain suffix. I would
> > generally prefer IP
> > unless it is dynamic (DHCP).
>
> OK, I see the point.

I can make an argument for cases where the hostname is more useful than
even a static IP.  For example, think of a multihomed host.  Or think
of a web farm which has many (10+) virtual interfaces for apache
daemons.  However, leaving this as a SHOULD is good because then
implementations can simply make it configurable.

>
> Probably it is best to ask the sender to provide
> a) FQDN
> b) static IP
> c) hostname only (if on dynamic ip)
> d) dynamic IP
> e) oops... what if it knows nothing? Well, it should, so this is a
> no-case. Right? Or is it worth another paragraph (like putting
> "DumbDevice" or "127.0.0.1" into the hostname)?
>
> Of course, this is a sequence of SHOULDs, not MUSTs.

No, e is not needed.  I think that it's reasonable to assume that a) any
host sending syslog will have an IP address and b) all platforms allow
an application to get that IP address.
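
Read as a chain of SHOULDs, the preference order a) through d) could be
sketched as below.  How a sender discovers each value, or knows whether
its address is static, is platform-specific and left out; the function
and parameter names are hypothetical.

```python
def choose_host_identifier(fqdn=None, static_ip=None, hostname=None,
                           dynamic_ip=None):
    """Pick the most preferred identifier that is available.

    Order follows the list above: FQDN, static IP, bare hostname,
    dynamic IP.  Empty strings count as "not available".
    """
    for candidate in (fqdn, static_ip, hostname, dynamic_ip):
        if candidate:
            return candidate
    # Case e): assumed not to happen, since a sending host has an IP address.
    raise ValueError("no host identifier available")
```

Making the order configurable (say, hostname first on a multihomed
box) would just mean reordering the tuple.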

[ snip ]
>
> >
> > 8. Section 6.2.3.  I don't think you explain the purpose for allowing
> > partcount to grow.  I assume this is for streaming.  Needs to be
> > explained.  I also think it is a strange scheme.  Why don't you allow
> > incrementing it by one every time?
[ snip ]

I'm just going to take this opportunity to throw out a few general
comments I have about the fragmentation issue.

Maybe it's just me, but it seems like the protocol is getting more
complex than needed.  I understand, and agree with, the need to send
multi-part syslog messages in the event that the message exceeds the
maximum message size.  However, I don't think that there is a lot of
advantage to some of the directions that this is going in.

I'm not trying to pick at a specific issue here, but more to urge that
everyone take a step back and see how they can simplify the protocol.
Remember the saying about when software is done: It's not done until
there are no more things that you can _remove_.  I also don't mean this
to diminish the hard work that everyone has put into the syslog
protocol so far.  Often when I design software I go through the same
process, with an overly complex first draft that gets simplified later.
Ultimately, the protocol will be stronger for the process (more cases
considered).

On the fragmentation issue specifically, do we really need to fragment
packets when they would be larger than the MTU of the transport
protocol?  For -protocol, which I thought was transport independent, it
makes no sense.  For the UDP transport, why not just frame the message
and let the IP layer take care of MTU fragmentation.

The other aspect to consider: UDP is not the only transport (please
repeat that after me).  Not only is UDP not the only transport, but it
is not even a good transport (too lossy, yes, even at low
messages/sec.).  Oddly enough it turns out to be slower than TCP
(according to someone else's tests) because the loss causes the
messages/sec to level off well before the bandwidth is consumed.

An example would be syslog over BEEP (similar to SYSLOG/RAW).  There is no
good reason to restrict the maximum message size, there is no need to
frame the message, and there is no need to handle fragments in the
message.  All of that is handled at the BEEP layer, as is authentication
of sender/receiver, and more.  It might be a useful exercise for someone
to revisit SYSLOG/RAW and adapt -protocol to run over BEEP.  It may
provide some insights as to how many aspects of -protocol are designed
to get around UDP's limitations.

These are intended to be examples; I may have more specific comments on
this once I read the drafts (this weekend hopefully).  But in short, I
urge everyone to think about how to _simplify_ the protocol.  An overly
complex protocol gets fewer takers on implementation.

Then again, maybe I'm wrong and all the features do need to be present.

> >
> > 9. Section 7.2.  Can we use "yes" or "no" instead of "0" and "1"?
>
> I, too, thought about this. In respect to the message size limitation, I
> decided to go for the shorter form. "0" and "1" should be fairly clear
> as boolean indicator. But I will gladly change this if that is the
> consensus on this list.

Might I suggest 'T' or 'F' (true or false) or 'Y' or 'N'?  just a
thought.
>
> >
> > 10. Section 8.1. If relay can't add structured data elements, it can't
> > record source IP of the message.  I think we should not lose such
> > information. Also, need to allow for recording of time or original
> > reception.
>
> That's a tough issue. It will cause us the loss of digital signatures?
> Which goal has the higher priority? Or should we work around this issue?
> Of course it's doable, but only on the expense of growing the spec. We
> would "simply" need to define a way that a relay can add a container
> structured data element, which in turn could contain other structured
> data elements - and that are clearly flagged as being not part of the
> original message so that a signature verifier could remove that part.
> This also sounds like a call, again, for XML ;)
>
> It's doable, but it will add considerable complexity.
>
> On the other hand, we can allow a relay to break signatures. I am not
> sure if that is a good mode.
>
> Or we do not allow it to modify the message (as currently specified),
> but then we are not able to save the information that you request - and
> I agree this information is valuable...

This should be easy, but you need to encapsulate.  The relay MUST NOT
alter the message it received in any way (for signatures).  But, to track
the path of the message you could encapsulate the message received by
the relay (M) in a new message that contains PRIORITY RELAY-ADDED-HEADERS M.

Note that the priority is duplicated because it is most often used to
route the message.

Simple enough, no?  Now the real question is: would it work? :)
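
A rough Python sketch of the idea, to show that M survives byte for
byte.  The concrete layout (a decimal length in front of the relay
headers) and the header content are assumptions made up for this
example; nothing here comes from the draft.

```python
import re

PRI_RE = re.compile(rb"^<(\d{1,3})>")


def encapsulate(received, relay_headers):
    """Wrap the received message M without touching its bytes.

    Hypothetical layout: <PRI> header-length SP headers SP M
    The PRI is copied from M so the wrapper can still be routed on it.
    """
    match = PRI_RE.match(received)
    if match is None:
        raise ValueError("received message has no <PRI> prefix")
    length = str(len(relay_headers)).encode("ascii")
    return match.group(0) + length + b" " + relay_headers + b" " + received


def unwrap(wrapped):
    """Return (relay_headers, M) from a message built by encapsulate()."""
    match = PRI_RE.match(wrapped)
    if match is None:
        raise ValueError("wrapped message has no <PRI> prefix")
    length_text, _, rest = wrapped[match.end():].partition(b" ")
    length = int(length_text)
    # Skip the single space that separates the headers from M.
    return rest[:length], rest[length + 1:]
```

A signature verifier would call unwrap() and check the signature over
the second element only, so the relay's additions never enter into it.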

>
> >
> > 11. Section 8.3. You allow relay to break message into multiple parts.
> > What happens with a message that is already multi-part?  How do you
> > distinguish first level of fragmentation from second?
>
> Actually, I think that is easier than it first looks. I specified that a
> message part message is a full syslog message in its own right. As
> such, you can apply all rules applying to any syslog message to a
> message part message, too (because it is a regular message once it has
> been formed). At least this is the spirit that I had in mind.
>
> So, when a (message part) message is disassembled (being broken in
> parts), the multi-part message headers must also be disassembled. Then
> an additional multi-part-message handler is added. We end up with a
> message that, after reassembly, becomes the original message part
> message, thus in itself a message that must be reassembled to become the
> original message.
>
> In possibly more familiar terms: let's use "fragmentation" for a
> moment. If a message fragment becomes further fragmented, an additional
> fragmentation header is added and the fragments of this message will
> then travel as a fragmented part of a fragmented message. Double
> fragmentation. Obviously, there is quite some overhead.
>
> I consider relays splitting message into multi-part messages as a last
> resort when there is no other way to handle the situation. It is
> definitely NOT desirable.

I'm thinking I would really like to see only one level of syslog message
level fragmentation allowed.  If the transport needs to fragment, let
it.  If the transport can't fragment, let the transport mapping define a
mechanism for fragmenting messages that makes sense.  Basically this
will be a PIA to implement correctly and to make fast (in run time, not
implementation time).  I haven't really read the multi-part message spec
very closely.  Will we need to escape structured data elements in a
twice fragmented message?  This may be necessary because the reassembly
code might not know which meta-data about message parts belongs to which
level of fragmentation.
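
For comparison, here is roughly what a single level of fragmentation
looks like in Python.  The (msg_id, index, count, chunk) tuple is a
hypothetical stand-in for the part meta-data, not the -protocol-03 wire
format.

```python
def fragment(msg_id, payload, max_part):
    """Split payload into parts of at most max_part bytes.

    Each part is (msg_id, index, count, chunk), with index starting at 1.
    """
    if max_part <= 0:
        raise ValueError("max_part must be positive")
    chunks = [payload[i:i + max_part]
              for i in range(0, len(payload), max_part)] or [b""]
    return [(msg_id, index, len(chunks), chunk)
            for index, chunk in enumerate(chunks, start=1)]


def reassemble(parts):
    """Rebuild the payload from parts that may arrive in any order."""
    if not parts:
        raise ValueError("no parts given")
    msg_id, _, count, _ = parts[0]
    by_index = {}
    for part_id, index, part_count, chunk in parts:
        if part_id != msg_id or part_count != count:
            raise ValueError("parts belong to different messages")
        by_index[index] = chunk
    if sorted(by_index) != list(range(1, count + 1)):
        raise ValueError("missing parts")
    return b"".join(by_index[i] for i in range(1, count + 1))
```

With one level, the reassembly key is flat and there is never a
question of which level a piece of part meta-data belongs to.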

>
> > 12. The ID is now 45 pages long and growing with every revision.  I
> > think it would help if we shortened it whenever possible.
> > After all it
> > is just a syslog protocol.  This is protocol used for troubleshooting.
> > It can't be itself overly complicated or give such impression.
>
> Well ... actually it gets larger with each of your well-thought out
> comments ;)
>
> Honestly, I like to keep it short. But just look at this mail. How often
> do you rightly ask if we can specify something more precisely? So how
> can we shrink it and also make it more precise? Well, I assume we can
> save some pages if a good native English editor goes over it. In
> general, however, I think it is expanding. Of course, we can move all
> the specifics and clarifications out into a separate web page, but what
> exactly is the value of this? And who guarantees that an implementor
> will visit these pages?
>
> Of course, the growth of the document is also related to my try to keep
> things right in bounds (see my intro). But again, if you look at the WG
> mailing list archive, you will see lots of comments that "this and that"
> is unclear and needs to be specified. If we do, the spec obviously grows
> ;)
[ snip ]

More precise specification is great, overly precise specification is
not.  Also, some verbiage may be better placed in other documents.  For
example, implementation advice could go on a web page.  Section 6.3 may
be better in the syslog-sign RFC, etc ...

my $0.01

-- 
Devin Kowatch
[EMAIL PROTECTED]
