I should make the disclaimer that I haven't read the latest draft, sorry Rainer, as time has not been available.
On Thu, Feb 19, 2004 at 05:55:46PM +0100, Rainer Gerhards wrote:
[ snip ]
>
> > 2. You talk a lot about server parsing the syslog message.
> > Are we making
> > an assumption that receiver must parse messages (for example 5th
> > paragraph in 4.1.1)?
>
> I think you make a point here that I missed receivers that are not
> interested at all in parsing a message. However, this case should be
> very, very seldom because without parsing, there is nothing else that
> you can do to the message - because you have just a bunch of octets
> without semantics. OK, you can use this to write a raw log, but that's
> it. To assign any meaningfullness to the message, you MUST parse it - no
> way around it.

My experience is on the unix side of things, so the following may not
be true, or as true, for other platforms.

I'm wondering what you mean by "meaningfullness". Having worked on the
odd syslog analyser or two, I have found that syslog messages are
nearly impossible to extract meaning from. To clarify: because the
actual meaning of a syslog message (e.g. su to root failed) is in
free-form text (and there is nothing that we can do about that),
extracting the meaning involves either matching the text to known
patterns or natural language processing.

Everything other than the message is really meta-data (the time it was
logged, where it came from, path information, etc). The syslog server
really needs to know very little of the meta-data to do its job.
Syslog servers have existed for years with minimal parsing of the
message. My own syslog server does parse the message, but I think (in
retrospect) that may have been a mistake. The only useful place where
parsing must occur is when you are translating from one format to
another (e.g. classic syslog to SYSLOG/COOKED).

BTW, reading section 4.1.1, I think you may be too specific about what
a server MUST/SHOULD (NOT) do with the version string.
I think it might be enough to state that this document describes
version 2, and that an unknown version should cause the receiver to
log a message. The last 3 paragraphs seem unnecessary. Also, I think
they specify too much about how the protocol (and the application that
uses it) should be written.

>
> I can add a section for non-parsing receivers, but that won't reduce the
> size of the draft ;) I am not sure if it pays to specify this very
> uncommon case.
>
> > Generally, there seems to be a lot of stuff
> > dictating behavior for receiver. Why can't we just say: this
> > is valid on
> > the wire, this is not and leave it up to implementation to decide what
> > it does with it.
>
> See my intro ;)
>
> [ snip ]
>
>
> > 4. Section 4.1.3. "Any implementation MUST support free configuration
> > of the FACILITY on the sender." I think by implementation you are
> > always assuming a dedicated sender or receiver library
> > product. I don't
> > see why I can't just implement sending logic in my app
> > directly and not
> > have a fixed facility. I think at best, this is a SHOULD.
>
> I agree that a SHOULD is more appropriate here, we can not actually
> enforce this. It's neither a protocol nor a security issue if it is not
> configurable. It can be a big backdraw for the operator, but they can
> choose to dictate this by not purchasing products which don't support
> free configuration.

Agree, it MUST be a SHOULD ;)

Also note that on the unix side of the house, facilities are defined
in a header file. Furthermore, the actual library calls which log
messages will be sending the classic format, which combines the
facility and priority, probably for years to come.

>
> But I think it is a strong SHOULD, not a weak one. Because I have seen
> soooo much troubles in the real world out of the inability of some
> products to provide free configuration.
>
>
> > 5. Section 4.1.6 - Hostname. So, we specify FQDN and if to
> > present IP.
> > Not sure if we had a discussion on this, but did we decide to bypass
> > hostname?
>
> Yes, because it is not meaningful in most cases. I think this is the
> best pointer to previous discussion:
> http://www.syslog.cc/ietf/autoarc/msg00715.html.

Yes and no. Syslog is usually only used within an organization, which
means that in many cases the hostname is enough information to find
the actual hardware that sent the message. For larger organizations
that use sub-domains and other more complicated setups, the FQDN is
more important.

>
> > I think it will be a common case where hostname is present,
> > but machine does not know its domain suffix. I would
> > generally prefer IP
> > unless it is dynamic (DHCP).
>
> OK, I see the point.

I can make an argument for cases where the hostname is more useful
than even a static IP. For example, think of a multihomed host. Or
think of a web farm which has many (10+) virtual interfaces for apache
daemons. However, leaving this as a SHOULD is good because then
implementations can simply make it configurable.

>
> Probably it is best to ask the sender to provide
> a) FQDN
> b) static IP
> c) hostname only (if on dynamic ip)
> d) dynamic IP
> e) oops... what if it knows nothing? Well, it should, so this is a
> no-case. Right? Or is it worth another paragraph (like puting
> "DumbDevice" or "127.0.0.1" into the hostname)?
>
> Of course, this is a sequence of SHOULDs, not MUSTs.

No, e is not needed. I think that it's reasonable to assume that
a) any host sending syslog will have an IP address and b) all
platforms allow an application to get that IP address.

[ snip ]
>
> >
> > 8. Section 6.2.3. I don't think you explain the purpose for allowing
> > partcount to grow. I assume this is for streaming. Needs to be
> > explained. I also think it is a strange scheme. Why don't you allow
> > incrementing it by one every time?
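
Going back to the HOSTNAME ordering for a moment: the a) through d)
sequence quoted above is simple to implement as a configurable
fallback. Here is a rough Python sketch of what I have in mind. The
function and parameter names are mine (hypothetical, not from the
draft), and how a sender discovers each value is deliberately left out.

```python
def choose_hostname(fqdn=None, static_ip=None, short_name=None, dynamic_ip=None):
    """Pick a HOSTNAME value using the a) to d) preference order.

    All four parameters are hypothetical inputs; a real sender would
    fill them in from its own configuration or platform calls.
    """
    # Order taken from the quoted list: FQDN, static IP,
    # bare hostname (when on a dynamic IP), then the dynamic IP itself.
    for candidate in (fqdn, static_ip, short_name, dynamic_ip):
        if candidate:
            return candidate
    # Case e) is assumed not to happen: every sender has some IP address.
    raise ValueError("no identifier available for HOSTNAME")


print(choose_hostname(short_name="web01", dynamic_ip="10.0.0.7"))  # web01
```

Since these are all SHOULDs, an implementation could just as well let
the operator reorder the candidates, which would cover the multihomed
case mentioned earlier.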
[ snip ]

I'm just going to take this opportunity to throw out a few general
comments I have about the fragmentation issue.

Maybe it's just me, but it seems like the protocol is getting more
complex than needed. I understand, and agree with, the need to send
multi-part syslog messages in the event that the message exceeds the
maximum message size. However, I don't think that there is a lot of
advantage to some of the directions that this is going in.

I'm not trying to pick at a specific issue here, but more to urge
that everyone take a step back and see how they can simplify the
protocol. Remember the saying about when software is done: it's not
done until there are no more things that you can _remove_.

I also don't mean this to diminish the hard work that everyone has
put into the syslog protocol so far. Often when I design software I go
through the same process, with the first draft overly complex and
simplifying it later. Ultimately, the protocol will be stronger for
the process (more cases considered).

On the fragmentation issue specifically, do we really need to fragment
packets when they would be larger than the MTU of the transport
protocol? For -protocol, which I thought was transport independent, it
makes no sense. For the UDP transport, why not just frame the message
and let the IP layer take care of MTU fragmentation?

The other aspect to consider: UDP is not the only transport (please
repeat that after me). Not only is UDP not the only transport, but it
is not even a good transport (too lossy, yes, even with low
messages/sec.). Oddly enough it turns out to be slower than TCP
(according to someone else's tests) because the lossage causes the
messages/sec to level off well before the bandwidth is consumed.

An example would be syslog over BEEP (similar to SYSLOG/RAW). There is
no good reason to restrict the maximum message size, there is no need
to frame the message, and there is no need to handle fragments in the
message.
All of that is handled at the BEEP layer, as is authentication of
sender/receiver, and more. It might be a useful exercise for someone
to revisit SYSLOG/RAW and adapt -protocol to run over BEEP. It may
provide some insight into how many aspects of -protocol are designed
to get around UDP's limitations.

These are intended to be examples; I may have more specific comments
on this once I read the drafts (this weekend hopefully). But in short,
I urge everyone to think about how to _simplify_ the protocol. An
overly complex protocol gets fewer takers on implementation. Then
again, maybe I'm wrong and all the features do need to be present.

>
>
> > 9. Section 7.2. Can we use "yes" or "no" instead of "0" and "1"?
>
> I, too, thought about this. In respect to the message size limitation, I
> decided to go for the shorter form. "0" and "1" should be fairly clear
> as boolean indicator. But I will gladly change this if that is the
> concensus on this list.

Might I suggest 'T' or 'F' (true or false) or 'Y' or 'N'? Just a
thought.

>
> >
> > 10. Section 8.1. If relay can't add structured data elements, it can't
> > record source IP of the message. I think we should not lose such
> > information. Also, need to allow for recording of time or original
> > reception.
>
> That's a tough issue. It will cause us the loss of digital signatures?
> Which goal has the higher priority? Or should we work around this issue?
> Of course it's doable, but only on the expense of growing the spec. We
> would "simply" need to define a way that a relay can add a container
> structured data element, which in turn could contain other structured
> data elements - and that are clearly flagged as being not part of the
> original message so that a signature verifier could remove that part.
> This also sounds like a call, again, for XML ;)
>
> It's doable, but it will add considerable complexity.
>
> On the other hand, we can allow a relay to break signatures. I am not
> sure if that is a good mode.
>
> Or we do not allow it to modify the message (as currently specified),
> but then we are not able to save the information that you request - and
> I agree this information is valuable...

This should be easy, but you need to encapsulate. The relay MUST NOT
alter the message it received in any way (for signatures). But, to
track the path of the message, you could encapsulate the message
received by the relay (M) in a new message that contains

    PRIORITY RELAY-ADDED-HEADERS M

Note that the priority is duplicated because it is most often used to
route the message. Simple enough, no? Now the real question is: would
it work? :)

>
> >
> > 11. Section 8.3. You allow relay to break message into multiple parts.
> > What happens with a message that is already multi-part? How do you
> > distinguish first level of fragmentation from second?
>
> Actually, I think that is easier as it first looks. I specified that a
> message part message is a full syslog message in its own rights. As
> such, you can apply all rules applying to any syslog message to a
> message part message, too (because it is a regular message once it has
> been formed). At least this is the sprit that I had on my mind.
>
> So, when a (message part) message is disassembled (being broken in
> parts), the multi-part message headers must also be disassembled. Then
> an additional multi-part-message handler is added. We end up with a
> message that, after reassembly, becomes the orginal message part
> message, thus in itself a message that must be reassmbled to become the
> original message.
>
> In eventually more familiar terms: let's use "fragmentation" for a
> moment. If a message fragement becomes further fragmented, an additional
> fragmentation header is added and the fragments of this message will
> then travel as a fragemented part of a fragmented message. Double
> fragementation. Obviously, there is quite some overhead.
>
> I consider relays splitting message into multi-part messages as a last
> resort when there is no other way to handle the situation. It is
> definitely NOT desirable.

I'm thinking I would really like to see only one level of syslog
message level fragmentation allowed. If the transport needs to
fragment, let it. If the transport can't fragment, let the transport
mapping define a mechanism for fragmenting messages that makes sense.
Basically, this will be a PIA to implement correctly and quickly (run
time, not implementation time).

I haven't really read the multi-part message spec very closely. Will
we need to escape structured data elements in a twice-fragmented
message? This may be necessary because the reassembly code might not
know which meta-data about message parts belongs to which level of
fragmentation.

>
> > 12. The ID is now 45 pages long and growing with every revision. I
> > think it would help if we shortened it whenever possible.
> > After all it
> > is just a syslog protocol. This is protocol used for troubleshooting.
> > It can't be itself overly complicated or give such impression.
>
> Well ... actually it get's larger with each of your well-thought out
> comments ;)
>
> Honestly, I like to keep it short. But just look at this mail. How often
> do you rightly ask if we can specify something more precisely? So how
> can we shrink it and also make it more precise? Well, I assume we can
> save some pages if a good native English editor goes over it. In
> general, however, I think it is expanding. Of course, we can move all
> the specifics and clarifications out into a separate web page, but what
> exactly is the value of this? And who guarantees that in implementor
> will visit these pages?
>
> Of course, the growth of the document is also related to my try to keep
> things right in bounds (see my intro). But again, if you look at the WG
> mailing list archive, you will see lots of comments that "this and that"
> is unclear and needs to be specified.
> If we do, the spec obviously grows
> ;)
[ snip ]

More precise specification is great; overly precise specification is
not. Also, some verbiage may be better placed in other documents. For
example, implementation advice could go on a web page, section 6.3
may be better in the syslog-sign RFC, etc ...

my $0.01
-- 
Devin Kowatch
[EMAIL PROTECTED]