RE: some (well a lot ) review remarks

Anton Okmianski Thu, 01 Apr 2004 15:23:02 -0800

Albert, Rainer, WG:

I have not thought through all the details, but...


I share Albert's concern about defining message length limits and
mandating partitioning in syslog-protocol given that it is supposed to
be transport independent.  I'd prefer if syslog-protocol was just
about the format and encoding of the message payload. Payload being a
complete syslog message without any transport-specific parts.

As it stands now, syslog-transport is really about transferring syslog
message segments and not complete syslog messages.  If
syslog-transport does all segmentation, then it is easier to specify
how to avoid issues such as double-segmentation (syslog-protocol level
and then IP fragmentation).

The relay provisions should probably also be specified only as far as
the payload is concerned. For example, in DHCP, when a message goes
through a relay, the relay agent is allowed to insert an option
specifying its address as well as giaddr (the network from which the
request for an IP address originated).  So, it just specifies
additional provisions for the payload for cases with relay agents, and
that's it. I think it will be easiest if we allow a relay agent to
function like any syslog-transport client and reassemble the whole
message before doing anything with it. In other words, it should work
with syslog messages, not syslog message segments.

Just my 2 cents.

Anton.

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> ons-huis.net!ALbert
> Sent: Tuesday, March 30, 2004 11:59 AM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: some (well a lot ) review remarks
>
>
> Hello Rainer, Anton, WG
>
> Over the last weeks I have reviewed some of the current draft
> documents. I had planned to spend more time on it in the coming
> month, but plans have changed. It will be very busy, which is
> good, but it also means less time for syslog.
>
>
> I have one important "issue": I think it would be a great
> improvement if some parts of Rainer's document moved to
> Anton's. I think the protocol document should NOT specify how
> to break messages into parts. It should assume the transport
> layer can "transport" 'long messages'. The transport layer
> should, when a long message can't be sent at once, split it
> into several parts.
>
> I think it should be possible to write another 'transport'
> document, which describes how 'long, new, better' messages
> can be transported by rfc-3164 syslog messages.
> Note: I'm NOT saying we should make an RFC for this. Only
> that, if we could write it, the separation between transport
> and "upper" layers is better. Note 2: That document should
> describe how to 'shrink' the longer headers into the old
> headers, etc. Not by putting the complete message into the payload.
>
> --------------
> Aside from the point above, I will include a rather long list of
> (personal) notes 'as is'.
>
> Rainer, please use those notes when relevant, and skip the
> others. I'm sorry, I don't have more time to rewrite my notes
> into a good review. It is either me throwing them away, or
> giving someone else that option. :-)
>
> Hope it helps a bit.
>
> ===========================================================
>
> --
> Groetjes,
> --ALbert Mietus
> Private mail to:           albert at ons-huis dot net
> Business mail to:     albert dot mietus at PTS dot nl
> Spam:   Just don't do it! Thrust me, I will not order!
>
> Hello WG,
>
> This is my comment on Rainer's syslog-protocol draft 04. I
> have been very passive in following this mailing list (too
> busy :-). But on receiving a request to review, I decided to do
> so. I apologise if I raise issues that have been discussed
> already; I didn't read them. This week I started, after a long
> time, to study the current draft document(s).
>
> Before continuing, let me say I like the idea of splitting
> syslog into several layers. I say so in the hope that the
> individual sub-specifications will become short. My comment
> is split into 3 parts, of which you are reading part 1 now.
> Part 1 is about the "idea" (the design). In a separate mail I
> will comment on the text of the document, in the hope it will
> become clearer, cleaner and shorter. The 3rd mail will
> contain some details and bits that didn't fit in the others.
>
> Possibly, I will send more comments in future mails; it will
> depend on the amount of time I have now and then.
>
> -------------------------------------------------------
>
> This review is becoming (very) long; I really hope all
> comments are worth reading. I have tried to express my
> thoughts about it as well as possible, given the time I have...
>
> General Comment on the idea/design of syslog-protocol
> =====================================================
>
> * This idea is GOOD!
>
>
>
> H2 Architecture
> ================
>
> Traditionally, syslog knows Devices, Collectors and Relays.
> I would like to add two 'things'. One I would like to call a
> *Generator*, the other a *Runner*.
>
> *Generator*
> As we can see in e.g. most Unix implementations, it is the
> application that knows WHAT to log and transmits that to the
> system (the log device, the syslog daemon), which knows HOW to
> log it. This can also be seen on embedded systems.
> Historically, the combination of (a part of) the application
> and the "system" forms the log-device. I would like to split
> this, now that this opportunity exists. The part that is built
> into the application (in C: the lines syslog(..."hai there");)
> can be called the (LOG-)Generator. The communication between
> Generator and Device is system/platform dependent. On Unix
> systems usually the log-device is used, on embedded systems a
> function call, and on Windows log-events can be used.
>
> By introducing this Generator, we (can) make clear that this
> private/dedicated communication exists, and is allowed. We
> also make clear the Generator is syslog-protocol INDEPENDENT.
>
> The function of the Device now becomes clear: it gets
> log-messages (events) and transports them. It also does some
> bookkeeping, like timestamping, adding crypto (for -sign), etc.
>
> *Runner*
> A LOG-runner is another kind of syslog-thing, which is
> frequently used, but without a proper place in the architecture.
> Whereas a relay (should) forward syslog-messages without
> knowledge of the semantics of the message, the Runner does
> know them. The simplest life-form of a runner is a filter. It
> "relays" messages, but only when they are important. On Unix
> there are several of these in perl, grep-scripts, etc.
> Formally, they are not relays (I think). A more complex
> runner is a "program" that receives log-messages, CHANGES
> them, and sends (or stores) the result. Examples: statistical
> analysis, intrusion detection, etc.
>
> Both kinds of Runners are useful and frequently used, but they
> are not part of the architecture. And as we try to make syslog
> "better", we had better add them and make sure our standards can
> "deal" with them. Otherwise, non-RFC-compliant log-programs
> will be standard.
>
>
> H4 Syslog format
> ================
>
> 412 enterpriseID
> -----------------
>
> I don't see any reason to include an enterpriseID; not in
> the header. (When needed, it can be used in the structured MSG part)
>
> Currently, it is just a number. It will be unused, misused or
> will lead to a lot of (operational) management. I'm afraid
> of the latter, as H4.1.3 suggests that the semantics
> of the Facility can be enterpriseID dependent.
>
> Also, it is required to use the "IANA assigned vendor"
> number. This implies open-source/free/non-commercial projects
> are 'ruled out', as they often will not do so.
>
> Last, should the number of the Generator-vendor, the
> system-vendor or the device-vendor be used? (See above about
> generator/device) This is not clear to me. And whichever one
> is chosen, it will be hard to implement. At least not when
> using the de facto (Unix) logging APIs!
>
> 413 Facility
> -------------
>
> Although, at first sight, I liked the idea of "a terrible lot"
> of facilities, the current "use a number" idea is wrong, as
> I see it. Aside from the problems mentioned above, more than
> a million facilities will mean relays can't be managed! The
> set of facilities which will be seen in a (major) network,
> expressed as numbers, will be more or less random.
> This implies very long, complex and unmanageable config files
> (or MIBs) for each LOG-router!
>
> As we have learned from routing IPv4, hierarchical structuring
> is needed.
>
> I think extending the set of facilities is good. But I can't
> imagine more than, say, 1000 are ever needed.
>
> So, my counter-proposition is:
>         *  Make  facilities (as a number) structured
>         *  Limit the number of facilities to a manageable number
>         *  Keep the format such that extending the allowed
>            numbers is possible
>
> A Facility then is still a number, at least 3 (or 4) digits
> long. A longer number means that it is an extended facility.
> Those have to be assigned by IANA. Facilities of length 3 (4)
> MUST have the format '(K)KLM', where 'K' (or 'KK') indicates
> the kind of facility, 'L' gives a sub-indication and 'M' is
> *SITE* configurable (so, by the local sysadmin, see example
> below). The 'K/KK' is based on the RFC3164 facilities, cleaned
> up and extended. Those numbers can be IANA assigned. 'L' can be
> chosen by the (generator) vendor, and 'M' by the admin. 'M'
> defaults to 0 (zero), and applications/vendors MAY offer the
> possibility to set that digit.
>
> Example: For mail, there will be a K specified, let's say 1.
> Then all mail logs will have the format '1XY', which is easily
> routable. It will do for small sites. Some vendors, like
> "sendmail" (only 1 process) will probably use only one value
> for L e.g. '0'; others, like "postfix" (several processes)
> can use multiple values, like '1', '2' and '3'. When
> supported by both sendmail and postfix, the local sysadmin
> can add (change) the M-digit, such that mail-systems on the
> border, and internal ones use another facility.
>
> In all cases, the local sysadmin can either use simple
> routing-rules, like 1** (for all mail), or 10* for  sendmail
> and 1[1-2]*, 13* for postfix, or even more complex. Now, the
> sysadmin has a choice, and can keep it manageable.
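
A minimal sketch of how a relay could evaluate such rules, assuming '*' stands for exactly one digit and a bracket class such as [1-2] matches one digit from that range. The rule table and destination names are hypothetical, not part of any draft:

```python
import re

def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a facility rule such as '1[1-2]*' into a regex.

    Assumption: '*' means exactly one digit (not "anything"), so
    '1**' only matches three-digit facilities starting with 1.
    """
    parts = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch == "*":
            parts.append("[0-9]")
            i += 1
        elif ch == "[":
            end = rule.index("]", i)      # ValueError if unterminated
            parts.append(rule[i:end + 1])
            i = end + 1
        elif ch.isdigit():
            parts.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} in rule")
    return re.compile("".join(parts))

# Hypothetical routing table; the first matching rule wins.
RULES = [
    ("10*", "sendmail.log"),
    ("1[1-2]*", "postfix-a.log"),
    ("13*", "postfix-b.log"),
    ("1**", "mail.log"),
]

def route(facility: str, rules=RULES):
    """Return the destination for a facility string, or None."""
    for rule, destination in rules:
        if rule_to_regex(rule).fullmatch(facility):
            return destination
    return None
```

Ordering the specific rules before the catch-all '1**' keeps the table short while still letting a small site use only the last line.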
>
> Note: the 0 for sendmail and 1-3 for postfix are "by
> example". However, we can add a "rule" that '0' shall be used
> when only 1 L-value is used, and when several values are
> used, zero should be skipped. Also, I would like to
> prescribe/reserve "9" for local additions (on all digits).
>
> The (K)KLM idea is offered as a suggestion for improvement. IF this
> idea is accepted, THEN we can discuss variations like 3 or 4
> digits, where to save mappings (IANA or this RFC), etc.
>
> 5 Structured data
> ==================
>
> In short: I think having the option of structured data is a
> nice option. But let's keep it simple.
>
> The current one is too complex; it has to be, as we need 4
> pages to describe it. Also, I find those pages hard to study
> (given the time I had :-). It can also lead to not
> implementing it. Programmers, and especially their bosses, don't
> have a lot of time!
>
> More positive: the main reason why structured data is complex
> (currently) comes down to 2 problems:
> 1) It isn't part of the main design (the ABNF on page 9)
> 2) The "structure" can start anywhere in MSG
>
> Both are easy to solve:
> Ad 2) Specify that the structured part ALWAYS STARTS directly
>       after the header.
> Ad 1) We need to introduce it where it belongs: in an optional
>       field on page 9.
>
>
> Let me give it a try (I also use the "improvement" on the ABNF
> of my other mail; it saves typing). (Also I "forget" the SP
> parts, for now. Just the idea.)
>
> SYSLOG-MSG  = HEADER DATA
> HEADER      = VERSIONING PRIO ID         ; See other mail
> DATA        = [ *STR-DATA ] MSG
> STR-DATA    = see below
> MSG         = free format
>
> Given this ABNF, the structured data ALWAYS comes (in RFC3164
> notation) at the start of the MSG-part (or in the new ABNF:
> before the free format MSG).
>
> This implies receivers always have (as a last resort) the
> option to see everything after the header as free-format, and
> just store/forward it. It implies the start of STR-DATA is
> simple to find: it starts directly after the header, or
> directly after another STR-DATA.
>
> This also implies we have the option to start STR-DATA with
> '<', which is more usable and XML-like. The complex long
> "[EMAIL PROTECTED]" cookie isn't needed anymore. However, we are free
> to use it. My personal vote is for the (XML) < > style.
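
To illustrate why "structured data first" makes a receiver simple, here is a rough sketch that peels leading '<...>' elements off DATA and treats the remainder as free-format MSG. It assumes the '<' '>' delimiters proposed above, double-quoted values, and no escape mechanism inside values; none of that is settled in the draft, and the function name is made up for this example:

```python
def split_data(data: str):
    """Split DATA into (list of STR-DATA bodies, free-format MSG).

    Assumptions for this sketch: elements are delimited by '<' and
    '>', a '>' inside double quotes does not end an element, and
    there is no escaping of quotes inside values.
    """
    elements = []
    pos = 0  # index just past the last complete element
    while True:
        j = pos
        while j < len(data) and data[j] in " \t":
            j += 1  # whitespace between elements is skipped
        if j >= len(data) or data[j] != "<":
            break
        in_quotes = False
        end = -1
        for i in range(j + 1, len(data)):
            if data[i] == '"':
                in_quotes = not in_quotes
            elif data[i] == ">" and not in_quotes:
                end = i
                break
        if end < 0:
            break  # unterminated element: last resort, keep as MSG
        elements.append(data[j + 1:end])
        pos = end + 1
    return elements, data[pos:]
```

A receiver that does not implement structured data at all can skip this step and store or forward DATA unchanged. One limitation of the sketch: a free-format MSG that itself begins with '<' would be mistaken for structured data, so a real specification would need a rule for that case.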
>
> See also my other posting about details of structured-data.
>
> 6 Multi-Part Messages
> =====================
>
> There are some mistakes in this part, but I like the
> general idea. However, I feel splitting/reassembly is done in
> other protocols too. Maybe we can use/reference a (de facto)
> standard? I don't know an RFC which we can use, but I'm sure
> there must be one!
>
> Second, it is too complex, and too long (to read). I have studied
> it, but I'm not sure I understand it. Some details which I
> find hard, wrong, or dislike:
>
> MP-timestamp
> ------------
> I do not like having several messages having the same TIMESTAMP
>
>
>
> 62 SD-ID receiving an optional STR-DATA
> =======================================
> This must be a mistake!
> In the 3rd paragraph it is stated that a receiver sometimes MUST
> NOT parse a STR-DATA of a log-message that is received.
> However, when the Multi-part option is not implemented, it
> doesn't know this!
>
>
>
> Hello WG,
>
> This is part 2 (of 3) of my comment on the 3rd
> draft-syslog-protocol. See the introduction in my posting
> [EMAIL PROTECTED] This one contains comments on the text of the document,
> to help clarify and shorten the document. It does NOT
> contain comments on the "idea" (design); see posting [EMAIL PROTECTED]
>
> -------------------------------------------------------
>
>
>
> Implementation hints
> ====================
>
> The current RFC contains a lot of valuable hints for
> programmers, like the one about time-secfrac (Yes I
> introduced it, at least the bug:-)!
>
> Currently they are scattered around the document, making the
> document long and more complex to read for non-programmers.
>
> I would suggest to move all of them to a new chapter, at the
> end of the document (after the current chapter 9).
>
>
> ABNF (Chapter 4)
> ================
>
> I think we can clarify the syntax by "unflattening" it. RFC-3164
> syslog uses some fields and subfields which are nice when
> introducing syslog to others. They "understand" names such as
> priority.
>
> So, keep it structured. I'll give it a try (please correct the
> syntax of the ABNF, as I am not writing it daily anymore):
>
> SYSLOG-MSG  = HEADER DATA
> HEADER      = VERSIONING PRIO ID
> DATA        = [ STR-DATA ] MSG
> VERSIONING  = "V" VERSION
> VERSION     = 1*3DIGIT
> PRIO        = "<" FACILITY "." SEVERITY ">"   ; See [EMAIL PROTECTED] for notation change
> ID          = TIMESTAMP SP HOSTNAME SP TAG
> STR-DATA    = see elsewhere
> MSG         = free format
> (etc)
>
> Now we have meaningful fields that can be used too. E.g. the ID
> field (see other mail) makes each message unique.
>
> 5.1 Format (typo?)
> ==================
> On page 20, the paragraph starting with "The structured data
> element MUST ..." is confusing. The second-to-last line says _no
> space_ is allowed, the last one says one or more spaces are.
> Is this a typo? Or is it unclear (at least to me)?
>
> Structured data, ID-length
> ==========================
>
> I don't see why we should limit the several fields to 64
> positions. I agree this will normally suffice. But so will
> 32, or 16, or any other number. 64 is "too big" to use as a
> fixed-size ("reserved") space for programmers, database
> fields, etc. (too big == wastes too many bytes on huge logs). So
> dynamic fields need to be used anyhow. Then there is no need
> for a trivial maximum. Note, there is a maximum anyhow, given
> by the size of a single log-message. That will do for "short
> term allocation".
>
> Removing this limit makes the RFC cleaner and smaller.
>
> Structured data, spaces
> =======================
>
> I would like to have all lines about SP (spaces) in chapter 5
> removed. The point about 0, 1 or more spaces is not relevant.
> In general, syslog uses SP to separate fields (when needed),
> and allows them in the MSG-part. The syntax and semantics of
> STR-DATA do not depend on the amount of space. Nor is it
> harder to read. Implementing receivers even becomes easier
> when spaces (in STR-DATA) can be skipped (while { if 'sp'
> then skip } ) instead of checking the correct number, and
> doing something if wrong!
>
> Proposal: Allow spaces anywhere, but inside SD-ID (see note)
> and SD-PARAM. SP in SD-VALUE is allowed (already), but not a
> separator. Prescribe (at least 1) SP between each param-value pair.
>
> *Note: SP between '[#@' (or '<') and the SD-ID itself is
> probably not a good idea, but not a problem. We can fix it
> by moving the fixed string in the ABNF:
> STR-DATA    = STR-START ... STR-END
> STR-START   = "[#@" SD-ID   ; or '<'
> STR-END     = "]"           ; or '>'
> ...         = as before, SP are allowed.
>
> Note2: Doing so allows formatting lines for human reading,
> which is handy.
> Example     <x-gam-example  doYoe="like this"  or     = "This one?"   >
>             <z-gam-more     Yes  = "I do"      find   = "it readable!" >
>
> This example is simple to parse, both for humans and
> computers. This change will make chapter-5 shorter, I think!
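
As a rough illustration of the "skip whitespace instead of counting it" argument, the sketch below parses one element in the '<' '>' style of the example above. The regular expression, the function name and the choice to accept tabs as well as spaces are assumptions made for this sketch, not draft text:

```python
import re

# name, optional whitespace, '=', optional whitespace, quoted value
_PARAM = re.compile(r'([^\s="]+)\s*=\s*"([^"]*)"')

def parse_element(element: str):
    """Parse '<SD-ID name="value" ...>' into (sd_id, params).

    Any amount of whitespace (space or tab) is accepted between
    tokens; whitespace inside a quoted value is kept as data.
    """
    text = element.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError("element must be enclosed in '<' and '>'")
    fields = text[1:-1].split(None, 1)  # SD-ID, then the rest
    if not fields:
        raise ValueError("missing SD-ID")
    sd_id = fields[0]
    rest = fields[1] if len(fields) > 1 else ""
    return sd_id, dict(_PARAM.findall(rest))
```

The receiver never has to decide what to do when the number of spaces is "wrong", which is the simplification argued for above.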
>
> Last, I think any whitespace should be allowed instead of SP
> (e.g. TAB).
>
> Chapter-5, MSG
> ==============
>
> I suggest, but only as a detail, the text of chapter-5 should
> be part of 4.2
>
>
>
> Hello WG,
>
> This is part 3 (of 3) of my comment on the 3rd
> draft-syslog-protocol. It only contains short remarks on
> details, and bits that didn't fit in part 1 (idea/design) or
> part 2 (the document itself). See posting [EMAIL PROTECTED] for more introduction
>
> -------------------------------------------------------
>
> 413/414 FACILITY/SEVERITY
> ==========================
>
> In this draft, both facility and severity are numbers. Even
> with my suggestion, they are 'just numbers'. And numbers are
> hard to read for humans, especially when there are a lot of
> them. Most people will forget which column contains which number.
>
> Therefore, I think it is better to use the (verbose) notation
> used by most syslog implementations: "'<' facility '.'
> severity '>'". Both facility and severity are numbers (at
> least on the wire). Collectors (viewers) can translate those
> numbers into their names, but still use this format. And even
> without them, it is easier to read the first (new) than the
> second (current) line:
>     V1 0 <888.4> 2003-010-11T22:14:13.003Z new.Formated. ...
>     V1 0 888 4 2003-010-11T22:14:13.003Z old.Formated. ...
>
> Note: I agree, we should not use the tricky "8 times F plus S"
> notation. I'm not suggesting that! Just insert a dot and the angles.
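
For comparison, a small sketch of both notations: the proposed '<facility.severity>' field and the RFC 3164 PRI value, which packs both into one number as facility * 8 + severity. The function names are made up for this example, and the assumption that severity stays a single digit in the range 0-7 comes from RFC 3164, not from the proposal itself:

```python
import re

_PRIO = re.compile(r"<(\d+)\.(\d)>")

def parse_prio(field: str):
    """Parse the proposed '<facility.severity>' notation."""
    m = _PRIO.fullmatch(field)
    if m is None:
        raise ValueError(f"not a PRIO field: {field!r}")
    facility, severity = int(m.group(1)), int(m.group(2))
    if severity > 7:
        raise ValueError("severity must be in the range 0-7")
    return facility, severity

def decode_rfc3164_pri(pri: int):
    """Decode the RFC 3164 PRI value (facility * 8 + severity)."""
    return divmod(pri, 8)
```

With the dotted form no arithmetic is needed to read a line by eye: parse_prio("<888.4>") gives facility 888 and severity 4 directly, whereas the RFC 3164 value 165 has to be divided by 8 to find facility 20 and severity 5.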
>
>
> 4151 timestamp, without time
> ============================
>
> There are 'devices' as meant in this section which have no
> idea of TIME. So this section is a good idea.
>
> Often, those devices can store ("know") a few bits of
> information. Therefore, I would like to change this fixed
> TIMESTAMP to the same one, but with a sequence number
> attached; the fractional-seconds field can be used for it.
>
> Then a timestamp becomes 2001-01-01T00:00:60.<seq>Z.
> As in the current draft, this time doesn't exist. But at
> least collectors can (more or less) sort a set of
> log messages from 1 device.
>
> Note: the latter is needed for e.g. syslog-sign
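
A possible sketch of such a clock-less device: it emits the fixed placeholder time with a counter in the fractional-seconds field. The six-digit width and the wrap-around when the counter overflows are assumptions of this example; the proposal above does not fix either:

```python
import itertools

PLACEHOLDER = "2001-01-01T00:00:60."  # fixed, non-existent time

def placeholder_timestamps(width: int = 6):
    """Yield placeholder timestamps carrying a sequence number.

    Assumption: the counter is zero-padded to `width` digits and
    wraps around when it no longer fits.
    """
    for seq in itertools.count():
        n = seq % 10 ** width
        yield f"{PLACEHOLDER}{n:0{width}d}Z"
```

Because the counter is zero-padded, a collector can order the messages of one device with a plain string sort, at least until the counter wraps.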
>
>
> 417 TAG
> =======
>
> I think we need to make the TAG structured! All current syslog
> receivers (collectors, relays) use __PARTS of__ the RFC3164
> TAG to route messages. In RFC3164 the TAG is simple and
> short, so it is quite simple to use it for routing. Note: not
> the complete TAG is used, only the 'program name', never the PID
> part.
>
> With the new TAG, with a static ID and a dynamic part,
> similar routing should be possible. At least, the RFC should
> be clear on it: by demanding the static part is "fixed",
> and making sure that static part can be found.
>
> Given the current practice, where routing is based (mainly) on
> the program name, it would be wise to (at least) suggest how
> (where) that part can be found.
>
> Proposal: Forget native support for VMS/Windows/DOS and even
> Unix pathnames, and introduce a URI (URL)-like scheme,
> where only '/' (not the one from Unix, but from URLs) is
> used for "path separation" and ':' and '//' are major separators.
>
> In this case, the ABNF is simple; only the semantics become a
> little more complex. Also, it becomes simple for
> web-applications to log. They have a URL already. All "old
> fashioned :-)" applications have a URL too:
> ''file://path/to/appl''. This is valid on any system.
>
> Note: the dynamic part has to be added
> Note2: for web-applications, which include a 'hostpart' in the
> URL: that hostname is NOT the same as the HOSTPART in the
> header. Frequently (a web-farm) several systems share the
> same URL, but not the hostname. Then the sysadmin can decide
> which one to use for routing.
>
>
> Message ID
> ==========
>
> Given the architecture of syslog-networks, messages can be
> duplicated. But sometimes messages are related (e.g. with
> the signatures of syslog-sign). Neither RFC3164 nor the
> current -protocol draft has a way to uniquely
> identify a message.
>
> In practice, messages are unique by their hostname, TAG and
> timestamp. But we can't rely on this, as it isn't required.
>
> I would like to introduce this requirement. It is simple to
> add to the RFC, and simple to implement. Given the HOSTNAME
> and TAG, all the implementers have to do is never send a
> message with the same timestamp. Given the microsecond
> resolution, this is doable. (It does imply some systems have to
> fake the last digits; I don't see a problem with this.)
> Otherwise we can add a "." SequenceNo to the timestamp.
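
One way a sender could meet this requirement is sketched below: remember the last timestamp used per (HOSTNAME, TAG) and bump it by one microsecond when the clock has not advanced. The class name and the use of integer microseconds are choices made for this example only:

```python
class UniqueStamper:
    """Hand out strictly increasing timestamps per (hostname, tag).

    Timestamps are integers in microseconds. When the clock value
    is not later than the previous one for the same pair, the last
    digits are "faked" by adding one microsecond.
    """

    def __init__(self):
        self._last = {}

    def stamp(self, hostname: str, tag: str, now_us: int) -> int:
        key = (hostname, tag)
        last = self._last.get(key)
        if last is not None and now_us <= last:
            now_us = last + 1  # fake the last digits
        self._last[key] = now_us
        return now_us
```

The triple (hostname, tag, timestamp) is then unique for everything this sender emits, as long as the stamper's state survives for the lifetime of the process.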
>
> Structured data, tokens
> =======================
>
> Given the current draft, or my counter-proposal to it, for
> structured data, there are (only) 2 kinds of tokens: the IANA
> controlled ones and the experimental ones, the latter
> starting with "x-".
>
> I think we can safely add a third: "X-" for
> private/local/vendor specific tokens. As we can see in e.g.
> mail, this kind of field will be used a lot. Now we have an
> option to allow them, without giving them a status of "testing".
>
>
> STR-DATA, can we use it for syslog-sign (or similar)?
> =====================================================
>
> Just an idea: in a protocol as syslog-sign (here just as
> example), where messages reference to other messages (now
> implicit), we could use the STR-DATA to do so?
>
> I verified the syntax/semantics of this, and YES, we could do
> so. This means, I like STR-DATA a lot more :-) It is great.
> Even when -sign doesn't use it, I (we) can use the same
> format to present it to the user!
>
> 64 MultiPart examples
> =====================
>
> All examples use rfc3164 headers; shouldn't -protocol headers be
> used?
>
>
