Steve: I am not sure I understand which octet you were talking about. Sorry if I missed an earlier discussion.
UTF-8 is based on Unicode. Unicode assigns a constant integer (a code point) to each character or symbol, across all languages. UTF-8 encodes those integers as variable-length byte sequences: ASCII characters take one byte, and all other symbols take two or more bytes. To display Unicode text, you need a viewer that can handle Unicode. You do not need any locale information to display UTF-8 in a Unicode viewer. That was the whole idea behind Unicode, as opposed to the legacy of a gazillion locale-specific encodings.

If you are suggesting some indication to the parser of whether the message uses full UTF-8 or just the strict ASCII subset, then I think that indication is already there. You can determine it by looking at the high-order (first) bit of every byte. Basic (non-extended) ASCII never sets that bit. If the bit is set in any byte, your data contains non-ASCII characters and you need a UTF-8-aware parser.

As for passing locale info: if there is consensus that it will be widely used, we can define a standard structured data tag for it with defined semantics. However, making specific tags mandatory for senders would require more discussion. I am not sure all senders will know their locale, so requiring it would be tough.

Thanks,
Anton.

> -----Original Message-----
> From: Steve Chang (schang99)
> Sent: Friday, May 27, 2005 5:57 PM
> To: Anton Okmianski (aokmians); Alexander Clemm (alex);
> [EMAIL PROTECTED]; Rainer Gerhards
> Cc: syslog-sec@employees.org
> Subject: RE: [Syslog-sec] Syslog protocol - UTF-8 encoding
>
> Hi, Anton:
>
> The suggested octet may not seem necessary from the sender's perspective. But as Alex pointed out, the receiving syslog server/application can do the decoding more easily with the help of that "encoding type" octet before the structured data and message body.
>
> Besides, this octet can be helpful to allow other encodings, not limited to languages, if needed.
> And if some specific value out of the octet is reserved, it can help future extension of this specification and help ease extension-related migration issues.
>
> Regards,
>
> Steve
>
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:syslog-sec-
> > [EMAIL PROTECTED] On Behalf Of Anton Okmianski (aokmians)
> > Sent: Friday, May 27, 2005 2:46 PM
> > To: Alexander Clemm (alex); [EMAIL PROTECTED]; Rainer Gerhards
> > Cc: syslog-sec@employees.org
> > Subject: RE: [Syslog-sec] Syslog protocol - UTF-8 encoding
> >
> > Alex:
> >
> > We had discussions and proposals to support various locale-specific encodings early in the process. We decided against it, as UTF-8 really covers representation of all languages. It is also the general direction of the IETF for various protocols. And the compatibility with ASCII helps too. I think it is a pretty good choice.
> >
> > Thanks,
> > Anton.
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Alexander Clemm (alex)
> > > Sent: Friday, May 27, 2005 4:58 PM
> > > To: [EMAIL PROTECTED]; Rainer Gerhards
> > > Cc: syslog-sec@employees.org
> > > Subject: RE: [Syslog-sec] Syslog protocol - UTF-8 encoding
> > >
> > > Andrew, David,
> > >
> > > thank you. I was a bit too quick sending out the earlier message; I was confused. With ASCII being effectively a subset of UTF-8, issue 1 goes away, and as far as issue 2 is concerned, this does allay my concerns, at least as far as the sender side is concerned. I am still wondering if for the receiver side it might still be useful to know what encoding to expect - full UTF-8, or just the ASCII subset. It would be interesting to hear the perspective of someone on the receiver side, but from my point of view, my concerns are addressed. As for other encodings being of interest, while I would not rule it out, I'm not aware of any.
> > >
> > > Kind regards
> > > --- Alex
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: David B Harrington [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, May 25, 2005 8:10 PM
> > > To: [EMAIL PROTECTED]; Alexander Clemm (alex); 'Rainer Gerhards'
> > > Cc: syslog-sec@employees.org
> > > Subject: RE: [Syslog-sec] Syslog protocol - UTF-8 encoding
> > >
> > > Hi,
> > >
> > > In reading my response, it seems a bit too succinct.
> > >
> > > The relevant text from STD63 is:
> > > "UTF-8, the object of this memo, has a one-octet encoding unit. It uses all bits of an octet, but has the quality of preserving the full US-ASCII [US-ASCII] range: US-ASCII characters are encoded in one octet having the normal US-ASCII value, and any octet with such a value can only stand for a US-ASCII character, and nothing else."
> > >
> > > Hope this allays your concerns.
> > >
> > > David Harrington
> > > [EMAIL PROTECTED]
> > >
> > > > -----Original Message-----
> > > > From: [EMAIL PROTECTED]
> > > > [mailto:[EMAIL PROTECTED] On Behalf Of David B Harrington
> > > > Sent: Wednesday, May 25, 2005 10:58 PM
> > > > To: 'Alexander Clemm (alex)'; 'Rainer Gerhards'
> > > > Cc: syslog-sec@employees.org
> > > > Subject: RE: [Syslog-sec] Syslog protocol - UTF-8 encoding
> > > >
> > > > Hi,
> > > >
> > > > According to STD63, UTF-8 has the characteristic of preserving the full US-ASCII range.
> > > >
> > > > David Harrington
> > > > [EMAIL PROTECTED]
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: [EMAIL PROTECTED]
> > > > > [mailto:[EMAIL PROTECTED] On Behalf Of Alexander Clemm (alex)
> > > > > Sent: Wednesday, May 25, 2005 8:56 PM
> > > > > To: Rainer Gerhards
> > > > > Cc: syslog-sec@employees.org
> > > > > Subject: [Syslog-sec] Syslog protocol - UTF-8 encoding
> > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > 2 questions/suggestions concerning the UTF-8 encoding in the syslog protocol:
> > > > >
> > > > > 1) Is the " " (white space) after the header to be encoded in ASCII or UTF-8? The spec currently seems open in that respect (although it would seem logical for it to still be in ASCII); this should be clarified.
> > > > >
> > > > > 2) Concerning the UTF-8 encoding, depending on where you send syslog messages there are many scenarios in which it would be beneficial to have an option NOT to use UTF-8 encoding but to also allow for other encodings, in particular plain ASCII. Such an option would also allow for quicker adaptation of this specification, as it eases the migration. To provide for that, it seems it would make sense to allow for a flag in the header part of the message - at the tail end (that is known to be still ASCII encoded), right before the structured data, that indicates which encoding is used - that is, whether UTF-8 is in effect, or if another encoding is used - ex. ASCII, or even proprietary.
> > > > >
> > > > >
> > > > > (Apologies in case this aspect was discussed in the past and I am beating on a dead horse; but this appears important enough to bring up.)
> > > > >
> > > > >
> > > > > --- Alex
> > > > > _______________________________________________
> > > > > Syslog-sec mailing list
> > > > > Syslog-sec@www.employees.org
> > > > > http://www.employees.org/mailman/listinfo/syslog-sec
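Anton's point is that a receiver can tell plain ASCII from full UTF-8 by checking the high-order bit of each byte. David's STD63 quote adds that an octet below 0x80 can only ever be an ASCII character. Both can be illustrated with a short Python sketch. The helper names (is_pure_ascii, classify_payload, utf8_length) are hypothetical and not part of any syslog specification or library. The sketch relies only on Python's built-in bytes.decode, whose default strict mode rejects malformed UTF-8.

```python
def is_pure_ascii(data: bytes) -> bool:
    """True if no byte has its high-order bit (0x80) set, i.e. plain US-ASCII."""
    return all(b < 0x80 for b in data)


def classify_payload(data: bytes) -> str:
    """Hypothetical receiver-side check: 'ascii', 'utf-8', or 'invalid'.

    Pure-ASCII input needs no UTF-8 decoding at all; anything else has to be
    validated by a UTF-8-aware parser.
    """
    if is_pure_ascii(data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict by default: raises on malformed sequences
    except UnicodeDecodeError:
        return "invalid"
    return "utf-8"


def utf8_length(code_point: int) -> int:
    """Octets UTF-8 uses for one code point (RFC 3629 ranges; assumes <= 0x10FFFF)."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


if __name__ == "__main__":
    samples = [b"disk full on /var", "\u00dcberlauf".encode("utf-8"), b"\xff\xfe"]
    for sample in samples:
        print(sample, classify_payload(sample))

    # STD63 property: ASCII characters stay single octets with their usual
    # values, and every octet of a multi-octet sequence has the high bit set,
    # so an octet below 0x80 can only ever stand for an ASCII character.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        encoded = ch.encode("utf-8")
        assert len(encoded) == utf8_length(ord(ch))
        if len(encoded) > 1:
            assert all(b >= 0x80 for b in encoded)
        print(repr(ch), encoded.hex(), len(encoded))
```

The ASCII check is only a linear scan over the bytes, which is the inference Anton describes. Whether an explicit "encoding type" octet, as Steve and Alex proposed, is still worth adding to the header is a protocol decision that the sketch does not settle.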