Steve:

I am not sure I understand which octet you were talking about.  Sorry if
I missed earlier discussion.

UTF-8 is based on Unicode.  Unicode assigns a fixed integer (a code
point) to each character or symbol.  UTF-8 encodes those code points as
variable-length byte sequences.  ASCII characters take one byte; all
other characters take two or more bytes.  
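
To make the variable-length point concrete, here is a small Python
sketch.  The sample characters and the helper name utf8_length are my own
picks for illustration, not anything from the draft:

```python
# Illustrative only: sample characters and the helper name are assumptions.
# Each character is one Unicode code point; UTF-8 encodes it as 1 to 4 bytes.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    # Print the code point, the encoded bytes in hex, and the byte count.
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({utf8_length(ch)} bytes)")
```

Only the first sample, a plain ASCII letter, fits in a single byte.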

In order to display Unicode, you need a viewer which can handle Unicode.
You do not need to know locale information to display UTF-8 in a Unicode
viewer.  This was the whole idea behind Unicode, as opposed to the legacy
of a gazillion locale-specific encodings. 

If you are suggesting some indication to the parser of whether the
message uses full UTF-8 or just the strict ASCII subset, then I think
that indication is already there.  You can determine it by looking at
the high-order (most significant) bit of every byte.  Basic
(non-extended) ASCII does not use it.  If the bit is set, you have got
non-ASCII characters in your data and need a UTF-8-aware parser.  
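
A rough sketch of that check in Python, assuming the receiver has the raw
message bytes in hand.  The function name needs_utf8_parser is
hypothetical, and a set high bit only tells you the data is not pure
ASCII; it does not prove the bytes are well-formed UTF-8:

```python
# Illustrative only: the function name is an assumption, not from any spec.
def needs_utf8_parser(data: bytes) -> bool:
    """Return True if any byte has its high-order bit (0x80) set."""
    return any(b & 0x80 for b in data)


ascii_msg = b"<34>1 host app - - - plain text"
utf8_msg = "<34>1 host app - - - caf\u00e9".encode("utf-8")

print(needs_utf8_parser(ascii_msg))  # pure ASCII, no high bit set
print(needs_utf8_parser(utf8_msg))   # contains a two-byte sequence
```

On Python 3.7 and later, data.isascii() gives the inverse answer without
the explicit loop.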

As far as passing locale info goes: if there is consensus that it will
be widely used, we can define a standard structured data tag for it with
defined semantics.  However, making specific tags required of senders
would require more discussion.  I am not sure all senders will know
their locale, so requiring it would be tough. 

Thanks,
Anton. 

> -----Original Message-----
> From: Steve Chang (schang99) 
> Sent: Friday, May 27, 2005 5:57 PM
> To: Anton Okmianski (aokmians); Alexander Clemm (alex); 
> [EMAIL PROTECTED]; Rainer Gerhards
> Cc: syslog-sec@employees.org
> Subject: RE: [Syslog-sec] Syslog protocol - UTF-8 encoding
> 
> Hi, Anton:
> 
> The suggested octet may not seem necessary from the sender's perspective.
> But as Alex pointed out, the receiving-end syslog server/application
> can do the decoding more easily with the help of that "encoding type"
> octet before the structured data and message body.
> 
> Besides, this octet can be helpful to allow other encodings, not
> limited to languages, if needed.  And if some specific value of the
> octet is reserved, it can help future extension of this specification
> and ease the extension-related migration issues.
> 
> Regards,
> 
> Steve
> 
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:syslog-sec- 
> > [EMAIL PROTECTED] On Behalf Of Anton Okmianski (aokmians)
> > Sent: Friday, May 27, 2005 2:46 PM
> > To: Alexander Clemm (alex); [EMAIL PROTECTED]; Rainer Gerhards
> > Cc: syslog-sec@employees.org
> > Subject: RE: [Syslog-sec] Syslog protocol - UTF-8 encoding
> > 
> > Alex:
> > 
> > We had discussions and proposals to support various locale-specific
> > encodings early in the process.  We decided against it as UTF-8
> > really covers representation of all languages.  It is also the
> > general direction of IETF for various protocols.  And the
> > compatibility with ASCII helps too.  I think it is a pretty good
> > choice.
> > 
> > Thanks,
> > Anton.
> > 
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of 
> > > Alexander Clemm (alex)
> > > Sent: Friday, May 27, 2005 4:58 PM
> > > To: [EMAIL PROTECTED]; Rainer Gerhards
> > > Cc: syslog-sec@employees.org
> > > Subject: RE: [Syslog-sec] Syslog protocol - UTF-8 encoding
> > >
> > > Andrew, David,
> > >
> > > thank you.  I was a bit too quick sending out the earlier message;
> > > I was confused.  With ASCII being effectively a subset of UTF-8,
> > > issue 1 goes away, and as far as issue 2 is concerned, this does
> > > allay my concerns, at least as far as the sender side is concerned.
> > > I am still wondering if for the receiver side it might still be
> > > useful to know what encoding to expect - full UTF-8, or just the
> > > ASCII subset.  It would be interesting to hear the perspective of
> > > someone on the receiver side, but from my point, my concerns are
> > > addressed.  As for other encodings being of interest, while I would
> > > not rule it out I'm not aware of any.
> > >
> > > Kind regards
> > > --- Alex
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: David B Harrington [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, May 25, 2005 8:10 PM
> > > To: [EMAIL PROTECTED]; Alexander Clemm (alex); 'Rainer Gerhards'
> > > Cc: syslog-sec@employees.org
> > > Subject: RE: [Syslog-sec] Syslog protocol - UTF-8 encoding
> > >
> > > Hi,
> > >
> > > In reading my response, it seems a bit too succinct.
> > >
> > > The relevant text from STD63 is:
> > > "UTF-8, the object of this memo, has a one-octet encoding unit.  It
> > >    uses all bits of an octet, but has the quality of preserving the
> > >    full US-ASCII [US-ASCII] range: US-ASCII characters are encoded in
> > >    one octet having the normal US-ASCII value, and any octet with
> > >    such a value can only stand for a US-ASCII character, and nothing
> > >    else."
> > >
> > > Hope this allays your concerns.
> > >
> > > David Harrington
> > > [EMAIL PROTECTED]
> > >
> > > > -----Original Message-----
> > > > From: [EMAIL PROTECTED]
> > > > [mailto:[EMAIL PROTECTED] On Behalf Of David B Harrington
> > > > Sent: Wednesday, May 25, 2005 10:58 PM
> > > > To: 'Alexander Clemm (alex)'; 'Rainer Gerhards'
> > > > Cc: syslog-sec@employees.org
> > > > Subject: RE: [Syslog-sec] Syslog protocol - UTF-8 encoding
> > > >
> > > > Hi,
> > > >
> > > > According to STD63, UTF-8 has the characteristic of preserving the
> > > > full US-ASCII range.
> > > >
> > > > David Harrington
> > > > [EMAIL PROTECTED]
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: [EMAIL PROTECTED]
> > > > > > [mailto:[EMAIL PROTECTED] On Behalf Of Alexander
> > > > > > Clemm (alex)
> > > > > Sent: Wednesday, May 25, 2005 8:56 PM
> > > > > To: Rainer Gerhards
> > > > > Cc: syslog-sec@employees.org
> > > > > Subject: [Syslog-sec] Syslog protocol - UTF-8 encoding
> > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > > 2 questions/suggestions concerning the UTF-8 encoding in the
> > > > > > syslog protocol:
> > > > > >
> > > > > > 1) Is the " " (white space) after the header to be encoded in
> > > > > > ASCII or UTF-8?  The spec seems currently open in that respect
> > > > > > (although it would seem logical for it to be still in ASCII);
> > > > > > should be clarified.
> > > > >
> > > > > > 2) Concerning the UTF-8 encoding, depending on where you send
> > > > > > syslog messages there are many scenarios in which it would be
> > > > > > beneficial to have an option NOT to use UTF-8 encoding but to
> > > > > > also allow for other encodings, in particular plain ASCII.
> > > > > > Such an option would also allow for quicker adoption of this
> > > > > > specification, as it eases the migration.  To provide for
> > > > > > that, it seems it would make sense to allow for a flag in the
> > > > > > header part of the message - at the tail end (that is known
> > > > > > to be still ASCII encoded), right before the structured data,
> > > > > > that indicates which encoding is used - that is, whether
> > > > > > UTF-8 is in effect, or if another encoding is used - e.g.
> > > > > > ASCII, or even proprietary.
> > > > >
> > > > > > (Apologies in case this aspect was discussed in the past and
> > > > > > I am beating a dead horse; but this appears important enough
> > > > > > to bring up.)
> > > > >
> > > > >
> > > > > --- Alex
> > > > > _______________________________________________
> > > > > Syslog-sec mailing list
> > > > > Syslog-sec@www.employees.org
> > > > > http://www.employees.org/mailman/listinfo/syslog-sec
> > > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > Syslog-sec mailing list
> > > > Syslog-sec@www.employees.org
> > > > http://www.employees.org/mailman/listinfo/syslog-sec
> > > >
> > > _______________________________________________
> > > Syslog-sec mailing list
> > > Syslog-sec@www.employees.org
> > > http://www.employees.org/mailman/listinfo/syslog-sec
> > >
> > _______________________________________________
> > Syslog-sec mailing list
> > Syslog-sec@www.employees.org
> > http://www.employees.org/mailman/listinfo/syslog-sec
> 
_______________________________________________
Syslog-sec mailing list
Syslog-sec@www.employees.org
http://www.employees.org/mailman/listinfo/syslog-sec