Hi Rainer,

I don't know UTF-8 very well. I am under the impression that 0x00 can occur in UTF-8, in multi-byte character sequences. You've been researching UTF-8. Can you determine if that's true? If it is, then we cannot limit octets to values in the range 1..255.
dbh

> -----Original Message-----
> From: Rainer Gerhards [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 06, 2004 11:51 AM
> To: Anton Okmianski; Harrington, David; [EMAIL PROTECTED]
> Subject: RE: -international: trailer
>
> Anton:
>
> > I agree with your conclusion that we need to support all
> > Unicode/UTF. I also think that doing any kind of escaping is
> > generally bad and should be deferred until it is absolutely
> > necessary (like maybe escaping line breaks for storage).
>
> I agree, escaping needs to be done when it goes to the storage
> subsystem - eventually. We always think "text file" though some use
> databases, where this is no issue at all. But that's not the point.
>
> I thought a while over this issue during the course of the day... We
> can have the storage subsystem escape non-printable characters.
> Obviously, it is up to the storage subsystem how it does this. When
> the data is then read back, the storage subsystem should decode the
> persisted message and provide the original block to e.g. the message
> verifier. So we do not have an issue with -sign.
>
> Obviously, a syslog-storage RFC comes to mind, but I think we are
> busy enough with current discussions ;) Let's take one step after
> another...
>
> So I am more or less prepared to edit protocol-03 so that all
> characters are allowed, including ASCII control characters.
>
> The thing left that really frightens me is the 0x00 character. I
> know allowing it will break a lot of existing code and make it hard to
> update it to the new format. On the other hand, explicitly allowing it
> will remove a potential security weakness... some of the bad guys may
> have fun with sending 0x00 especially when we do not allow it.
>
> I am still tempted to allow only octets in the range of 1..255. ;)
>
> I'd appreciate comments on this issue. If we can solve this, we can
> solve this issue here as well as the trailer. And I think we are close
> to doing that.
>
> > I guess this means, we can't have a line separator trailer unless we
> > escape all others inside of message. I really would prefer
> > no escaping.
>
> I agree to this - UTF-8 kills the TRAILER.
>
> On second thought, the trailer was a bad idea initially. After all, my
> intention was to have an extra sanity check for the framing - but that
> is a transport issue, not a general format issue (for a
> transport-ignorant message format).
>
> > I think alternatively, a UDP transport can define an
> > optional/required structured element for message length in octets,
> > but it is tricky.
>
> I am in favour of not doing this. As you say, it is tricky - and the
> extra sanity check does not buy us much in UDP (so many things that
> can go wrong anyhow...)
>
> Rainer
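
One way to check the 0x00 question raised at the top of this message is to encode every Unicode code point and look for a zero octet. The following is only a minimal sketch, assuming standard UTF-8 as implemented by Python's built-in codec; the helper names find_nul_code_points and octets_in_range_1_255 are hypothetical and not taken from any syslog draft.

```python
# Sketch: search the UTF-8 encoding of every Unicode scalar value for a
# 0x00 octet. Both function names are hypothetical, for illustration only.

def find_nul_code_points():
    """Return the code points whose UTF-8 encoding contains a 0x00 octet."""
    hits = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        if 0 in chr(cp).encode("utf-8"):
            hits.append(cp)
    return hits


def octets_in_range_1_255(payload: bytes) -> bool:
    """Return True if every octet is in 1..255, i.e. there is no 0x00."""
    return 0 not in payload


if __name__ == "__main__":
    print(find_nul_code_points())
    print(octets_in_range_1_255("Grüße €".encode("utf-8")))
    print(octets_in_range_1_255(b"abc\x00def"))
```

Under that assumption the search should report only code point 0 (U+0000 itself), because every octet of a multi-byte UTF-8 sequence has its high bit set. If so, a 1..255 octet restriction would exclude the NUL character but no other character.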