RE: [Syslog] #5 - character encoding (was: Consensus?)

Chris Lonvick Wed, 30 Nov 2005 05:42:41 -0800

Hi Rainer,

I believe that we are saying the same thing.  :)

If there is no indicator of encoding or language then a receiver will not know what it is receiving - just like receivers don't know what they are receiving today. They MAY make an assumption that it is something in US-ASCII (but may be disappointed).

If there is an indicator of the encoding and language then the receiver will know exactly what it is. Having an indicator should be RECOMMENDED but not REQUIRED for ease of migration.


Is that what we're all saying?

Thanks,
Chris



On Wed, 30 Nov 2005, Rainer Gerhards wrote:

Chris,

Let's use this email as an example.  :)  There is no indication that I'm
using US-ASCII encoding or that I'm writing in English.


I think there actually is. If I am right, the SMTP RFCs require mail text to be
US-ASCII. Only via MIME and/or escape characters can you include 8-bit data. For example,
Müller and Möller might create problems in some mailers (but I guess my mail system
will encode them with =<hexval>). Dropping messages with octets > 127 in the
subject is a common spam protection setting...
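
As a small illustration of the =<hexval> escaping mentioned above, here is a sketch using the quoted-printable codec in Python's standard library. Which charset a given mail system picks before escaping is an assumption here; ISO-8859-1 and UTF-8 are shown only as two plausible choices.

```python
import quopri

name = "M\u00fcller"  # "Müller"

# ISO-8859-1 represents "ü" as the single octet 0xFC, which is escaped as "=FC".
latin1_qp = quopri.encodestring(name.encode("iso-8859-1"))

# UTF-8 represents "ü" as two octets (0xC3 0xBC), so two escapes appear.
utf8_qp = quopri.encodestring(name.encode("utf-8"))

print(latin1_qp)  # b'M=FCller'
print(utf8_qp)    # b'M=C3=BCller'
```

Either way the transmitted text stays within 7-bit US-ASCII, and only the accompanying MIME charset label tells the reader how to turn the escapes back into characters.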

However, you're able to receive this and read it.  Similarly, you could
write an email in German and send it to me.  I would still be able to
receive it but I'd have a difficult time parsing the meaning.

I'm suggesting that same approach for the transmission of the syslog
content.  If I really wanted you to know what encoding and language I'm
using in an email, I would specify a MIME header.  syslog senders will
continue to pump out whatever encoding and language they've been using
and receivers will continue to do their best to parse them.  If a vendor
wants to get very specific about that, then they will have to use an SD-ID
to identify the contents of the message.


Here I agree with you. What I was saying is that only IF the header says it is US-ASCII
should we assume it actually is. If there is no "enc" SD-ID, then we do
not know what it is but can assume ... whatever we assume. Let me phrase it this way:

If the message contains

[enc="us-ascii" lang="en"]

then the receiver can honestly expect it to be US-ASCII. But if it does not contain any
"enc", the receiver does not know exactly and may assume anything it finds useful
(may be ASCII, may not).
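
The rule above could be sketched on the receiving side roughly as follows. This is only an illustration under assumptions: the function name `interpret`, the regular expression, and the choice to escape non-ASCII octets when no "enc" is present are all hypothetical, and the element syntax simply follows the examples in this thread rather than any agreed format.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for an element such as [enc="us-ascii" lang="en"];
# "lang" is treated as optional so that [enc="utf-8"] also matches.
_ENC_ELEMENT = re.compile(
    r'\[enc="(?P<enc>[^"]+)"(?:\s+lang="(?P<lang>[^"]+)")?\]'
)


def interpret(raw: bytes) -> Tuple[Optional[str], Optional[str], str]:
    """Return (encoding, language, text) for a raw message.

    With an "enc" element the payload is decoded as declared. Without
    one the encoding is reported as None (unknown) and the text is a
    best-effort rendering, not a claim that the data is US-ASCII.
    """
    # latin-1 maps every octet to one code point, so this never fails
    # and string offsets equal byte offsets in the raw message.
    probe = raw.decode("latin-1")
    match = _ENC_ELEMENT.search(probe)
    if match is None:
        # No indicator: keep ASCII as-is, escape every other octet.
        return None, None, raw.decode("ascii", errors="backslashreplace")

    enc, lang = match.group("enc"), match.group("lang")
    body = raw[match.end():].lstrip()
    try:
        text = body.decode(enc, errors="replace")
    except LookupError:
        # The declared encoding is not known to this receiver.
        text = body.decode("ascii", errors="backslashreplace")
    return enc, lang, text


print(interpret(b'[enc="us-ascii" lang="en"] disk full'))
print(interpret(b"M\xfcller logged in"))
```

The second call carries no "enc", so the receiver reports the encoding as unknown instead of silently labelling the message US-ASCII; what it then assumes is left to local policy.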

Does this clarify? I somehow have the impression we mean the same thing and I 
simply do not manage to convey what I intend to ;)

Rainer


Mit Aufrichtigkeit,
Chris




On Wed, 30 Nov 2005, Rainer Gerhards wrote:

Andrew,

Hi Rainer,

Why don't we look at it from the other direction?  We could state that any
encoding is acceptable - for ease-of-use/migration with existing syslog
implementations.  It is RECOMMENDED that UTF-8 be used.  When it is
used, an SD-ID element will be REQUIRED.  e.g. -

[enc="utf-8" lang="en"]

I like that idea too.

So, if no SD-ID encoding element is specified, then we must assume US-ASCII
and deal with it accordingly??


I think not. If it is not present, we know that we do not know it. If
it is US-ASCII, I would expect something like

[enc="us-ascii" lang="en"]

Of course, we could also say if it is non-present, we can assume
US-ASCII. But then we would need to introduce

[enc="unknown"]

for the (common) case where we simply do not know it (again: think
POSIX). I find this somewhat confusing.

Rainer

_______________________________________________
Syslog mailing list
Syslog@lists.ietf.org
https://www1.ietf.org/mailman/listinfo/syslog
