Hi Sheran,
On Tue, 29 Nov 2005, Shyyunn Lin (sheranl) wrote:
Chris:
I think having an SD-ID with [enc="utf-8" lang="English"] may be a good
approach. If different languages use UTF-8 encoding, then "lang=" can
distinguish them.
We _should_ be using language codes from RFC 3066, which specifies ISO 639
language tags. ISO 639-1 has 2-character codes ("en" is English) and
ISO 639-2 has 3-character codes ("eng" is English). RFC 3066 will likely
be replaced by the work of the Language Tag Registry Update (ltru)
Working Group.
http://www.ietf.org/html.charters/ltru-charter.html
They have IDs in the works. Until those become RFCs we should continue to
reference RFC 3066.
I also want to confirm your suggestion: if the message is in ASCII, no
SD-ID will be required, but for all other encodings an SD-ID will be
required.
Yes - that's my suggestion.
Note that most other encoding methods already imply the language used.
For example, Chinese has several encoding methods: Traditional Chinese,
used in Taiwan and Hong Kong, is encoded in Big5, and Simplified Chinese,
used in Mainland China, is encoded in GBK. So if the message is in
Traditional Chinese characters, it would be shown as [enc="Big5",
lang="Traditional Chinese"], which is a little redundant. Big5 also
includes all English characters, so a message can be a mix of Chinese
and English.
Good point. As far as I can tell, "Big5" is not recognized by any
accredited standards developing organization. It is recognized by the
Ideographic Rapporteur Group (IRG), which reports to the Unicode
Consortium. The recognized way to identify Chinese, traditional and
simplified, is through a language tag: the ISO 639-1 code "zh" for the
_language_, with ISO 15924 script subtags to indicate traditional or
simplified characters. The Internet-Draft "Tags for Identifying Languages"
http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-14.txt
identifies simplified Chinese as "zh-Hans" and traditional Chinese as
"zh-Hant". Additional subtags could identify a locale such as
"zh-Hant-TW" for Taiwan Chinese in traditional script. This is from the
"Initial Language Subtag Registry" ID.
http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-06.txt
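The subtag structure described above can be illustrated with a short Python sketch. It is a simplification assumed for illustration, not the registry draft's full grammar: it accepts only a 2- or 3-letter primary language subtag, an optional 4-letter script subtag (such as "Hant") and an optional 2-letter or 3-digit region subtag (such as "TW"); variants, extensions and private-use subtags are rejected. The names `LanguageTag` and `parse_language_tag` are hypothetical.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def _is_letters(s: str) -> bool:
    # ASCII letters only; str.isalpha() alone would accept non-ASCII letters.
    return s.isascii() and s.isalpha()


def parse_language_tag(tag: str) -> LanguageTag:
    """Split a simple tag such as "zh-Hant-TW" into language/script/region.

    Simplified sketch (assumption): only language[-script][-region] is handled.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    if not (2 <= len(language) <= 3 and _is_letters(language)):
        raise ValueError(f"bad primary language subtag in {tag!r}")
    rest = parts[1:]
    script = region = None
    # A 4-letter subtag right after the language is a script, e.g. "Hant".
    if rest and len(rest[0]) == 4 and _is_letters(rest[0]):
        script = rest.pop(0).title()
    # A 2-letter or 3-digit subtag is a region, e.g. "TW" or "419".
    if rest and ((len(rest[0]) == 2 and _is_letters(rest[0]))
                 or (len(rest[0]) == 3 and rest[0].isascii()
                     and rest[0].isdigit())):
        region = rest.pop(0).upper()
    if rest:
        raise ValueError(f"unsupported subtags in {tag!r}: {rest}")
    return LanguageTag(language, script, region)
```

For example, `parse_language_tag("zh-Hant-TW")` yields language "zh", script "Hant" and region "TW", while `parse_language_tag("en")` yields only the language.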
I think that we should specify encoding and language tags as
straightforwardly as possible and let others augment syslog-protocol (in
the future) with other encoding mechanisms. We can RECOMMEND that encoding
be UTF-8 and that language tags come from RFC 3066. We can allow that other
encodings and language identifications are acceptable. In the worst case,
a vendor will have the option of [EMAIL PROTECTED]"something" [EMAIL PROTECTED]"piglatin"].
Does this work for you?
Thanks,
Chris
Regards,
Sheran
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Chris Lonvick (clonvick)
Sent: Tuesday, November 29, 2005 10:22 AM
To: Rainer Gerhards
Cc: [EMAIL PROTECTED]
Subject: RE: [Syslog] #5 - character encoding (was: Consensus?)
Hi Rainer,
Why don't we look at it from the other direction? We could state that
any encoding is acceptable, for ease of use and migration with existing
syslog implementations. It is RECOMMENDED that UTF-8 be used. When it
is used, an SD-ID element will be REQUIRED, e.g. [enc="utf-8"
lang="en"]
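A minimal Python sketch of emitting this element, combined with the rule later agreed in this thread (no element for pure US-ASCII, an element for anything else). Two assumptions are flagged: the bracketed form is copied from the example as written, with no SD-ID name in front of the parameters, and the backslash escaping of `\`, `"` and `]` inside values is borrowed from the SD-PARAM escaping in the later syslog-protocol specification (RFC 5424), not from anything decided here. The function names are hypothetical.

```python
def escape_param_value(value: str) -> str:
    """Backslash-escape characters that would terminate a parameter value.

    Assumption: escaping rules borrowed from RFC 5424 SD-PARAM values.
    """
    # Escape the backslash first so the later escapes are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def encoding_element(msg: bytes, enc: str, lang: str) -> str:
    """Return the encoding/language element for MSG, or "" for pure ASCII.

    Hypothetical helper: the bracketed form follows the thread's example,
    [enc="utf-8" lang="en"], with no SD-ID name in front.
    """
    if msg.isascii():
        # Plain US-ASCII needs no element under the suggestion discussed.
        return ""
    return f'[enc="{escape_param_value(enc)}" lang="{escape_param_value(lang)}"]'
```

For a German message encoded as UTF-8, `encoding_element(msg, "utf-8", "de")` returns `[enc="utf-8" lang="de"]`, while an all-ASCII message yields an empty string.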
Thoughts?
All: Let's discuss this and close this issue.
Thanks,
Chris
On Tue, 29 Nov 2005, Rainer Gerhards wrote:
Chris & WG,
#5 Character encoding in MSG: due to my proof-of-concept
implementation, I have raised the (ugly) question of whether we need
to allow encodings other than UTF-8. Please note that this
question arises from constraints introduced by, e.g., POSIX, so we
can't easily argue them away by wishful thinking ;)
Not even discussed yet.
I haven't reviewed that yet. However, I'll note that allowing
different encodings can be accomplished in the future, as long as our
current work establishes a default encoding and a way to identify it.
I have read a little in the mailing list archive. Please note that in
2000 there was consensus that the MSG part may contain encodings other
than US-ASCII. Follow this thread:
http://www.syslog.cc/ietf/autoarc/msg00127.html
This discussion led to RFC 3164 saying "other encodings MAY be used".
While this reflected observed behaviour, we still need to be aware that
the POSIX (and glibc) API imposes a restriction on us: we simply do
not know the character encoding used by the application. As such, no
*nix syslogd can be programmed to be compliant with syslog-protocol if
we demand UTF-8 exclusively.
I propose that we RECOMMEND UTF-8 and that, if UTF-8 is used, the MSG
MUST start with the Unicode Byte Order Mark (BOM). If the MSG part does
not start with the BOM, it may use any encoding, just as in RFC 3164.
I do not see any alternative to this.
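The BOM proposal can be sketched in Python as follows. The sketch assumes the BOM is the UTF-8 encoding of U+FEFF (bytes EF BB BF, available as `codecs.BOM_UTF8`) placed at the very start of MSG; the function names `encode_msg` and `decode_msg` are hypothetical. A receiver that finds the BOM decodes strictly as UTF-8; without it, MSG is handed back as raw bytes of unknown encoding, as under RFC 3164.

```python
import codecs
from typing import Union


def encode_msg(text: str) -> bytes:
    """Sender side: UTF-8 MSG prefixed with the BOM (EF BB BF)."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


def decode_msg(msg: bytes) -> Union[str, bytes]:
    """Receiver side: str for BOM-marked UTF-8, raw bytes otherwise."""
    if msg.startswith(codecs.BOM_UTF8):
        # BOM present: the sender asserts UTF-8, so decode strictly;
        # malformed sequences raise UnicodeDecodeError.
        return msg[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: encoding unknown (RFC 3164 behaviour); pass bytes through.
    return msg
```

Returning bytes rather than guessing a charset reflects the point that a *nix syslogd cannot know the application's encoding. Python's built-in "utf-8-sig" codec performs the same add-and-strip of the BOM, but it also decodes BOM-less input as UTF-8, so it would not preserve this distinction.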
Rainer
_______________________________________________
Syslog mailing list
Syslog@lists.ietf.org
https://www1.ietf.org/mailman/listinfo/syslog