
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=2793

*** shadow/2793 Wed Jul 25 12:50:09 2001
--- shadow/2793.tmp.6221        Wed Jul 25 12:50:09 2001
***************
*** 0 ****
--- 1,87 ----
+ +============================================================================+
+ | xml:lang should support RFC3066 and ISO639-2.                              |
+ +----------------------------------------------------------------------------+
+ |        Bug #: 2793                        Product: Xerces-J                |
+ |       Status: NEW                         Version: 1.4                     |
+ |   Resolution:                            Platform: Other                   |
+ |     Severity: Normal                   OS/Version: Other                   |
+ |     Priority: Other                     Component: Core                    |
+ +----------------------------------------------------------------------------+
+ |  Assigned To: [EMAIL PROTECTED]                                  |
+ |  Reported By: [EMAIL PROTECTED]                                               |
+ |      CC list: Cc:                                                          |
+ +----------------------------------------------------------------------------+
+ |          URL:                                                              |
+ +============================================================================+
+ |                              DESCRIPTION                                   |
+ The XML specification (XML 1.0 Second Edition, section 2.12) states that
+ the value of xml:lang should conform to RFC1766 or its successor. RFC3066
+ is the successor to RFC1766.
+ 
+ RFC3066 allows the three-letter language codes defined by ISO639-2,
+ in addition to the two-letter codes of ISO639.
+ 
+ RFC3066 also allows digits in the second and subsequent tags.
+ 
+ Hence each of the following XML elements is legal but rejected by Xerces-J.
+ 
+ <x xml:lang="ale"/>
+ <x xml:lang="x-33"/>
+ <x xml:lang="en-US-f5"/>
+ 
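+ A minimal Java sketch of the base RFC3066 grammar
+ (Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)). The class and
+ method names are hypothetical, not Xerces-J API. All three values above
+ match it. This grammar alone still accepts "en-s", so the second-subtag
+ rules described below are needed on top of it.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not Xerces-J API) for the base RFC3066 grammar:
//   Language-Tag   = Primary-subtag *( "-" Subtag )
//   Primary-subtag = 1*8ALPHA
//   Subtag         = 1*8(ALPHA / DIGIT)
final class Rfc3066Grammar {
    // ALPHA in the ABNF means ASCII A-Z / a-z, hence the explicit classes.
    private static final Pattern LANGUAGE_TAG =
        Pattern.compile("[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*");

    static boolean matchesGrammar(String value) {
        return LANGUAGE_TAG.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ale", "x-33", "en-US-f5", "en-s"}) {
            System.out.println(s + " -> " + matchesGrammar(s));
        }
    }
}
```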
+ 
+ Moreover, Xerces-J accepts the following two syntactically
+ illegal language tags:
+ <x xml:lang="en-s"/>
+ <x xml:lang="en-abcdefghij"/>    
+ 
+ Both are illegal because, after an ISO639 code, the second subtag must
+ be either:
+ + a two-letter country code from ISO3166,
+ or
+ + between 3 and 8 letters or digits.
+ "en-s" has a one-character second subtag; "en-abcdefghij" has ten.
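+ The second-subtag rule can be sketched as a single Java regular
+ expression. This is a hypothetical helper that assumes the primary tag
+ has already been identified as a two- or three-letter ISO639 code; it
+ checks only the shape of the subtag, not membership in ISO3166.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not Xerces-J API): the second subtag that follows
// a two- or three-letter ISO639 code.
final class SecondSubtagRule {
    // Either two letters (ISO3166 country code shape) or 3 to 8 letters/digits.
    private static final Pattern SECOND_SUBTAG =
        Pattern.compile("[A-Za-z]{2}|[A-Za-z0-9]{3,8}");

    static boolean isValidSecondSubtag(String subtag) {
        return SECOND_SUBTAG.matcher(subtag).matches();
    }

    public static void main(String[] args) {
        // Second subtags of "en-US", "en-s" and "en-abcdefghij".
        for (String s : new String[] {"US", "s", "abcdefghij"}) {
            System.out.println(s + " -> " + isValidSecondSubtag(s));
        }
    }
}
```

+ "US" passes, while "s" (too short) and "abcdefghij" (too long) fail.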
+ 
+ Of these defects, the most important is the rejection of three-letter
+ language codes.
+ 
+ ===
+ 
+ 
+ I take the defect to be in:
+ 
+ org.apache.xerces.framework.XMLDocumentScanner.checkXMLLangAttributeValue(int)
+ 
+ That file is unchanged since version 1.4.0, which is the version I used.
+ 
+ ===
+ 
+ 
+ The syntactic constraints (all case-insensitive) are:
+ 
+ First tag:
+ either "I" or "X" or [A-Z][A-Z] or [A-Z][A-Z][A-Z]
+ 
+ Second tag:
+ when first tag is "I" or "X" then second tag is
+     [0-9A-Z]{1,8}
+ 
+ when first tag is [A-Z][A-Z] or [A-Z][A-Z][A-Z] then second tag is
+     [A-Z][A-Z] or [0-9A-Z]{3,8}
+ 
+ subsequent tags
+     [0-9A-Z]{1,8}
+ 
+ Other rules require lookup tables from IANA, ISO639 and ISO3166.
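+ A minimal Java sketch of the constraints above, covering only the
+ syntactic rules (no table lookups). XmlLangSyntax and isValid are
+ hypothetical names; this is not the code of
+ XMLDocumentScanner.checkXMLLangAttributeValue. Explicit ASCII character
+ classes give case-insensitivity without Unicode case mapping, so a
+ character such as the dotless i (U+0131) cannot slip through as "I".

```java
import java.util.regex.Pattern;

// Hypothetical checker for the syntactic constraints in this report;
// not the Xerces-J implementation, and no table lookups are done.
final class XmlLangSyntax {
    // ASCII-only classes: case-insensitive without Unicode case mapping.
    private static final Pattern I_OR_X = Pattern.compile("[IiXx]");
    private static final Pattern ISO639_SHAPE = Pattern.compile("[A-Za-z]{2,3}");
    // After an ISO639 code: two letters, or 3 to 8 letters/digits.
    private static final Pattern SECOND_AFTER_ISO639 =
        Pattern.compile("[A-Za-z]{2}|[A-Za-z0-9]{3,8}");
    // Second tag after "I"/"X", and every subsequent tag.
    private static final Pattern OTHER_TAG = Pattern.compile("[A-Za-z0-9]{1,8}");

    static boolean isValid(String value) {
        // Limit -1 keeps trailing empty tags, so "en-" is rejected.
        String[] tags = value.split("-", -1);
        boolean iOrX = I_OR_X.matcher(tags[0]).matches();
        if (!iOrX && !ISO639_SHAPE.matcher(tags[0]).matches()) {
            return false;
        }
        for (int i = 1; i < tags.length; i++) {
            Pattern rule = (i == 1 && !iOrX) ? SECOND_AFTER_ISO639 : OTHER_TAG;
            if (!rule.matcher(tags[i]).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"ale", "x-33", "en-US-f5", "en-s", "en-abcdefghij"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

+ With this check, "ale", "x-33" and "en-US-f5" are accepted, and "en-s"
+ and "en-abcdefghij" are rejected.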
+ 
+  
+ ===
+ 
+ I have Java code that checks values against the tables, which I can
+ forward if you want.
+ The URLs for the tables are:
+ 
+ http://lcweb.loc.gov/standards/iso639-2/englangn.html
+ http://www.din.de/gremien/nas/nabd/iso3166ma/codlstp1/db_en.html
+ http://www.iana.org/assignments/language-tags
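+ A hypothetical Java sketch of the table-lookup step; it is not the code
+ offered above. The sets are tiny illustrative excerpts standing in for
+ the full tables at these URLs, and the policy that an "i-" tag or a
+ three- to eight-character second subtag counts only if the whole tag is
+ in the IANA excerpt is an assumption of this sketch. It expects a value
+ that has already passed the syntactic constraints (Set.of needs Java 9
+ or later).

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical lookup sketch (not the reporter's code, not Xerces-J API).
// The sets are tiny excerpts; a real checker would load the full lists.
final class XmlLangTables {
    private static final Set<String> ISO639 = Set.of("en", "fr", "de", "ale");
    private static final Set<String> ISO3166 = Set.of("us", "gb", "fr", "de");
    private static final Set<String> IANA = Set.of("i-klingon");

    // Assumes the value already satisfies the syntactic constraints.
    static boolean isKnown(String value) {
        // ASCII guard: lower-casing then only maps A-Z to a-z.
        if (!value.matches("[A-Za-z0-9-]+")) {
            return false;
        }
        String tag = value.toLowerCase(Locale.ROOT);
        String[] tags = tag.split("-", -1);
        if (tags[0].equals("x")) {
            return true;                      // private use: nothing to look up
        }
        if (tags[0].equals("i")) {
            return IANA.contains(tag);        // whole tag must be registered
        }
        if (!ISO639.contains(tags[0])) {
            return false;
        }
        if (tags.length == 1) {
            return true;
        }
        if (tags[1].length() == 2) {
            return ISO3166.contains(tags[1]); // country code; later tags unchecked
        }
        return IANA.contains(tag);            // 3-8 character second subtag
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ale", "en-US", "en-US-f5", "i-klingon", "qq"}) {
            System.out.println(s + " -> " + isKnown(s));
        }
    }
}
```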
+ 
+ 
+ Thanks.
