On Tuesday 2004.01.13 09:48:56 -0800, Addison Phillips [wM] wrote:
> Hi Bert,
> 
> This is a common problem. 
> 
> When you do a form submit (POST or GET of data to the server), the browser encodes 
> the characters being sent using the character encoding that the page uses. In your 
> case, from the examples you sent, this encoding is Unicode UTF-8. UTF-8 is a 
> multibyte encoding of Unicode in which non-ASCII characters take two or more bytes. 
> In this case, the German accented characters each take two bytes.
> 
> When the server receives the data, it decodes the original bytes sent by the 
> browser. The problem is: what encoding should be used to interpret the bytes? For 
> historical reasons, most Web servers (including J2EE, .NET, Apache/Tomcat, etc.) 
> default to using ISO-8859-1 (Latin-1), a single byte Western European encoding. This 
> is what is happening in your case: each UTF-8 byte is being treated as a single 
> character, leading to the corruption you are experiencing. You can see that each 
> German character shows up as a sequence of two characters, one per byte.
> 
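The mechanism described above can be reproduced in a few lines. This is only a minimal sketch in Python; the sample word "für" is an illustrative choice, not a value taken from the actual form submission.

```python
# Illustrative sample text (an assumption, not taken from the real form data).
original = "für"

# The browser encodes the form value as UTF-8; "ü" becomes two bytes (0xC3 0xBC).
wire_bytes = original.encode("utf-8")
print(len(original), len(wire_bytes))  # 3 4

# A server that assumes ISO-8859-1 turns every byte into one character.
garbled = wire_bytes.decode("iso-8859-1")
print(garbled)  # fÃ¼r

# Decoding with the encoding the form actually used recovers the text.
recovered = wire_bytes.decode("utf-8")
print(recovered)  # für
```

Note that the garbled string has four characters where the original had three: the single "ü" has become "Ã¼".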
> To fix your problem you must change your server side configuration to interpret the 
> bytes sent using the same encoding that the form uses (UTF-8 in this case). This has 
> nothing to do with your Javascript. What exactly to do depends on the technology of 
> your web server. There are too many of these to list here, but you should be able to 
> do a little searching to find the answer (or write back off list and I can probably 
> point you to the documentation).
> 
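How the fix looks depends on the server technology, as noted above, but the principle is the same everywhere: tell the code that parses the submitted form which encoding to use. As a hedged illustration only, here is the idea using Python's standard `urllib.parse.parse_qs`; the field name `frage` and its value are made up for the example.

```python
from urllib.parse import parse_qs

# Hypothetical POST body: field name and value are assumptions for illustration.
# The browser percent-encoded the UTF-8 bytes of "über" (0xC3 0xBC for "ü").
body = "frage=%C3%BCber"

# Decoding the submitted bytes as ISO-8859-1 reproduces the corruption.
wrong = parse_qs(body, encoding="iso-8859-1")["frage"][0]

# Decoding them as UTF-8, the encoding of the page, gives the intended text.
right = parse_qs(body, encoding="utf-8")["frage"][0]

print(wrong)  # Ã¼ber
print(right)  # über
```

Other platforms expose the same choice under different names, so the relevant setting has to be looked up per server.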

I, and I'm sure many others, have seen this problem too.

Given the maturity of Unicode support in the relevant technologies (web servers, web
browsers, XML, Javascript, Java, etc.) and the global nature of the marketplace, it
seems to me that it is high time that web servers default to serving UTF-8 instead of
ISO-8859-1.  The W3C should really stipulate UTF-8 as the default.

In the case of Apache, it is trivial to change the default charset in the configuration
file from ISO-8859-1 to UTF-8 (I even remember that the setting is around line 780 or
so in the default configuration file distributed with Apache version 2.x), but I wish
UTF-8 were the DEFAULT.  In the case of IIS (the server used for serving the form that
was highlighted as having the problem at the beginning of this thread), I would assume
that it is also not difficult to change the configuration, but I don't have first-hand
knowledge of how to do that.  In any case, UTF-8 should be the default for IIS and all
the other servers out there.
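
The Apache setting referred to above is most likely the `AddDefaultCharset` directive, which the stock Apache 2.0 configuration file sets to ISO-8859-1. A sketch of the change, assuming a default `httpd.conf` (the exact line number will vary between releases):

```apache
# Was: AddDefaultCharset ISO-8859-1
AddDefaultCharset UTF-8
```

This only controls the charset parameter that Apache adds to the Content-Type header of text/plain and text/html responses that do not declare one. It does not by itself change how a server-side application decodes submitted form data, so that setting may still need to be checked separately.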
 
When I look at the rather long trail of emails in this thread -- all of this stuff about
legacy character sets and what browsers and servers are going to do -- or not do as the
case may very well be -- with characters that are not defined in legacy character sets
-- I think to myself, "Well, come on! The answer for solving almost all of your problems
is so obvious: use Unicode and in particular, that very useful transformation format
called UTF-8!"  For European languages using Latin script, UTF-8 means that, on
average, web pages are going to be a few bytes longer than they were before under
ISO-8859-x -- but it's nothing significant!  And if the web pages, databases, and
relevant glue between the two (Java, Javascript, ASP, PHP, Perl, Python, etc.) are all
designed around UTF-8 today, it means that you'll have a lot less work to do tomorrow
when (1) you or your client decides to go after some other global audience or market,
or (2) XML rules the world, or (3) both.
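
To put a rough number on the "few bytes longer" claim, here is a small Python sketch that compares the encoded size of a German phrase under both encodings. The phrase is an illustrative sample, and the helper name `encoded_sizes` is made up for this example.

```python
def encoded_sizes(text):
    """Return (ISO-8859-1 byte count, UTF-8 byte count) for the given text."""
    return len(text.encode("iso-8859-1")), len(text.encode("utf-8"))


# Illustrative sample: 15 characters, two of them non-ASCII ("ü" and "ö").
sample = "für Sie möglich"
latin1_size, utf8_size = encoded_sizes(sample)

print(latin1_size, utf8_size)   # 15 17
print(utf8_size - latin1_size)  # 2
```

Each accented Latin-1 letter costs one extra byte in UTF-8, while plain ASCII text costs nothing extra, so the overhead for typical Western European pages stays small.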

And then threads like this one will be just a thing of the past...

- Ed Trager
  Kellogg Eye Center
  University of Michigan

> You might want to be aware of the W3C's Internationalization mailing list (See 
> http://lists.w3.org/Archives/Public/www-international/) and of the FAQs at 
> http://www.w3.org/International/geo (alas, the FAQ on this topic hasn't been 
> published yet!)
> 
> Best Regards,
> 
> Addison
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods | Delivering Global Business Visibility
> http://www.webMethods.com
> Chair, W3C Internationalization (I18N) Working Group
> Chair, W3C-I18N-WG, Web Services Task Force
> http://www.w3.org/International
> 
> Internationalization is an architecture.
> It is not a feature. 
> 
>   -----Original Message-----
>   From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Bert Kemner
>   Sent: lundi 12 janvier 2004 22:38
>   To: '[EMAIL PROTECTED]'
>   Subject: German characters not correct in output webform
> 
> 
> 
> 
>   Hi, 
>     
>    I've a problem with a Javascript form on a german website. 
>    (http://informationservices.swets.de/web/show/id=47553) 
>    The input of the form contains german characters. 
>    But the output (which is generated by submitting the form) does not 
>    display those characters (see example beneath). My first reaction to 
>    this problem is that Unicode somehow does not translate these german 
>    characters to Windows (Outlook). 
>     
>    Example form output: 
>    Form: Kontaktformular 
>    Sender: Receiver: [EMAIL PROTECTED] 
>    Insertdate: 2/12/2003 
>     
>    Vor- und Zuname:: Birgitta MÃÆÃÂhe 
>    Firma / Institution:: ÃÆ?ffentliche BÃÆÃÂcherei Mainz 
>    Berufsbezeichnung:: 
>    E-Mail-Adresse:: [EMAIL PROTECTED] 
>    Telefonnummer:: 
>    Ihre Fragen und Anregungen:: Wir interessieren uns fÃÆÃÂr eine 
>    Abonnement der Print-Ausgabe der britischen Tageszeitung "Times". Ist 
>    dies ÃÆÃÂber Sie mÃÆÃÂglich und wenn ja zu welchen Konditionen. 
>    (Preis, wann wird zugestellt? ...) 
>     
>    Can you help me with this, or suggest something? 
>     
>    I really appreciate your help. 
>     
>    Kindest regards, 
>     
>    Bert Kemner, 
>    webmaster, 
>    Swets Information Services, 
>    Lisse, 
>    The Netherlands 
> 
> 
> 
>   Bert Kemner                                     
>   Webmaster 
> 
>   Swets Information Services                              
>   P.O.Box 830 
>   2160 SZ Lisse                                                           
>   Heereweg 347B                           
>   2161 CA Lisse                                   
>   The Netherlands                         
>   T +31 (0)252 435 241                    
>   F +31 (0)252 415 888                            
>   E [EMAIL PROTECTED]  
>   www.swets.com           
> 
> 
