You have to configure the serializer encoding, as in this sample:

<map:serializer name="xml-iso" logger="serializ.xml"
    src="org.apache.cocoon.serialization.XMLSerializer"
    mime-type="text/xml">
    <encoding>ISO-8859-1</encoding>
</map:serializer>

-----Original Message-----
From: news [mailto:[EMAIL PROTECTED] on behalf of Harald Wehr
Sent: Thursday, April 22, 2004 8:35
To: [EMAIL PROTECTED]
Subject: URLEncoding of special characters



I have a problem with special German characters occurring in URLs.
I made a minimal example to show it. Assume the following pipeline
snippet:

<map:match pattern="SpecialCharacters.html">
   <map:generate type="file" src="context://content/test1.xml"/>
   <map:serialize type="html"/>
</map:match>

The test1.xml file looks like this. Note the special German
characters in the URLs (I hope they are displayed correctly in your mail
client):

<?xml version="1.0" encoding="iso-8859-1" ?>
<html>
    <head>
      <title>Test</title>
    </head>
    <body>
       <a href="ÜTest.html">ÜTest</a>
       <a href="ÄTest.html">ÄTest</a>
    </body>
</html>

The HTML serializer encodes the URLs, producing the following output
(source code of the HTML file):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Test</title>
</head>
<body>
<a href="%C3%9CTest.html">&Uuml;Test</a>
<a href="%C3%84Test.html">&Auml;Test</a>
</body>
</html>

So Ü is encoded as %C3%9C and Ä as %C3%84, but I need %DC for Ü and
%C4 for Ä.

The java.net.URLEncoder.encode method returns the following:

System.out.print(java.net.URLEncoder.encode("ÜÄ","UTF-8"));
Result: %C3%9C%C3%84

System.out.print(java.net.URLEncoder.encode("ÜÄ","ISO-8859-1"));
Result: %DC%C4
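
The two results differ because each charset maps the same characters to different bytes, and percent-encoding simply escapes those bytes. As an illustration only, here is a minimal sketch that prints the escaped bytes per charset. The class name EncodingBytes and the helper percentBytes are hypothetical, and the code assumes Java 7 or later (for StandardCharsets), which is newer than the 1.4.2 JVM mentioned in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Hypothetical helper: escapes EVERY byte of the text as %XX in the given charset.
    // Unlike java.net.URLEncoder, it does not leave ASCII letters or digits unescaped.
    static String percentBytes(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            // Mask to 0..255 so negative byte values format as two hex digits.
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00DC and U+00C4 (the two umlauts from the example), written as
        // escapes so the source file encoding does not matter.
        String text = "\u00DC\u00C4";
        System.out.println(percentBytes(text, StandardCharsets.UTF_8));      // two bytes per character
        System.out.println(percentBytes(text, StandardCharsets.ISO_8859_1)); // one byte per character
    }
}
```

In UTF-8 each of these characters occupies two bytes, in ISO-8859-1 one byte, which is why the escaped forms have different lengths.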

So why does the serializer use UTF-8 for URL encoding? In web.xml
I set the container-encoding and form-encoding parameters to ISO-8859-1,
but that changed nothing. The serializer is defined in the sitemap as
follows:

<map:serializer logger="sitemap.serializer.html" mime-type="text/html"
     name="html" pool-grow="4" pool-max="32" pool-min="4"
     src="org.apache.cocoon.serialization.HTMLSerializer">
  <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
  <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
  <encoding>ISO-8859-1</encoding>
</map:serializer>

Can you give me any hints on how to get the URLs encoded correctly?
(I need this for subsequent database lookups.)
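
For illustration, one possible workaround on the receiving side (not a fix for the serializer itself) is to decode the UTF-8 escaped href and, if needed, re-encode it as ISO-8859-1 before the lookup. This is a sketch under the assumption that the links arrive exactly as shown in the HTML output above. The class HrefReencoder and its methods are hypothetical names. Note that URLDecoder and URLEncoder implement form encoding, so a "+" is treated as a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class HrefReencoder {
    // Hypothetical helper: turns a UTF-8 percent-encoded value back into plain text,
    // e.g. for use as a database lookup key.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: decodes as UTF-8, then re-encodes as ISO-8859-1.
    static String utf8ToLatin1(String encoded) {
        try {
            return URLEncoder.encode(decodeUtf8(encoded), "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // ISO-8859-1 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The two hrefs from the serializer output in this thread.
        System.out.println(utf8ToLatin1("%C3%9CTest.html"));
        System.out.println(utf8ToLatin1("%C3%84Test.html"));
    }
}
```

Decoding before the lookup keeps the database key independent of whichever charset the serializer happened to use for escaping.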

Cocoon: Dev-Snapshot from 2004-03-29
Java: 1.4.2_03

Thanks for your help

Harald

--
Institut für Tourismus- und Geo-Informationssysteme GmbH
Sitz: Friedrichstrasse 57-59 38855 Wernigerode

Büro: Gießerweg 5
       38855 Wernigerode            Web:     http://www.itgis.com
                                    Tel:     03943/557807
                                    Fax:     03943/557808

Das Internet-Lexikon - Ein Dienst der ITGIS GmbH:
http://www.knowlex.org

Privat: http://www.harald-wehr.de



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


*************************************************************
This email has been processed by the Grupo FCC antivirus.
*************************************************************
