Edwin Kapauni wrote:

 > Hi Christian,
Have a look at the HTTP response headers[1] of those feeds. The "netzpolitik" feed's header clearly states it's iso-8859-1.

That's right. My mistake. I merely "deduced" the encoding from some characters used in the text of the feeds, for example &#8221;, which are clearly non-Latin-1 characters. Since both feeds declare ISO-8859-1 in their response headers, the feeds must be either malformed or wrongly encoded.
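To illustrate the point about non-Latin-1 characters, here is a minimal Python sketch. It only uses the standard library; the helper name fits_latin1 is made up for this example. It resolves the reference &#8221; to its character and checks whether that character, and the escaped form, can be written as ISO-8859-1.

```python
import html

# Resolve the numeric character reference to the character it stands for.
quote = html.unescape("&#8221;")


def fits_latin1(text: str) -> bool:
    """Return True if every character of text can be encoded as ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(hex(ord(quote)))          # code point of the resolved character
print(fits_latin1(quote))       # the curly quote as a literal character
print(fits_latin1("&#8221;"))   # the escaped form consists of ASCII only
```

The literal character has no ISO-8859-1 byte, while the escaped reference is plain ASCII, so which of the two forms a feed actually contains is worth checking.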


Recoding is done automagically when XML is generated from that source. If you try the following snippet in your pipeline, your output will show a parsing error[2], but its source will strictly follow the encoding settings of your serializer:

  <map:match pattern="netzpolitik">
    <map:generate src="http://www.netzpolitik.org/feed"/>
    <map:serialize/>
  </map:match>

That depends on the serializer used. I configured the xml serializer in the sitemap for those feeds to encode as UTF-8: no parsing error.

In the case of <http://www.netzpolitik.org/feed/> you go in with iso-8859-1 and come out with utf-8 (unless you changed the settings of your xml-serializer).
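The "in with iso-8859-1, out with utf-8" behaviour can be sketched outside Cocoon. The following Python example is only an analogy, not Cocoon's own code: the sample document and its title element are invented, and xml.etree.ElementTree stands in for the generator and serializer pair.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment, stored as ISO-8859-1 bytes (u-umlaut is byte 0xFC).
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<title>B\u00fccher</title>"
).encode("iso-8859-1")

# "Generate": the parser decodes the bytes using the declared encoding.
root = ET.fromstring(latin1_doc)

# "Serialize": the same character is written out again, now as UTF-8 bytes.
utf8_doc = ET.tostring(root, encoding="utf-8")

print(latin1_doc)
print(utf8_doc)
```

The text content is identical on both sides; only the byte representation of the non-ASCII character changes.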

You will also have to make sure that character encoding of your output

     <encoding>UTF-8</encoding>

is in accordance with encoding information sent with e.g.

     mime-type="application/xhtml+xml; charset=utf-8"

Do you mean I should register the serializer with both the "charset" parameter and the <encoding> element, both set to the same value?



by your serializer in the HTTP response header. The following is an example xhtml serializer config containing both pieces of information.

  <map:serializer name="xhtml"
        mime-type="application/xhtml+xml; charset=utf-8"
        logger="sitemap.serializer.xhtml"
        pool-grow="2" pool-max="64" pool-min="2"
        src="org.apache.cocoon.components.serializers.XHTMLSerializer">
    <encoding>UTF-8</encoding>
    <indent>no</indent>
  </map:serializer>
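Whether the two settings in a config like the one above agree can be checked mechanically. Here is a rough Python sketch under a few assumptions: the function name charset_matches_encoding is invented, the config is passed in as a standalone string, and the map prefix is bound to the sitemap namespace URI http://apache.org/cocoon/sitemap/1.0 so that the fragment parses on its own.

```python
import codecs
import xml.etree.ElementTree as ET

# Shortened copy of the serializer declaration, with the namespace added
# so that the fragment is well-formed by itself.
SERIALIZER = """
<map:serializer xmlns:map="http://apache.org/cocoon/sitemap/1.0"
      name="xhtml"
      mime-type="application/xhtml+xml; charset=utf-8"
      src="org.apache.cocoon.components.serializers.XHTMLSerializer">
  <encoding>UTF-8</encoding>
</map:serializer>
"""


def charset_matches_encoding(config_xml: str) -> bool:
    """Compare the charset in mime-type with the <encoding> child element."""
    node = ET.fromstring(config_xml.strip())
    charset = None
    # Parameters follow the media type, separated by semicolons.
    for part in node.get("mime-type", "").split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset":
            charset = value.strip().strip('"')
    encoding = node.findtext("encoding")
    if charset is None or encoding is None:
        return False
    # codecs.lookup normalises aliases such as "UTF-8" and "utf-8".
    return codecs.lookup(charset).name == codecs.lookup(encoding.strip()).name


print(charset_matches_encoding(SERIALIZER))
```

Comparing through codecs.lookup rather than by plain string equality avoids false alarms over case or alias differences.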

Which generator have you been using for your work? Maybe I didn't fully understand your problem ...

I used both xml and html (to see if there is any difference in the output, but there is none). The Userdocs say that you shouldn't use the charset parameter but rather set <encoding> properly; at least this applies to the xml and html serializers. I used the same settings that are used for the newsfeeds in the sample portal shipped with Cocoon.


[1]<http://livehttpheaders.mozdev.org/> for Firefox/Mozilla users

Thanks, that was a useful hint. It reminded me of the WebDeveloper-Extension I have installed ;)
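The same header check can also be scripted instead of done in the browser. A small Python sketch, standard library only: the function names and the sample header values are assumptions for illustration, and fetch_charset needs network access, so it is defined here but not called.

```python
from email.message import Message
from urllib.request import urlopen


def declared_charset(content_type: str):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent


def fetch_charset(url: str):
    """Fetch a URL and return the charset declared in its response header."""
    with urlopen(url) as response:
        return response.headers.get_content_charset()


# Sample header values, not captured from the real feeds.
print(declared_charset("text/xml; charset=ISO-8859-1"))
print(declared_charset("text/xml"))
```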


Best regards, christian
