If they write the umlauts as ä, it works fine. It also works inside XML
CDATA sections.
But you can set up a filter that replaces the umlauts. It works very
fast :-))
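A replacement filter of this kind can be sketched in Python as a translation table. The replacement targets below (German-style transliterations such as ü to ue) are a hypothetical choice for illustration, not necessarily what Daniel's filter uses:

```python
# Hypothetical replacement table: the targets are an illustrative choice
# (German-style transliterations), not the filter described in the thread.
UMLAUT_REPLACEMENTS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

# str.maketrans accepts a dict of single characters mapped to strings,
# so one character may be replaced by several.
_TABLE = str.maketrans(UMLAUT_REPLACEMENTS)


def replace_umlauts(text: str) -> str:
    """Replace each umlaut (and ß) with its ASCII transliteration."""
    return text.translate(_TABLE)


print(replace_umlauts("Müller, Größe"))  # Mueller, Groesse
```

A single `translate` pass scans the string once, which fits the "works very fast" remark better than chaining one `replace` call per character.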
regards
Daniel
----- Original Message -----
From: "Bill Conlon" <[EMAIL PROTECTED]>
To: <witango-talk@witango.com>
Sent: Thursday, September 08, 2005 7:25 PM
Subject: Re: Witango-Talk: Problems with umlauts, etc
Oh, this is a messy problem. Let me guess: you are using <@URL>
(or CMD wget or some such) to retrieve a page via HTTP GET.
The character encoding is supposed to be specified by the server, but
sometimes it's in a <meta> tag instead. And sometimes it's specified but
wrong: the page looks correct to its author, but not necessarily when
rendered or parsed by a remote client. So that's the first problem:
what's the encoding (Latin-1, UTF-8, etc.), and how do you map it to
your own character set? Sometimes there is no equivalent character.
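The lookup order described above (server header first, then a <meta> tag) can be sketched in Python. The function names, the 2048-byte scan limit, and the ISO-8859-1 fallback are assumptions for this sketch, not anything Witango provides. Note that a declared-but-wrong charset is only caught here when decoding actually fails; a UTF-8 page mislabelled as Latin-1 still decodes, just into garbage.

```python
import re
from typing import Optional

# Matches charset=... in a Content-Type header or in a <meta> tag
# (both the http-equiv form and the shorter <meta charset="..."> form).
_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def find_charset(content_type: Optional[str], body: bytes,
                 default: str = "iso-8859-1") -> str:
    """Pick a charset: HTTP header first, then a <meta> in the body, then a default."""
    if content_type:
        m = _CHARSET_RE.search(content_type.encode("ascii", "ignore"))
        if m:
            return m.group(1).decode("ascii").lower()
    # Only scan the start of the page, where a <meta> charset normally sits.
    m = _CHARSET_RE.search(body[:2048])
    if m:
        return m.group(1).decode("ascii").lower()
    return default


def decode_page(content_type: Optional[str], body: bytes) -> str:
    """Decode a fetched page, falling back to Latin-1 if the charset is unusable."""
    charset = find_charset(content_type, body)
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that don't fit it: Latin-1 maps
        # every byte to some character, so this decode never fails.
        return body.decode("iso-8859-1")


print(decode_page("text/html; charset=UTF-8", "Müller".encode("utf-8")))  # Müller
```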
And then there's the whole problem of trying to parse this. Some parsers
assume every byte is a character, but we now have multi-byte character
sets as well. I'm thinking of some Perl scripts here, but I actually don't
know how Witango's string handling will deal with multi-byte characters.
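The byte-versus-character issue is easy to see in Python, which keeps raw bytes and decoded text as separate types. This only illustrates the general problem; it says nothing about how Witango itself handles strings:

```python
name = "Müller"
raw = name.encode("utf-8")

print(len(name))  # 6 characters
print(len(raw))   # 7 bytes: "ü" is encoded as two bytes (0xC3 0xBC)

# A byte-oriented slice can cut a character in half...
print(raw[:2])    # b'M\xc3'

# ...and that fragment is not valid UTF-8 on its own.
try:
    raw[:2].decode("utf-8")
except UnicodeDecodeError as exc:
    print("cannot decode:", exc.reason)

# Slicing the decoded string works on whole characters instead.
print(name[:2])   # Mü
```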
But assuming you can identify a byte sequence containing the desired
data, maybe you can always convert it to something like UTF-8 so it can
be stored as XML CDATA? At least you would have a consistent
representation.
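The "convert to UTF-8 and store as CDATA" idea might look like the following Python sketch. The `<author>` element and the function names are hypothetical. One detail matters: a CDATA section cannot contain the sequence `]]>`, so the sketch splits it across two sections.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any embedded "]]>" terminator."""
    # "]]>" would end the section early, so close the section after "]]"
    # and reopen a new one before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def store_as_utf8_xml(raw: bytes, charset: str) -> bytes:
    """Decode bytes from their source charset and emit a UTF-8 XML document."""
    text = raw.decode(charset)
    xml = '<?xml version="1.0" encoding="UTF-8"?>\n<author>' + to_cdata(text) + "</author>"
    return xml.encode("utf-8")


# Latin-1 input in, consistent UTF-8 out; parsing it back recovers the text.
doc = store_as_utf8_xml(b"M\xfcller", "iso-8859-1")
print(ET.fromstring(doc).text)  # Müller
```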
bill
On Thursday, September 8, 2005, at 06:03 AM, Dale Graham wrote:
We're collecting data from a remote website. Author names from this
website occasionally come in with umlauts, diacriticals and the like.
We'd like, if at all possible, to preserve this data or, at worst, make a
reasonable conversion (e.g. umlauted u to plain u), but I'm having trouble
figuring out how to do this, since the character set I am receiving from
the remote server doesn't match the character set on my Witango server
(Mac OS X).
That is, an umlaut on my setup would be ü but is coming through
from the remote server as ü
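Both options Dale mentions, preserving the letter or falling back to a plain one, can be sketched in Python. The sketch assumes (this is not confirmed in the thread) that the remote site sends UTF-8 and the receiving side reads those bytes as a single-byte charset such as Latin-1 or Mac Roman; the function names are hypothetical.

```python
import unicodedata

# Assumption: UTF-8 bytes are being read with a single-byte charset.
original = "ü"
utf8_bytes = original.encode("utf-8")      # b'\xc3\xbc': two bytes for one letter

as_latin1 = utf8_bytes.decode("latin-1")   # 'Ã¼'
as_mac = utf8_bytes.decode("mac_roman")    # '√º'


def repair(garbled: str, wrong_charset: str) -> str:
    """Preserve the data: undo a known misreading of UTF-8 bytes."""
    # Re-encoding with the charset that was wrongly used recovers the
    # original bytes, which can then be decoded correctly as UTF-8.
    return garbled.encode(wrong_charset).decode("utf-8")


def strip_diacritics(text: str) -> str:
    """Fallback conversion: reduce accented letters to base letters (ü -> u)."""
    # NFKD splits "ü" into "u" + COMBINING DIAERESIS (U+0308) ...
    decomposed = unicodedata.normalize("NFKD", text)
    # ... so dropping combining marks leaves only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(repair(as_latin1, "latin-1"))               # ü
print(repair(as_mac, "mac_roman"))                # ü
print(strip_diacritics("Dvořák, Müller, Señor"))  # Dvorak, Muller, Senor
```

`repair` only works when you know which wrong charset was applied. `strip_diacritics` leaves letters with no decomposition, such as ß or ø, unchanged, so those still need an explicit replacement table.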
And would that data be different if the person receiving it were on a
Windows or *nix browser instead of a Mac browser? (To add to the level
of complexity!)
How do the experts out there handle this?
I tried to search the archives, but seemed to be lacking the magic
keywords to find anything I could use.
________________________________________________________________________
TO UNSUBSCRIBE: Go to http://www.witango.com/developer/maillist.taf