To make it extra clear: what is output in place of the &nbsp; is 2 bytes,
with values 194 and 160. The 160 is the non-breaking space that is expected,
but the 194 (displayed as Â, the letter A with a circumflex) is pretty
magical and annoying.
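
For anyone who wants to check the bytes themselves, here is a minimal
sketch. The helper name byte_values() is made up for this example, and the
two-byte string is written out by hand rather than taken from DOM. It only
shows that 194 160 matches the UTF-8 encoding of U+00A0, and how those same
two bytes look when each one is read as a separate ISO-8859-1 character.

```php
<?php
// Hypothetical helper (not part of DOM or the original report):
// return the byte values of a string as a list of integers.
function byte_values(string $s): array
{
    return array_values(unpack('C*', $s));
}

// UTF-8 encoding of U+00A0 (no-break space), written out byte by byte.
$nbsp  = "\xC2\xA0";
$bytes = byte_values($nbsp);
echo implode(' ', $bytes), "\n"; // prints "194 160"

// In ISO-8859-1 every byte is one character whose code point equals the
// byte value, so a Latin-1 display shows two characters for these bytes.
foreach ($bytes as $b) {
    printf("U+%04X\n", $b); // prints "U+00C2" (Â), then "U+00A0"
}
```

Running the same helper on the textContent value from the script quoted
below would show whether DOM really returns these two bytes.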

Some other (but not all) HTML entities (e.g. the one for ë) seem to have
the same problem, but they produce other bad characters.

I've tried adding a <head><meta http-equiv="Content-Type"
content="text/html; charset=utf8"></head> to the HTML data and changing
the character set, but nothing changed.

Ron

"Ron Korving" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Hi,
>
> I found a bug in DOM. It surprises me that it's never been seen and/or
> fixed before. I can't find anything about it in the PHP bugtracker
> anyway. The reason why I'm posting here and not writing a bug report is
> that I'm not sure if this is a problem in the PHP extension or the DOM
> library itself. In the latter case there's nothing anybody here can do,
> I guess.
>
> This is the situation:
>
> <?php
>   $doc = DOMDocument::loadHTML('<html><body>&nbsp;</body></html>');
>   echo "'".$doc->getElementsByTagName('body')->item(0)->textContent."'\n";
>
>   $doc = DOMDocument::loadHTML('<html><body>foo&nbsp;bar</body></html>');
>   echo "'".$doc->getElementsByTagName('body')->item(0)->textContent."'\n";
> ?>
>
> Output:
>
> 'Â '
> 'foo bar'
>
> Where the heck do these 'Â's come from when it parses an &nbsp;? I hope
> someone can shed some light on the next step to be taken in order to fix
> this.
>
> Thanks,
>
> Ron Korving

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
