To make it extra clear: where the &nbsp; is, the output contains 2 bytes, with values 194 and 160. The 160 is the no-break space character I expect, but the 194 (displayed as Â, a capital A with a circumflex) is pretty magical and annoying.
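The two byte values can be reproduced without DOM at all. The sketch below uses only core PHP (`html_entity_decode()`, `ord()`, `str_split()`); the helper `byteValues()` is hypothetical, written just for this illustration. It decodes `&nbsp;` into UTF-8 and into ISO-8859-1: in UTF-8, U+00A0 takes the two bytes 194 160, while in ISO-8859-1 it is the single byte 160. One plausible explanation, assumed here and not confirmed in this thread, is that the DOM extension returns `textContent` as UTF-8, so anything reading the output as ISO-8859-1 shows byte 194 (0xC2) as "Â" in front of the no-break space.

```php
<?php
// Sketch (assumption: textContent comes back UTF-8 encoded).
// Shows what the &nbsp; entity decodes to under two character encodings.

/** Hypothetical helper: numeric value of every byte in $s, e.g. "AB" => [65, 66]. */
function byteValues(string $s): array
{
    return array_map('ord', str_split($s));
}

// U+00A0 (no-break space) encoded as UTF-8: two bytes, 0xC2 0xA0.
$utf8 = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

// The same character encoded as ISO-8859-1: the single byte 0xA0.
$latin1 = html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1');

echo implode(' ', byteValues($utf8)), "\n";   // 194 160
echo implode(' ', byteValues($latin1)), "\n"; // 160
```

In ISO-8859-1, byte 0xC2 is exactly "Â", which would match the stray character in the output.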
Some other (but not all) HTML entities (e.g. the one for ë) seem to have the same problem, but they produce other wrong characters. I've tried adding <head><meta http-equiv="Content-Type" content="text/html; charset=utf8"></head> to the HTML data and varying the character set, but nothing changed.

Ron

"Ron Korving" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]...
> Hi,
>
> I found a bug in DOM. It surprises me that it has never been noticed and/or
> fixed before; I can't find anything about it in the PHP bug tracker, anyway.
> I'm posting here rather than filing a bug report because I'm not sure
> whether this is a problem in the PHP extension or in the DOM library itself.
> In the latter case there's nothing anybody here can do, I guess.
>
> This is the situation:
>
> <?php
> $doc = new DOMDocument();
> $doc->loadHTML('<html><body>&nbsp;</body></html>');
> echo "'".$doc->getElementsByTagName('body')->item(0)->textContent."'\n";
>
> $doc = new DOMDocument();
> $doc->loadHTML('<html><body>foo&nbsp;bar</body></html>');
> echo "'".$doc->getElementsByTagName('body')->item(0)->textContent."'\n";
> ?>
>
> Output:
>
> 'Â '
> 'fooÂ bar'
>
> Where the heck do these 'Â's come from when it parses an &nbsp;? I hope
> someone can shed some light on the next step to take in order to fix this.
>
> Thanks,
>
> Ron Korving

--
PHP Internals - PHP Runtime Development Mailing List