 ID:               40433
 Updated by:       [EMAIL PROTECTED]
 Reported By:      consensus at gmail dot com
-Status:           Open
+Status:           Bogus
 Bug Type:         SimpleXML related
 Operating System: any
 PHP Version:      5.2.1
 New Comment:

Yes, it's defined behaviour and will stay that way.

About the "waste of cpu cycles": the library used for SimpleXML (and all other XML extensions) is libxml2, and this library converts everything internally to UTF-8, regardless of what the input is. So you have to convert it back to whatever charset you want in your PHP script.


Previous Comments:
------------------------------------------------------------------------

[2007-02-10 22:07:20] consensus at gmail dot com

Description:
------------
SimpleXML has a wrong behaviour (yes, it is defined, but still wrong) regarding the charset.

SimpleXML is able to read any XML file as long as the charset is given in the XML header (encoding="..."). But when it gives back values, it no longer honours that encoding and forces UTF-8.

While this might be defined behaviour, it is still a very unclean/ignorant feature. It is the only function I know of which behaves like this. Charsets exist for good reasons, and of course you can convert each value you get into the correct charset. But here is why this is generally a bad idea:

a) Everyone expects the function not to change the charset.
b) It is a big waste of CPU time. If you have millions of values, you need millions of function calls to reconvert them all! Depending on the application you write, this can become a real problem.

I suggest rethinking this design decision, as it will become a problem for many others over time.

------------------------------------------------------------------------
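
For illustration, here is a minimal sketch of the conversion described in the new comment above. It assumes the script wants ISO-8859-1 output and that the iconv extension is available. The sample document and the helper name xml_value() are made up for this example and are not part of SimpleXML.

```php
<?php
// Hypothetical sample document declared as ISO-8859-1.
// The "é" in "café" is written as the single raw byte 0xE9.
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
     . "<root><name>caf\xE9</name></root>";

$doc = simplexml_load_string($xml);

// SimpleXML returns UTF-8 regardless of the encoding declared in the input.
$utf8 = (string) $doc->name;
echo bin2hex($utf8), "\n";    // 636166c3a9 ("é" is now the two bytes C3 A9)

// Convert the value back to the charset the script actually wants.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);
echo bin2hex($latin1), "\n";  // 636166e9

// Hypothetical helper (not a SimpleXML function) that keeps the
// conversion in one place instead of scattering iconv() calls.
function xml_value($node, $charset = 'ISO-8859-1')
{
    return iconv('UTF-8', $charset, (string) $node);
}

echo bin2hex(xml_value($doc->name)), "\n";  // 636166e9
```

Note that iconv() returns false when a character cannot be represented in the target charset, so real code would probably want to check the return value or use a //TRANSLIT target.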