 ID:               24934
 Updated by:       [EMAIL PROTECTED]
 Reported By:      tony at marston-home dot demon dot co dot uk
-Status:           Open
+Status:           Bogus
 Bug Type:         DOM XML related
 Operating System: Windows XP
 PHP Version:      4.3.2
 New Comment:

UTF-8 is the internal encoding for libxml when storing the document.
You need to convert non-UTF-8 encoded strings into UTF-8 when setting
content. Also, for output to a browser you should be using
html_dump_mem.

i.e.:
$value = $doc->create_text_node(mb_convert_encoding($fieldvalue, "UTF-8", "ISO-8859-1"));

and use either:
$xml_string = $doc->dump_mem(true, "ISO-8859-1");
or
$xml_string = $doc->html_dump_mem();
to output the document correctly.


Previous Comments:
------------------------------------------------------------------------

[2003-08-11 04:45:41] tony at marston-home dot demon dot co dot uk

Here is a test script which demonstrates this bug:

<?php
//*****************************************************************************
// test script for bug #24934
//*****************************************************************************

// set up $fieldarray with test values
$fieldarray['field1'] = "Côte d'Ivoire";
$fieldarray['field2'] = "Curaçao";
$fieldarray['field3'] = "Lisérgida";

// create a new XML document
$doc = domxml_new_doc('1.0');

// create root node
$root = $doc->create_element('root');
$root = $doc->append_child($root);

// create record node
$occ = $doc->create_element('record');
$occ = $root->append_child($occ);

// insert each field as a child node
foreach ($fieldarray as $fieldname => $fieldvalue) {
    $child = $doc->create_element($fieldname);
    $child = $occ->append_child($child);
    $value = $doc->create_text_node($fieldvalue);
    $value = $child->append_child($value);
} // foreach

unset($fieldarray);

// get completed xml document
$xml_string = $doc->dump_mem(true);
unset($doc);

// dump to a disk file with '.xml' extension
$fname = basename($_SERVER['PHP_SELF']) . '.xml';
$fp = fopen($fname, 'w');
$result = fwrite($fp, $xml_string);
fclose($fp);

exit;

?>

If you try to load the XML file into your browser you will see that it
contains corrupt characters. For example, where it should have 'ô' for ô
(letter o with circumflex) it has '�' instead.
Basically the $doc->create_text_node() method is not translating
special characters into the right hex code according to the HTML
specification.

------------------------------------------------------------------------

[2003-08-08 17:45:51] [EMAIL PROTECTED]

Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.php.net/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to "Open".

Thank you for your interest in PHP.

Can you provide a small script, as I can't reproduce the corruption you
are getting?

------------------------------------------------------------------------

[2003-08-04 08:47:14] tony at marston-home dot demon dot co dot uk

Description:
------------
I have a field with the value "Cote d'Ivoire" (where the letter 'o' is
actually 'o circumflex') which is not being dealt with correctly by
$doc->create_text_node().

If I pass the text through htmlentities() beforehand what appears in
the XML output is "C&ocirc;te d'Ivoire" instead of "Côte d'Ivoire".

If I do not use htmlentities() on the value the output is "C�'Ivoire"
(which is totally corrupt) instead of "Côte d'Ivoire" (which is what I
expect).

A similar fault exists with all the other special characters I have
tried, such as 'c cedilla' etc.

Expected result:
----------------
If my input is "Co(circumflex)te d'Ivoire" I expect the output to be
"Côte d'Ivoire"

Actual result:
--------------
Instead of "Côte d'Ivoire" I am getting "C�'Ivoire"

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=24934&edit=1
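
A minimal sketch of why the conversion suggested in the new comment matters, at the byte level. It assumes the input really is ISO-8859-1 (as the comment's example does) and that the mbstring extension is loaded. The variable names are illustrative only, and the DOM XML calls are left out on purpose so that nothing here depends on the PHP 4 domxml extension:

```php
<?php
// "Côte d'Ivoire" written as ISO-8859-1 bytes: the o-circumflex is the
// single byte 0xF4. The escape keeps this file's own encoding out of play.
$latin1 = "C\xF4te d'Ivoire";

// A lone 0xF4 followed by an ASCII byte is not a valid UTF-8 sequence,
// so a consumer that expects UTF-8 cannot decode it.
$isValidUtf8Before = mb_check_encoding($latin1, 'UTF-8');

// Same call as in the comment above: convert ISO-8859-1 to UTF-8.
// The o-circumflex becomes the two-byte sequence 0xC3 0xB4.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$isValidUtf8After = mb_check_encoding($utf8, 'UTF-8');

// Show both byte sequences as hex for comparison.
echo bin2hex($latin1), "\n";
echo bin2hex($utf8), "\n";
```

The converted string is what would be passed to create_text_node(). The encoding argument to dump_mem() shown in the comment then controls how the document is serialized on output.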