ID:               24934
 Updated by:       [EMAIL PROTECTED]
 Reported By:      tony at marston-home dot demon dot co dot uk
-Status:           Open
+Status:           Bogus
 Bug Type:         DOM XML related
 Operating System: Windows XP
 PHP Version:      4.3.2
 New Comment:

UTF-8 is the internal encoding for libxml when storing the document.
You need to convert non-UTF-8 encoded strings into UTF-8 when setting
content. Also, for output to a browser you should be using
html_dump_mem().

i.e.:
$value = $doc->create_text_node(mb_convert_encoding($fieldvalue, "UTF-8", "ISO-8859-1"));
and use either:
$xml_string = $doc->dump_mem(true, "ISO-8859-1");
or
$xml_string = $doc->html_dump_mem();
to output the document correctly.
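
The conversion step can be checked on its own, without the DOM XML
extension. This is a minimal sketch that assumes the mbstring extension
is loaded and that the input really is ISO-8859-1; the variable names
$latin1 and $utf8 are illustrative only and do not come from the report.

```php
<?php
// "Côte d'Ivoire" as ISO-8859-1 bytes: o-circumflex is the single byte 0xF4.
$latin1 = "C\xF4te d'Ivoire";

// Convert to UTF-8 before handing the string to create_text_node().
$utf8 = mb_convert_encoding($latin1, "UTF-8", "ISO-8859-1");

// In UTF-8 the same character is the two-byte sequence 0xC3 0xB4,
// so the first three bytes are 'C', 0xC3, 0xB4.
echo bin2hex(substr($utf8, 0, 3)), "\n"; // 43c3b4
```

If the source strings are already UTF-8, this conversion must be skipped,
otherwise the characters would be encoded twice.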


Previous Comments:
------------------------------------------------------------------------

[2003-08-11 04:45:41] tony at marston-home dot demon dot co dot uk

Here is a test script which demonstrates this bug:

<?php
//*****************************************************************************
// test script for bug #24934
//*****************************************************************************

// set up $fieldarray with test values
$fieldarray['field1'] = "Côte d'Ivoire";
$fieldarray['field2'] = "Curaçao";
$fieldarray['field3'] = "Lisérgida";

// create a new XML document
$doc = domxml_new_doc('1.0');
        
// create root node
$root = $doc->create_element('root');
$root = $doc->append_child($root);
        
// create record node
$occ = $doc->create_element('record');
$occ = $root->append_child($occ);
        
// insert each field as a child node
foreach ($fieldarray as $fieldname => $fieldvalue) {
        $child = $doc->create_element($fieldname);
        $child = $occ->append_child($child);
        $value = $doc->create_text_node($fieldvalue);
        $value = $child->append_child($value);
} // foreach
        
unset($fieldarray);

// get completed xml document
$xml_string = $doc->dump_mem(true);
unset($doc);

// dump to a disk file with '.xml' extension
$fname = basename($_SERVER['PHP_SELF']) .'.xml' ;
$fp = fopen($fname, 'w');
$result = fwrite($fp, $xml_string);
fclose($fp);

exit;

?>

If you try to load the XML file into your browser you will see that it
contains corrupt characters. For example, where it should have '&#xF4;'
for ô (letter o with circumflex) it has '&#x134960' instead. Basically
the $doc->create_text_node() method is not translating special
characters into the right hex code according to the HTML specification.
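
The value 0x134960 is consistent with the ISO-8859-1 byte 0xF4 being
read as the lead byte of a four-byte UTF-8 sequence, which then absorbs
the next three bytes ('t', 'e' and the space). The sketch below
reproduces that arithmetic. It assumes a decoder that does not validate
the continuation bytes; that is an assumption about how the corrupt
value arose, not a statement about the libxml source. The names $bytes
and $codepoint are illustrative.

```php
<?php
// Bytes of "ôte " in ISO-8859-1: 0xF4, 't' (0x74), 'e' (0x65), ' ' (0x20).
$bytes = "\xF4te ";

// Decode as a four-byte UTF-8 sequence without checking that bytes 2-4
// are valid continuation bytes: 3 payload bits from the lead byte,
// 6 payload bits from each following byte.
$codepoint = ((ord($bytes[0]) & 0x07) << 18)
           | ((ord($bytes[1]) & 0x3F) << 12)
           | ((ord($bytes[2]) & 0x3F) << 6)
           |  (ord($bytes[3]) & 0x3F);

echo '&#x', strtoupper(dechex($codepoint)), "\n"; // &#x134960
```

This also accounts for "te " being missing from the corrupt output
"C&#x134960d'Ivoire": those three bytes were consumed as part of the
bogus sequence.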

------------------------------------------------------------------------

[2003-08-08 17:45:51] [EMAIL PROTECTED]

Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.php.net/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to "Open".

Thank you for your interest in PHP.


Can you provide a small script? I can't reproduce the corruption you
are getting.

------------------------------------------------------------------------

[2003-08-04 08:47:14] tony at marston-home dot demon dot co dot uk

Description:
------------
I have a field with the value "Cote d'Ivoire" (where the letter 'o' is
actually 'o circumflex') which is not being dealt with correctly by
$doc->create_text_node().

If I pass the text through htmlentities() beforehand what appears in
the XML output is "C&amp;ocirc;te d'Ivoire" instead of "C&ocirc;te
d'Ivoire".

If I do not use htmlentities() on the value the output is
"C&#x134960d'Ivoire" (which is totally corrupt) instead of "C&#xF4;te
d'Ivoire" (which is what I expect).

A similar fault exists with all the other special characters I have
tried, such as 'c cedilla' etc.

Expected result:
----------------
If my input is "Co(circumflex)te d'Ivoire" I expect the output to be
"C&#xF4;te d'Ivoire"

Actual result:
--------------
Instead of "C&#xF4;te d'Ivoire" I am getting "C&#x134960d'Ivoire"


------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=24934&edit=1
