ID:               35447
Updated by:       [EMAIL PROTECTED]
Reported By:      saramaca at libertysurf dot fr
-Status:          Assigned
+Status:          Open
Bug Type:         XML related
Operating System: *
PHP Version:      5.1.1
Assigned To:      rrichards

Previous Comments:
------------------------------------------------------------------------

[2005-11-28 20:28:32] [EMAIL PROTECTED]

As far as the default attribute values go, I have to check on expat
behavior. The other issue is fixed with libxml2 2.6.18. I have a patch
(http://www.ctindustries.net/patches/xml.compat.diff.txt) that looks
like it should work around the issue with older libxml2 libs, but it
needs more testing with different encoding/BOM schemes to make sure it
doesn't break anything, as we're playing with the libxml encoding
handling here.

------------------------------------------------------------------------

[2005-11-28 18:03:18] [EMAIL PROTECTED]

expat vs libxml2 incompatibility?

------------------------------------------------------------------------

[2005-11-28 14:55:33] saramaca at libertysurf dot fr

Description:
------------
In PHP 4, xml_parse_into_struct() can parse a UTF-8-encoded XML file
with or without a UTF-8 BOM (\xEF\xBB\xBF). In PHP 5, this is no longer
the case: when the BOM is present, it raises an error saying the string
doesn't contain any XML data (Empty document).

Additionally, PHP 5's xml_parse_into_struct() does *NOT* place default
attribute values into the struct (e.g. despite the DTD provided,
$content[1]['attributes']['type'] isn't set to "literal" in the actual
result section below; please compare it to the expected result). This
used to work under PHP 4.1.x and above (where the parser is based on
expat, AFAIK).

PS: I guess "manually" stripping this magic number, if embedded, before
calling the function would yield the expected result. However, I found
an acceptable work-around that seems to work equally well across
versions 4 and 5 of PHP:

<?php
...
$parser = xml_parser_create('');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $encoding);
...
?>

Rather than:

<?php
...
$parser = xml_parser_create($encoding);
...
?>

Reproduce code:
---------------
http://www.diptyque.net/bugs/utf8_bom.php ; running PHP 4 --> outputs expected result
http://www.diptyque.net/bugs/utf8_bom.phps ; source code

Expected result:
----------------
w/ autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>
        )

    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                    [type] => literal
                )

            [value] => A bient&244;t
        )

    [2] => Array
        (
            [tag] => bundle
            [value] =>
            [type] => cdata
            [level] => 1
        )

    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )

)

w/o autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>
        )

    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                    [type] => literal
                )

            [value] => A bient&244;t
        )

    [2] => Array
        (
            [tag] => bundle
            [value] =>
            [type] => cdata
            [level] => 1
        )

    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )

)

Actual result:
--------------
w/ autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>
        )

    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                )

            [value] => A bient&244;t
        )

    [2] => Array
        (
            [tag] => bundle
            [value] =>
            [type] => cdata
            [level] => 1
        )

    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )

)

w/o autodetect -->
Empty document

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=35447&edit=1
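
For reference, the "manual" stripping mentioned in the description's PS could look roughly like the sketch below. The helper name strip_utf8_bom() is hypothetical (it is not part of PHP or of the linked test script), and whether stripping the BOM alone is enough on every affected libxml2 version is an assumption that has not been verified here.

```php
<?php
// Hypothetical helper, not part of PHP itself: remove a leading UTF-8 BOM
// (\xEF\xBB\xBF) so the parser sees the XML content at offset 0.
function strip_utf8_bom($data)
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        // The cast covers older PHP versions, where substr() returns false
        // when the input consists of nothing but the BOM.
        return (string) substr($data, 3);
    }
    // No BOM at the start: return the input unchanged.
    return $data;
}
```

The result would then be passed on in place of the raw data, e.g. xml_parse_into_struct($parser, strip_utf8_bom($data), $content). This only addresses the "Empty document" error; it does nothing about the missing default attribute values.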