 ID:                 63189
 Updated by:         cataphr...@php.net
 Reported by:        vl dot homutov at gmail dot com
 Summary:            External DTDs are not processed
-Status:             Open
+Status:             Not a bug
 Type:               Bug
 Package:            *XML functions
 Operating System:   Linux
 PHP Version:        5.4.7
 Block user comment: N
 Private report:     N

 New Comment:

This is not a bug: the external subset is handled separately in libxml2.

See http://lxr.php.net/xref/THIRD_PARTY/libxml2/parser.c#xmlParseDocTypeDecl : this is where the doctype is parsed, and where the external DTD triggers a call to the "internalSubset" callback. We do not hook that callback internally, so it cannot be hooked from userland either. See http://lxr.php.net/xref/PHP_TRUNK/ext/xml/compat.c#php_xml_compat_handlers

The internal subset is processed elsewhere, in xmlParseInternalSubset(), and does not depend on a SAX callback.

More generally, see http://xmlsoft.org/entities.html :

> WARNING: handling entities on top of the libxml2 SAX interface is
> difficult!!! If you plan to use non-predefined entities in your
> documents, then the learning curve to handle them using the SAX API
> may be long. If you plan to use complex documents, I strongly suggest
> you consider using the DOM interface instead and let libxml deal with
> the complexity rather than trying to do it yourself.
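Below is a minimal sketch of the DOM route recommended above, applied to the reporter's external-DTD case. It assumes the standard DOM extension, PHP 7 or later (for the type declarations), and the LIBXML_DTDLOAD option (load the external subset) combined with LIBXML_NOENT (substitute entities). The helper name loadWithExternalDtd() and the scratch file names are hypothetical. With LIBXML_NOENT, libxml2 fetches external entities, so use this only on trusted input.

```php
<?php
// Sketch only: let libxml2 itself load the external DTD and expand &custom;
// through the DOM extension. loadWithExternalDtd() is a hypothetical helper.
function loadWithExternalDtd(string $path): DOMDocument
{
    $doc = new DOMDocument();
    // LIBXML_DTDLOAD: read the external subset named in <!DOCTYPE ... SYSTEM "...">.
    // LIBXML_NOENT:   replace entity references with their replacement text.
    if (!$doc->load($path, LIBXML_DTDLOAD | LIBXML_NOENT)) {
        throw new RuntimeException("could not parse $path");
    }
    return $doc;
}

// Recreate the reporter's three files in a scratch directory
// (directory name and file contents are illustrative assumptions).
$dir = sys_get_temp_dir() . '/bug63189_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
file_put_contents("$dir/file.txt", 'three');
file_put_contents("$dir/mytag.dtd", '<!ENTITY custom SYSTEM "file.txt">');
$xml = <<<'EOXML'
<?xml version="1.0"?>
<!DOCTYPE mytag SYSTEM "./mytag.dtd">
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>
EOXML;
file_put_contents("$dir/doc.xml", $xml);

// Relative system IDs are resolved against doc.xml's directory,
// and all three files live in that same directory.
$doc = loadWithExternalDtd("$dir/doc.xml");
foreach ($doc->getElementsByTagName('elem') as $elem) {
    echo "elem: '{$elem->textContent}'\n";
}

// Remove the scratch files. $doc remains usable because it is fully in memory.
foreach (['doc.xml', 'mytag.dtd', 'file.txt'] as $name) {
    unlink("$dir/$name");
}
rmdir($dir);
```

If the external DTD and file.txt are read as intended, the third elem should hold the contents of file.txt instead of producing an "Undeclared entity" error. The trade-off is the one the libxml2 documentation describes: you give up streaming SAX callbacks and let libxml2 build the whole tree.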
 Previous Comments:
------------------------------------------------------------------------
[2012-09-30 21:04:37] vl dot homutov at gmail dot com

Additional details:

There is also a problem if a custom entity is used in an attribute value:

<?xml version="1.0"?>
<!DOCTYPE mytag [<!ENTITY custom SYSTEM "file.txt">]>
<mytag attr="&custom;"><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>

gives:

XML parser error:XML_ERR_ENTITY_IS_EXTERNAL

------------------------------------------------------------------------
[2012-09-29 19:56:04] vl dot homutov at gmail dot com

Description:
------------
PHP's xml_parse() ignores the external DTD specified in the XML file, and therefore cannot parse the file if it contains entities that are declared only in that DTD.

Test script:
---------------
#!/usr/bin/php
<?php
$xml_ext_dtd = <<<EOXML
<?xml version="1.0"?>
<!DOCTYPE mytag SYSTEM "./mytag.dtd">
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>
EOXML;

$xml_int_dtd = <<<EOXML
<?xml version="1.0"?>
<!DOCTYPE mytag [
<!ENTITY custom SYSTEM "file.txt">
]>
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>
EOXML;

function externalEntityHandler($parser, $name, $base, $systemId, $publicId)
{
    echo "PROCESS EXTERNAL REFERENCE(file=$systemId)\n";
    return true;
}

function characterDataHandler($parser, $data)
{
    echo "CDATA found: '$data'\n";
}

function xerr($parser)
{
    $out = "XML parser error:";
    $out .= xml_error_string(xml_get_error_code($parser));
    $out .= "\n";
    return $out;
}

echo "This works OK - parse xml1:\n$xml_int_dtd\n";
echo "---------------------------------------\n";
$xml_parser = xml_parser_create();
xml_set_character_data_handler($xml_parser, "characterDataHandler");
xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");
xml_parse($xml_parser, $xml_int_dtd) or die(xerr($xml_parser));

echo "\nThis FAILS - parse xml2:\n$xml_ext_dtd\n";
echo "---------------------------------------\n";
$xml_parser = xml_parser_create();
xml_set_character_data_handler($xml_parser, "characterDataHandler");
xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");
$rv = xml_parse($xml_parser, $xml_ext_dtd);
if (!$rv) {
    echo xerr($xml_parser);
}

echo "file 'mytag.dtd' is:\n" . file_get_contents("./mytag.dtd");
?>

Expected result:
----------------
This works OK - parse xml1:
<?xml version="1.0"?>
<!DOCTYPE mytag [
<!ENTITY custom SYSTEM "file.txt">
]>
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>
---------------------------------------
CDATA found: 'one'
CDATA found: 'two'
PROCESS EXTERNAL REFERENCE(file=file.txt)

Actual result:
--------------
This FAILS - parse xml2:
<?xml version="1.0"?>
<!DOCTYPE mytag SYSTEM "./mytag.dtd">
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>
---------------------------------------
CDATA found: 'one'
CDATA found: 'two'
XML parser error:Undeclared entity warning
file 'mytag.dtd' is:
<!ENTITY custom SYSTEM "file.txt">

------------------------------------------------------------------------