Edit report at https://bugs.php.net/bug.php?id=63189&edit=1

 ID:                 63189
 Updated by:         cataphr...@php.net
 Reported by:        vl dot homutov at gmail dot com
 Summary:            External DTDs are not processed
-Status:             Open
+Status:             Not a bug
 Type:               Bug
 Package:            *XML functions
 Operating System:   Linux
 PHP Version:        5.4.7
 Block user comment: N
 Private report:     N

 New Comment:

This is not a bug; the external subset is handled separately in libxml2.

See http://lxr.php.net/xref/THIRD_PARTY/libxml2/parser.c#xmlParseDocTypeDecl , 
this is where the doctype is parsed and the external DTD triggers a call to 
the "internalSubset" callback, which we do not hook internally and which is 
therefore not hookable from userland either. See 
http://lxr.php.net/xref/PHP_TRUNK/ext/xml/compat.c#php_xml_compat_handlers

The internal subset is processed elsewhere in xmlParseInternalSubset() and 
doesn't depend on a SAX callback.

More generally, see http://xmlsoft.org/entities.html :

> WARNING: handling entities on top of the libxml2 SAX interface is 
> difficult!!! 
> If you plan to use non-predefined entities in your documents, then the 
> learning curve to handle them using the SAX API may be long. If you plan to 
> use complex documents, I strongly suggest you consider using the DOM 
> interface instead and let libxml deal with the complexity rather than trying 
> to do it yourself.
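
For illustration, here is a minimal sketch of the DOM route suggested in the quote above. It assumes the same mytag.dtd / file.txt layout as the test script in the original report; the temporary directory, the file name doc.xml and the entity content 'three' are made up for this example. It is not a tested replacement for the reporter's script, only an outline of how libxml can be asked to load the external subset and substitute entities itself.

```php
<?php
// Sketch only: the directory name and file contents are hypothetical.
$dir = sys_get_temp_dir() . '/bug63189_' . uniqid();
mkdir($dir);

// External parsed entity and the external DTD that declares it.
file_put_contents("$dir/file.txt", 'three');
file_put_contents("$dir/mytag.dtd", '<!ENTITY custom SYSTEM "file.txt">');

$xml = <<<EOXML
<?xml version="1.0"?>
<!DOCTYPE mytag SYSTEM "mytag.dtd">
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>
EOXML;
file_put_contents("$dir/doc.xml", $xml);

$doc = new DOMDocument();
// LIBXML_DTDLOAD asks libxml to load the external subset,
// LIBXML_NOENT asks it to substitute entity references.
// Both make the parser read external files: trusted input only.
$ok = $doc->load("$dir/doc.xml", LIBXML_DTDLOAD | LIBXML_NOENT);

// Collect the text of every <elem> so the substitution can be inspected.
$texts = array();
if ($ok) {
    foreach ($doc->getElementsByTagName('elem') as $elem) {
        $texts[] = $elem->textContent;
    }
}
echo implode(',', $texts), "\n";

// Remove the temporary files again.
array_map('unlink', glob("$dir/*"));
rmdir($dir);
```

If the entity is resolved, the text of the third element should be the contents of file.txt. Loading from a file rather than a string is deliberate: the relative system identifiers are then resolved against the document's own location instead of the current working directory.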


Previous Comments:
------------------------------------------------------------------------
[2012-09-30 21:04:37] vl dot homutov at gmail dot com

Additional details:

There is also a problem if a custom entity is present in an attribute:

<?xml version="1.0"?>
<!DOCTYPE mytag [<!ENTITY custom SYSTEM "file.txt">]>
<mytag 
attr="&custom;"><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>

gives: XML parser error:XML_ERR_ENTITY_IS_EXTERNAL

------------------------------------------------------------------------
[2012-09-29 19:56:04] vl dot homutov at gmail dot com

Description:
------------
PHP's xml_parse() ignores the external DTD specified in the
XML file and thus can't parse the file if it uses
entities that are defined only in that DTD.


Test script:
---------------
#!/usr/bin/php
<?php

$xml_ext_dtd=<<<EOXML
<?xml version="1.0"?>
<!DOCTYPE mytag SYSTEM "./mytag.dtd">
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>
EOXML;

$xml_int_dtd=<<<EOXML
<?xml version="1.0"?>
<!DOCTYPE mytag
[
<!ENTITY custom SYSTEM "file.txt">
]>
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>
EOXML;

function externalEntityHandler($parser, $name, $base, $systemId, $publicId)
{
        echo "PROCESS EXTERNAL REFERENCE(file=$systemId)\n";
        return true;
}

function characterDataHandler($parser, $data)
{
        echo "CDATA found: '$data'\n";
}

function xerr($parser)
{
        $out = "XML parser error:";
        $out.=xml_error_string(xml_get_error_code($parser));
        $out.="\n";
        return $out;
}

echo "This works OK - parse xml1:\n$xml_int_dtd\n";
echo "---------------------------------------\n";
$xml_parser = xml_parser_create();
xml_set_character_data_handler($xml_parser, "characterDataHandler");
xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");
xml_parse($xml_parser, $xml_int_dtd) or die(xerr($xml_parser));

echo "\nThis FAILS - parse xml2:\n$xml_ext_dtd\n";
echo "---------------------------------------\n";
$xml_parser = xml_parser_create();
xml_set_character_data_handler($xml_parser, "characterDataHandler");
xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");
$rv = xml_parse($xml_parser, $xml_ext_dtd);
if (!$rv) echo xerr($xml_parser);

echo "file 'mytag.dtd' is:\n".file_get_contents("./mytag.dtd");

?>

Expected result:
----------------
This works OK - parse xml1:
<?xml version="1.0"?>
<!DOCTYPE mytag
[
<!ENTITY custom SYSTEM "file.txt">
]>
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>
---------------------------------------
CDATA found: 'one'
CDATA found: 'two'
PROCESS EXTERNAL REFERENCE(file=file.txt)


Actual result:
--------------
This FAILS - parse xml2:
<?xml version="1.0"?>
<!DOCTYPE mytag SYSTEM "./mytag.dtd">
<mytag><elem>one</elem><elem>two</elem><elem>&custom;</elem></mytag>
---------------------------------------
CDATA found: 'one'
CDATA found: 'two'
XML parser error:Undeclared entity warning
file 'mytag.dtd' is:
<!ENTITY custom SYSTEM "file.txt">



------------------------------------------------------------------------


