Hey Henry,
Thanks for the reply. I actually went back and tried this same
transformation with Xalan 2.6 and got similar results (I hadn't realized
that we were previously doing our own filtering that removed all of
the DTD content), but there are some differences between the two
outputs. I am curious as to why this would have changed.
Here are a few comparisons between 2.6 and 2.7.1:
2.6 output:
<?xml version="1.0" encoding="UTF-8"?><!--================== Imported
Names ====================================--> ...
(notice a comment starts immediately after the xml declaration, the
doctype declaration actually comes after all of the dtd stuff right
before the html content)
2.7.1 output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
<!ENTITY %HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" > ...
<!--================== Imported Names
====================================--> ...
(here the doctype comes after the xml declaration and has entity
declarations and comments inside the doctype declaration)
2.6 output:
Contains no entity declarations
2.7.1 output:
Contains:
<!ENTITY %HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" >
<!ENTITY %HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" >
<!ENTITY %HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" >
Those are the major differences. Again, I'm just curious as to why this
has changed.
Thanks,
Mike
Henry Zongaro wrote:
Hi, Mike.
Mike Strauch <[EMAIL PROTECTED]> wrote on 2008-04-16
03:43:23 PM:
> I've recently upgraded Xalan from 2.6 to 2.7.1 and have run into the
> following issues:
>
> I'm using a TransformerIdentityImpl to transform the html below and the
> result includes a lot of information that I believe is coming from the
> dtd associated with the doctype, and I'm not sure why it is being
> included. That is not my only concern, though. When I attempt to
> validate the result as xml I receive the following error:
>
> "White space is required after "<!ENTITY" in the entity declaration."
My understanding is that the description of the identity Transformer
that's created by calling the zero-argument newTransformer() method is
intentionally vague in order to allow an implementation to preserve
more of the source document than would be possible with an XSLT
identity stylesheet. In particular, Xalan-J attempts to preserve as
much information about the DTD as it can.
There are no specific output settings to suppress that information.
To suppress the DTD you would have to either create an identity
stylesheet and use that for the transformation, or filter out the
DTD somewhere. Are you using a SAXSource?
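For the identity-stylesheet option, a minimal sketch using only the
standard JAXP API might look like the following. The class name, method
name and sample document are made up for illustration, and the exact
serialization may differ between Xalan versions and the XSLT processor
bundled with the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of Xalan.
final class IdentityStylesheetSketch {
    // Standard XSLT 1.0 identity template: copies every node and attribute.
    private static final String IDENTITY_XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String copyWithoutDtd(String xml) {
        try {
            // newTransformer(Source) compiles the stylesheet; this is not the
            // zero-argument identity Transformer discussed above.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(IDENTITY_XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Small stand-in document with an internal DTD subset.
        String xml = "<!DOCTYPE html [<!ENTITY greeting \"hello\">]>"
                   + "<html><body>&greeting;</body></html>";
        // The DTD is not part of the XSLT data model, so the DOCTYPE and its
        // entity declarations should not appear in the result.
        System.out.println(copyWithoutDtd(xml));
    }
}
```

If the result still needs a DOCTYPE, setting OutputKeys.DOCTYPE_PUBLIC
and OutputKeys.DOCTYPE_SYSTEM on the Transformer should add one without
an internal subset.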
It looks like a bug that there is a space missing between the % and
the name of the entity in the entity declaration. Could I ask you to
open a bug report in Jira?
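To illustrate the spacing problem: the XML 1.0 grammar requires white
space on both sides of the % in a parameter entity declaration. The
sketch below, which assumes the JDK's default JAXP parser, checks two
small documents that differ only in that space. The class name and
sample documents are made up for illustration, and the parser's error
text may not match the message quoted above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Xalan or Xerces.
final class EntityDeclCheck {
    /** Returns true if the document parses, false on a parse error. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected parser failure", e);
        }
    }

    public static void main(String[] args) {
        // No space between '%' and the name, as in the 2.7.1 output.
        String missingSpace = "<!DOCTYPE r [<!ENTITY %x \"y\">]><r/>";
        // Same declaration with the required space.
        String withSpace = "<!DOCTYPE r [<!ENTITY % x \"y\">]><r/>";
        System.out.println("missing space parses: " + isWellFormed(missingSpace));
        System.out.println("with space parses:    " + isWellFormed(withSpace));
    }
}
```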
Thanks,
Henry
------------------------------------------------------------------
Henry Zongaro
XML Transformation & Query Development
IBM Toronto Lab T/L 313-6044; Phone +1 905 413-6044
mailto:[EMAIL PROTECTED]