Hey Henry,

Thanks for the reply. I went back and tried the same transformation with Xalan 2.6 and got similar results (I hadn't realized that we were previously doing our own filtering, which removed all of the DTD content). There are some differences between the two outputs, though, and I am curious as to why this would have changed.

Here are a few comparisons between 2.6 and 2.7.1:

2.6 output:

<?xml version="1.0" encoding="UTF-8"?><!--================== Imported Names ====================================--> ...

(notice that a comment starts immediately after the XML declaration; the doctype declaration comes after all of the DTD content, right before the HTML content)

2.7.1 output:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [ <!ENTITY %HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" > ... <!--================== Imported Names ====================================--> ...

(here the doctype comes after the xml declaration and has entity declarations and comments inside the doctype declaration)

2.6 output:

Contains no entity declarations

2.7.1 output:

Contains:

<!ENTITY %HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" >
<!ENTITY %HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" >
<!ENTITY %HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" >

Those are the major differences. Again, I'm just curious as to why this has changed.

Thanks,

Mike

Henry Zongaro wrote:

Hi, Mike.

Mike Strauch <[EMAIL PROTECTED]> wrote on 2008-04-16 03:43:23 PM:
> I've recently upgraded Xalan from 2.6 to 2.7.1 and have run into the
> following issues:
>
> I'm using a TransformerIdentityImpl to transform the html below and the
> result includes a lot of information that I believe is coming from the
> dtd associated with the doctype, and I'm not sure why it is being
> included.  This alone is not my only concern.  When I attempt to
> validate the result as xml I receive the following error:
>
> "White space is required after "<!ENTITY" in the entity declaration."

My understanding is that the description of the identity Transformer that's created by calling the zero-argument newTransformer() method is intentionally vague in order to allow an implementation to preserve more of the source document than would be possible with an XSLT identity stylesheet. In particular, Xalan-J attempts to preserve as much information about the DTD as it can.

There are no specific output settings to suppress that information. To suppress the DTD you would either have to create an identity stylesheet and use that for the transformation, or filter out the DTD somewhere. Are you using a SAXSource?
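
To make the two options concrete, here is a minimal sketch using only the standard JAXP API. The class name, method names and the sample document are made up for illustration, and the sample uses a small internal DTD subset rather than the XHTML DTD so that nothing has to be fetched over the network. What the zero-argument transformer emits depends on the implementation, so no particular output is claimed for it; the identity stylesheet, by contrast, works on the XSLT data model, which has no node for the document type declaration, so the DOCTYPE should not appear in its result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IdentityStylesheetDemo {

    // The usual XSLT 1.0 identity template: copy every node and attribute as-is.
    static final String IDENTITY_XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical sample input with an internal DTD subset only.
    static final String SAMPLE =
        "<!DOCTYPE html [<!ENTITY greeting \"hello\">]>"
      + "<html><body>&greeting;</body></html>";

    // Runs the given transformer over the XML string and returns the serialized result.
    static String run(Transformer transformer, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Option 1: zero-argument newTransformer(); how much DTD content survives
    // is implementation-dependent.
    static String copyWithZeroArgTransformer(String xml) throws TransformerException {
        return run(TransformerFactory.newInstance().newTransformer(), xml);
    }

    // Option 2: explicit identity stylesheet; the DOCTYPE is not part of the
    // XSLT data model, so it is not copied to the result.
    static String copyWithIdentityStylesheet(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(IDENTITY_XSLT)));
        return run(transformer, xml);
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println("zero-arg:   " + copyWithZeroArgTransformer(SAMPLE));
        System.out.println("stylesheet: " + copyWithIdentityStylesheet(SAMPLE));
    }
}
```

If the result still needs a DOCTYPE, the standard output properties OutputKeys.DOCTYPE_PUBLIC and OutputKeys.DOCTYPE_SYSTEM can be set on the Transformer to have one written without the internal subset.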

It looks like a bug that there is a space missing between the % and the name of the entity in the entity declaration. Could I ask you to open a bug report in Jira?
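
For the bug report, a small reproduction along these lines might help. It is only a sketch: the class and method names are invented, and the declarations use an empty internal entity value instead of the PUBLIC identifiers from the real output, so that the only difference between the two inputs is the space after the %. The XML 1.0 grammar requires white space between the % and the entity name in a parameter entity declaration, so a conforming parser should accept the first document and reject the second.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EntityDeclSpacingCheck {

    // Returns true if the document parses without a fatal error, false otherwise.
    static boolean isWellFormed(String xml) throws ParserConfigurationException, IOException {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors, so malformed input ends up here.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Space after '%': a legal parameter entity declaration.
        String withSpace = "<!DOCTYPE html [<!ENTITY % HTMLlat1 \"\">]><html/>";
        // No space after '%': the form seen in the 2.7.1 output.
        String withoutSpace = "<!DOCTYPE html [<!ENTITY %HTMLlat1 \"\">]><html/>";

        System.out.println("with space:    " + isWellFormed(withSpace));
        System.out.println("without space: " + isWellFormed(withoutSpace));
    }
}
```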

Thanks,

Henry
------------------------------------------------------------------
Henry Zongaro
XML Transformation & Query Development
IBM Toronto Lab   T/L 313-6044;  Phone +1 905 413-6044
mailto:[EMAIL PROTECTED]
