I just spent a bit of time looking into this issue with respect to Lenya 1.2. See my discussion with myself from 3/30/06 in this forum, subject "Entity Resolution".

The bottom line is that the one form editor can be made to accept entities, but this requires modifying org.apache.lenya.cms.cocoon.acting.OneFormEditorSaveAction. This is fast and will work, but it means that all your docs have to adhere to the same DTD.

To get around that, I suppose you could rewrite the pipelines for the form-based editors: have the serializers that provide their content output a DTD, and have the forms pass that DTD back as part of the document, so that they can be saved. The default publication doesn't do this, and I've not experimented with it, but it seems like it'd work. Keep in mind that unless you get Xalan to stop transforming entities, your entities will be saved as their numeric equivalents once they leave the form-based editors (assuming this setup works at all, which I can't vouch for). Anyone else played with this?

BXE just won't allow entities: it resolves them when the document opens (and this means you have to get the lenya serializer providing the XML to BXE to output a DTD -- but then of course any entities in that doc will have already been resolved by Xalan on the generate statement) and then BXE doesn't expect to see them anymore. I'd suggest using snippets to provide special characters with the numeric character references.

Here are my notes on my final conclusions concerning Lenya 1.2 and implementing a custom set of xml entities to be provided by Cocoon's entity catalogues:

On Entities in Lenya:

Use the hexadecimal equivalent of HTML entities, and we have no problems in Lenya as long as the final serializer outputs a browser-friendly encoding. The safest seems to be "ISO (blah blah blah)". Anything outside that character set gets sent as a numeric character reference (such as &#8226;) instead of the actual character (•), which makes IE happy.
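
To illustrate the serializer behaviour described above, here is a minimal sketch using the JDK's built-in XSLT transformer rather than Cocoon's serializers. The class name Latin1Serializer is made up for illustration, and ISO-8859-1 is my assumption standing in for the encoding named above; I have not checked which encoding the Lenya publication actually configures.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Latin1Serializer {

    // Serialize the XML with an identity transform, declaring ISO-8859-1
    // (an assumed encoding) as the output encoding.
    static String serialize(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            identity.transform(new StreamSource(new StringReader(xml)),
                               new StreamResult(bytes));
            // Decode with the same charset that was used for output.
            return new String(bytes.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 (e acute) exists in ISO-8859-1; U+2022 (bullet) does not.
        System.out.println(serialize("<p>caf\u00E9 \u2022 item</p>"));
    }
}
```

The bullet cannot be represented in the output encoding, so the serializer writes it as a numeric character reference, while the accented letter passes through as an ordinary character.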

Entities cannot persist in Lenya: any time Xalan parses an XML document (that is, any time a document is generated, whether it's a .xsl or not, and read from disk into memory), it resolves entities according to whatever DTD is declared in that document. I've heard this can be turned off, but haven't played with it.
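
The effect is easy to reproduce outside Lenya. The sketch below is only an illustration with the JDK's default parser and transformer, not Cocoon's actual generator setup; the class name EntityRoundTrip and the inline DTD declaring &baruch; are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityRoundTrip {

    // Parse the XML text into a DOM, then serialize it again with an
    // identity transform, which is roughly what a read/save cycle does.
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The internal DTD subset declares the custom entity used in the text.
        String xml = "<!DOCTYPE p [<!ENTITY baruch \"Baruch College\">]>"
                + "<p>Welcome to &baruch;</p>";
        System.out.println(roundTrip(xml));
    }
}
```

With default settings the parser replaces &baruch; with its replacement text while reading, so the reference itself never reaches the serializer and cannot be written back out.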

Any round trip of entities will be overly complicated, and not at all possible with BXE -- BXE itself will resolve any entities passed to it according to the doctype, so even if we manage to pass it a DTD and &baruch;, it will write "Baruch College".

I think the best we can do without completely rewriting Lenya and BXE is to use the snippets in BXE for special characters, and write those special characters as their hexadecimal equivalents.
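
For building such snippets, a small helper can produce the hexadecimal references. This is just a sketch in modern Java (not something shipped with Lenya or BXE), and the class and method names are made up for illustration.

```java
public class HexCharRefs {

    // Format a single Unicode code point as a hexadecimal character reference.
    static String toHexRef(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }

    // Replace every non-ASCII character in the text with its hexadecimal
    // character reference; ASCII characters are copied unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(toHexRef(cp));
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+2019 is the right single quotation mark (&rsquo; in HTML).
        System.out.println(escapeNonAscii("it\u2019s"));   // it&#x2019;s
        // U+00A0 is the no-break space (&nbsp; in HTML).
        System.out.println(escapeNonAscii("a\u00A0b"));    // a&#xA0;b
    }
}
```

Because numeric character references need no DTD, text escaped this way parses in any XML document regardless of its doctype.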

Hand editing (BBEdit via WebDAV) with entities requires declaring an XHTML doctype. So that's no problem -- generating that file will resolve the entities.

The form editors are ultimately processed by Java classes, which receive a document without a DTD or XML declaration, such as:

org.apache.lenya.cms.cocoon.acting.OneFormEditorSaveAction

        // Aggregate content
        String encoding = request.getCharacterEncoding();
        String content =
            "<?xml version=\"1.0\" encoding=\""
                + encoding
                + "\"?>\n"
                + addNamespaces(namespaces, request.getParameter("content"));

They currently add an XML declaration, and it'd be here that they'd need to have a DTD imposed on them. Imposing a DTD on them by hard-coding it here in the Java works perfectly, allowing entities in the one form editor.
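
As a rough sketch of what that hard-coding looks like, the standalone class below mirrors the string assembly from the excerpt above. It is not the actual Lenya patch: the class name, the buildContent helper, and the choice of the XHTML 1.0 Transitional doctype are all assumptions for illustration, and the addNamespaces step is left out.

```java
public class HardCodedDoctype {

    // Assumed doctype for illustration; any single DTD could be hard-coded here.
    static final String XHTML_DOCTYPE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">";

    // Hypothetical helper: prepend the XML declaration and a fixed doctype
    // to the content submitted by the form.
    static String buildContent(String encoding, String doctype, String body) {
        return "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>\n"
            + doctype + "\n"
            + body;
    }

    public static void main(String[] args) {
        String body = "<html><body><p>one&nbsp;two</p></body></html>";
        System.out.println(buildContent("UTF-8", XHTML_DOCTYPE, body));
    }
}
```

Every document saved through this path gets the same doctype, whatever its resource type.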

The great problem with this is that it doesn't allow you to mix and match DTDs. Our custom XML resource types would cause an error when saving, because they don't match the XHTML doctype imposed by the hard-coded Java library.

Further, we wouldn't benefit much from this, because the saved document would have "Baruch College" written there, and NOT &baruch;, as passing the doc through Xalan as part of the save process transforms all the entities. So we could do a lot of work to allow editors to use entities ONCE, and thereafter they'd have to spell everything out correctly. Not a great idea.


On Apr 11, 2006, at 6:20 AM, Andreas Hartmann wrote:

[EMAIL PROTECTED] wrote:
Michael Ralston wrote:
What do you mean by 'Predefined'... is it possible to edit this
definition?
Predefined entities are declared in the XML spec:

http://www.w3.org/TR/REC-xml/#sec-predefined-ent

BTW, &nbsp; is not one of them.

So if &nbsp; is not a predefined entity... where is it defined?

In the corresponding DTD (e.g., HTML 4 and XHTML).

I really need to work out where to define &rsquo;
I find it hard to believe nobody has solved this problem before...

I always use numeric character references, like &#160;.

-- Andreas


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


