Hi,

On Thu, 2005-05-19 at 20:19 +0200, Martijn Faassen wrote:
> Kasimier Buchcik wrote:
> > Hi,
> >
> > On Thu, 2005-05-19 at 17:16 +0200, Martijn Faassen wrote:
> >
> >>Kasimier Buchcik wrote:
[...]
> >>Anyway, anything I can do now to help? I will of course be testing this
> >>facility at some stage within lxml, and give feedback then if necessary.
> >
> >
> > You could describe how you intend to manage namespaces in your
> > wrapper. Will you try to go the W3C way or the Libxml2 namespace way?
>
> I'm following the ElementTree way, which uses Clark notation. I.e. the
> wrapper shows namespace URIs directly as part of element names and such,
> like this:
>
> {http://namespaces.somewhere.org/ns1}foo
>
> and prefixes are, for now, completely ignored as not relevant to the XML
> infoset.

Ah.

> > Both have pros and cons. The relevant drawback of the Libxml2 way
> > is that it's hard, if not impossible, to implement a DOM wrapper
> > in a programming language where the time of destruction
> > of an object is not within the control of the programmer.
>
> Thanks, this is interesting as this is exactly what I'm trying to do
> with lxml.

Yeah, I read some of the messages on your lxml list about your
mechanism to keep detached nodes alive if they are referenced by
multiple wrapper proxies.
We took a sometimes memory-consuming but simple approach: we never
free any Libxml2 nodes that are removed from the document; they are
moved into an internal list of "garbage" nodes in the document wrapper
and freed when the document is freed. A "flush" method can be used to
clean up such "garbage" nodes, if the user is sure that it's safe.

An example (in Delphi code):
(all vars are interfaces here, not objects)

var
  doc: IDOMDocument;
  elem: IDOMElement;
  node: IDOMNode;
begin
  elem := doc.documentElement;
  // Remove and put on garbage list.
  // (elem is the document element, so its parent is the document.)
  node := doc.removeChild(elem);
  { Here @node will be freed by Delphi, but the Libxml2 node lives on. }
  // This would free elem's Libxml2 node.
  // doc.flushGarbage;
  // Attach to tree and remove from garbage list.
  doc.appendChild(elem);
end;

[...]
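The garbage-list approach described above can be modelled in a few lines
of Python. This is only a rough sketch under stated assumptions: Node,
Document and the freed flag are hypothetical stand-ins for the native
Libxml2 node, the document wrapper and a call that frees the node; none
of these names come from our wrapper or from Libxml2.

```python
class Node:
    """Hypothetical stand-in for a native Libxml2 node."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.freed = False  # models the native node having been freed


class Document:
    """Hypothetical document wrapper that owns all nodes, attached or not."""

    def __init__(self, root_name):
        self.root = Node(root_name)
        self.garbage = []  # removed nodes that are kept alive

    def append_child(self, parent, child):
        # Re-attaching a node takes it off the garbage list again.
        self.garbage = [n for n in self.garbage if n is not child]
        parent.children.append(child)

    def remove_child(self, parent, child):
        # Removal never frees; the node only moves to the garbage list.
        parent.children.remove(child)
        self.garbage.append(child)
        return child

    def flush_garbage(self):
        # Only safe if the caller knows nothing references these nodes.
        for node in self.garbage:
            self._free(node)
        self.garbage = []

    def close(self):
        # Freeing the document frees the garbage and the tree.
        self.flush_garbage()
        self._free(self.root)

    def _free(self, node):
        for child in node.children:
            self._free(child)
        node.freed = True


doc = Document("root")
elem = Node("elem")
doc.append_child(doc.root, elem)

node = doc.remove_child(doc.root, elem)  # on the garbage list, still alive
del node                                 # dropping the reference frees nothing
doc.append_child(doc.root, elem)         # attached again, off the list
doc.flush_garbage()                      # nothing left to free
```

The trade-off is the one mentioned above: memory for removed nodes is
held until a flush or until the document goes away, in exchange for
never having to know when the language runtime destroys a wrapper.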
> > This circumstance creates the following problem:
> > If you remove an attribute node, which is bound to a namespace,
> > from its parent, the attr->ns field still points to an elem->nsDef
> > entry. This is OK, as long as this element node is not itself
> > freed - which would free the elem->nsDef entries as well. The
> > destruction of this element would lead to attr->ns pointing to freed
> > memory.
>
> Ugh. Luckily the ElementTree API doesn't allow the detaching of
> attribute nodes from an element, but I can see how this would hurt any
> W3C DOM implementation.

For ElementTree, the Libxml2 way seems to be safe enough. Good!

> But now I wonder: does this only apply to attribute nodes, or also to
> element nodes which are in a subtree? Testing this.. Ugh, yes, it does.
> When I move a namespaced element (where the namespace is defined higher
> in the tree) into another tree, and then subsequently remove the
> original tree, things go way wrong and valgrind indeed points to a
> reference to a libxml2 namespace structure that has since been removed.
> Not good...
>
> But thanks for pointing this issue out to me!
>
> > There's no automatic mechanism to avoid this, since there is
> > no reference counting involved. In C this should be user
> > controllable: you just have to know what you are freeing and when.
> > Not so in other programming languages like Python, Delphi,
> > Java, etc., where the destruction time of objects is not always - if
> > ever - predictable.
>
> Indeed. Python tends to be fairly predictable if its refcounting
> algorithm is used, but that doesn't help any here, and that isn't
> constant across Python implementations anyway.
>
> > Safe removal of nodes:
> > So we obviously need a mechanism to make the node->ns reference
> > point to an xmlNs entry which is not in danger of being freed
> > unpredictably. A possible location would be a list of xmlNs entries,
> > internally managed by the DOM document wrapper.
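The idea of a wrapper-managed xmlNs list can be illustrated with a small
Python model. It is a sketch only: Ns, Element, DocumentWrapper and
free_element are hypothetical names that mimic the roles of xmlNs,
node->ns and elem->nsDef, and for simplicity every namespaced node in
the detached subtree is repointed, not just those whose declaration
lives outside the subtree.

```python
class Ns:
    """Hypothetical stand-in for an xmlNs entry."""

    def __init__(self, href, prefix):
        self.href = href
        self.prefix = prefix
        self.freed = False


class Element:
    """Hypothetical stand-in for an element node."""

    def __init__(self, name, ns=None):
        self.name = name
        self.ns = ns        # like node->ns: a reference, not ownership
        self.ns_def = []    # like elem->nsDef: declarations owned here
        self.children = []


def free_element(elem):
    """Model freeing an element: its nsDef entries die with it."""
    for ns in elem.ns_def:
        ns.freed = True
    for child in elem.children:
        free_element(child)


class DocumentWrapper:
    """Keeps one long-lived Ns entry per (href, prefix) pair."""

    def __init__(self):
        self._stable = {}

    def stable_ns(self, ns):
        key = (ns.href, ns.prefix)
        if key not in self._stable:
            self._stable[key] = Ns(ns.href, ns.prefix)
        return self._stable[key]

    def detach(self, parent, child):
        parent.children.remove(child)
        self._repoint(child)
        return child

    def _repoint(self, node):
        # Point node.ns at a wrapper-owned entry that outlives any element.
        if node.ns is not None:
            node.ns = self.stable_ns(node.ns)
        for sub in node.children:
            self._repoint(sub)


root = Element("root")
ns = Ns("http://namespaces.somewhere.org/ns1", "a")
root.ns_def.append(ns)
child = Element("foo", ns)
root.children.append(child)

doc = DocumentWrapper()
doc.detach(root, child)   # child.ns now refers to the wrapper's entry
free_element(root)        # frees root's nsDef, including the original ns
```

After the last line the original declaration is gone, but the detached
node still refers to a live entry with the same URI and prefix.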
>
> Yes, in this case the problem would devolve to the issue I already have
> with dictionaries, which is manageable as I can make this stuff globally
> shared. Though, just as with dictionaries I hope that the adoptNode()
> functionality could take care of this as well.
>
> I suspect that adoptNode() recreating namespaces wherever necessary in
> the new document would indeed be sufficient to support Clark notation
> in ElementTree, even though the XML serialization would look ugly. Am I
> correct in that an adoptNode() would take care of this issue if prefixes
> are hidden from the API user's view?

Yes, in your case, if single attributes are not expected to be adopted,
and potentially many auto-created namespace declarations don't bother
you, the mechanism of xmlReconciliateNs seems the best fit: it just
re-creates the missing declarations on the adopted element.

OK, good to know that!

Regards,

Kasimier
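To illustrate why hidden prefixes make re-created declarations harmless,
here is a small sketch using the ElementTree module from the Python
standard library (not lxml, and not Libxml2's xmlReconciliateNs). It
assumes only that element names are exposed in Clark notation; the
prefix "a" and the element names are made up for the example.

```python
import xml.etree.ElementTree as ET

src = ('<a:root xmlns:a="http://namespaces.somewhere.org/ns1">'
       '<a:foo/></a:root>')
root = ET.fromstring(src)
foo = root[0]

# The API shows the namespace URI, not the prefix from the source.
print(foo.tag)

# Move the namespaced element into another tree and drop the old one.
other = ET.Element("other")
root.remove(foo)
other.append(foo)
del root

# On serialization a declaration is generated again with some
# auto-chosen prefix; the URI, and therefore the tag, is unchanged.
out = ET.tostring(other, encoding="unicode")
print(out)
print(ET.fromstring(out)[0].tag == foo.tag)
```

The serialized prefix differs from the one in the source document, which
is the "ugly" part, but a consumer that only looks at Clark-notation
names cannot tell the difference.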