Re: [xml] redicting parts of trees

Kasimier Buchcik Thu, 19 May 2005 12:01:41 -0700

Hi,

On Thu, 2005-05-19 at 20:19 +0200, Martijn Faassen wrote:
> Kasimier Buchcik wrote:
> > Hi,
> > 
> > On Thu, 2005-05-19 at 17:16 +0200, Martijn Faassen wrote:
> > 
> >>Kasimier Buchcik wrote:


[...]

> >>Anyway, anything I can do now to help? I will of course be testing this 
> >>facility at some stage within lxml, and give feedback then if necessary.
> > 
> > 
> > You could describe how you intend to manage namespaces in your
> > wrapper. Will you try to go W3C way or Libxml2 namespace way?
> 
> I'm following the ElementTree way, which uses Clark notation. I.e. the 
> wrapper shows namespace URIs directly as part of element names and such, 
> like this:
> 
> {http://namespaces.somewhere.org/ns1}foo
> 
> and prefixes are, for now, completely ignored as not relevant to the XML 
> infoset.

Ah.
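
For readers unfamiliar with the notation, here is a minimal sketch using
Python's standard xml.etree.ElementTree module (not lxml itself). The
prefixes "a" and "b" are made up for the illustration; only the namespace
URI comes from the example above.

```python
import xml.etree.ElementTree as ET

NS1 = "http://namespaces.somewhere.org/ns1"

# Two documents that bind the same namespace URI to different prefixes.
a = ET.fromstring('<a:foo xmlns:a="%s"/>' % NS1)
b = ET.fromstring('<b:foo xmlns:b="%s"/>' % NS1)

# The tag carries the URI in braces; the prefix is not part of the name.
print(a.tag)

# Both elements have the same name even though their prefixes differ.
print(a.tag == b.tag)
```

Since the prefix never reaches the API user, a wrapper built this way is
free to re-create namespace declarations with whatever prefix is
convenient.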

> > Both have pros and cons. The relevant drawback of the Libxml2 way
> > is that it's hard, if not impossible, to implement a DOM wrapper
> > in a programming language where the time of destruction
> > of an object is not under the programmer's control.
> 
> Thanks, this is interesting as this is exactly what I'm trying to do 
> with lxml.

Yeah, I read some of the messages on your lxml list about your mechanism
to keep detached nodes alive if they are referenced by multiple wrapper
proxies. We took a simple, though sometimes memory-consuming, approach: we
never free Libxml2 nodes that are removed from the document; they are moved
into an internal list of "garbage" nodes in the document wrapper and
freed when the document is freed. A "flush" method can be used to
clean up such "garbage" nodes, if the user is sure that it's safe.

An example (in Delphi code):
(all vars are interfaces here, not objects)
var
  doc: IDOMDocument;
  elem: IDOMElement;
  node: IDOMNode;
begin
  elem := doc.documentElement;

  // Remove from the document and put on garbage list.
  node := doc.removeChild(elem);
  { Here the @node interface reference will be released by Delphi,
    but the underlying Libxml2 node lives on. }

  // This would free elem's Libxml2-node.
  // doc.flushGarbage;

  // Attach to tree and remove from garbage list.
  doc.appendChild(elem); 
end;
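
The same idea, sketched in Python as a toy model rather than as our actual
wrapper code. Node stands in for a Libxml2 node, and its "freed" flag
stands in for released memory; every class and method name here is
hypothetical.

```python
class Node:
    """Stand-in for a Libxml2 node; `freed` marks released memory."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self.freed = False


class DocumentWrapper:
    """Parks removed nodes on a garbage list instead of freeing them."""

    def __init__(self, root):
        self.root = root
        self.garbage = []

    def remove_child(self, parent, child):
        # Detach the node, but keep it alive on the garbage list.
        parent.children.remove(child)
        child.parent = None
        self.garbage.append(child)
        return child

    def append_child(self, parent, child):
        if child.freed:
            raise ValueError("node was already freed by flush_garbage()")
        # Re-attaching a parked node takes it off the garbage list.
        if child in self.garbage:
            self.garbage.remove(child)
        parent.children.append(child)
        child.parent = parent

    def flush_garbage(self):
        # Only safe if no wrapper object still refers to a parked node.
        for node in self.garbage:
            self._free(node)
        self.garbage.clear()

    def _free(self, node):
        node.freed = True
        for child in node.children:
            self._free(child)


root = Node("root")
item = Node("item")
doc = DocumentWrapper(root)
doc.append_child(root, item)

doc.remove_child(root, item)   # item is parked, not freed
doc.append_child(root, item)   # item is attached again and unparked
```

The cost is the one mentioned above: without a flush, removed nodes stay
allocated until the document itself goes away.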

[...]

> > This circumstance creates the following problem:
> > If you remove an attribute-node, which is bound to a namespace,
> > from its parent, the attr->ns field still points to an elem->nsDef
> > entry. This is OK, as long as this element-node is not itself
> > freed - which would free the elem->nsDef entries as well. The
> > destruction of this element would lead to attr->ns pointing to freed
> > memory. 
> 
> Ugh.  Luckily the ElementTree API doesn't allow the detaching of 
> attribute nodes from an element, but I can see how this would hurt any 
> W3C DOM implementation.

For ElementTree, the Libxml2 way seems to be safe enough. Good!
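
The dangling attr->ns reference described above can be simulated in a few
lines of Python. This is only a toy model: Python has no dangling
pointers, so a "freed" flag plays the role of released memory, and the
classes and the free_element() helper are hypothetical, not Libxml2 API.

```python
class Ns:
    """Stand-in for an xmlNs entry owned by an element's nsDef list."""

    def __init__(self, prefix, href):
        self.prefix = prefix
        self.href = href
        self.freed = False


class Attr:
    def __init__(self, name, ns):
        self.name = name
        self.ns = ns          # points into some element's ns_def list


class Elem:
    def __init__(self, name):
        self.name = name
        self.ns_def = []      # namespace declarations owned by this element
        self.attrs = []


def free_element(elem):
    # Freeing an element also frees the declarations it owns.
    for ns in elem.ns_def:
        ns.freed = True


elem = Elem("foo")
ns = Ns("a", "http://namespaces.somewhere.org/ns1")
elem.ns_def.append(ns)

attr = Attr("bar", ns)
elem.attrs.append(attr)

# Detach the attribute; nothing updates attr.ns.
elem.attrs.remove(attr)

# Later the element is destroyed, taking its declarations with it.
free_element(elem)

# The detached attribute now refers to a freed declaration.
dangling = attr.ns.freed
print(dangling)
```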

> But now I wonder: does this only apply to attribute nodes, or also to 
> element nodes which are in a subtree? Testing this.. Ugh, yes, it does. 
> When I move a namespaced element (where the namespace is defined higher 
> in the tree) into another tree, and then subsequently remove the 
> original tree, things go way wrong and valgrind indeed points to a 
> reference to a libxml2 namespace structure that has since been removed. 
> Not good...
> 
> But thanks for pointing this issue out to me!
> 
> > There's no automatic mechanism to avoid this, since there is
> > no reference counting involved. In C this should be user
> > controllable: you just have to know what and when you are freeing
> > something. Not in other programming languages like Python, Delphi,
> > Java, etc. where the destruction time on objects is not always - if
> > ever - predictable.
> 
> Indeed. Python tends to be fairly predictable if its refcounting 
> algorithm is used, but that doesn't help any here, and that isn't 
> constant across Python implementations anyway.
> 
> > Safe removal of nodes:
> > So we obviously need a mechanism to make the node->ns reference point
> > to an xmlNs entry which is not in danger of being freed unpredictably.
> > A possible location would be a list of xmlNs entries, internally
> > managed by the DOM document wrapper.
> 
> Yes, in this case the problem would devolve to the issue I already have
> with dictionaries, which is manageable as I can make this stuff globally 
> shared. Though, just as with dictionaries I hope that the adoptNode() 
> functionality could take care of this as well.
> 
> I suspect that adoptNode() recreating namespaces wherever necessary in 
> the new document would indeed be sufficient to support Clark notation 
> in ElementTree, even though the XML serialization would look ugly.. Am I 
> correct in that an adoptNode() would take care of this issue if prefixes 
> are hidden from the API user's view?

Yes, in your case, if single attributes are not expected to be adopted,
and potentially many auto-created namespace declarations don't bother
you, the mechanism of xmlReconciliateNs seems the best fit: it just
re-creates the missing declarations on the adopted element. OK, good to
know that!
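
A rough sketch of that reconciliation idea, as a Python toy model. It is
not the Libxml2 implementation: the classes and the reconcile() function
are hypothetical, prefix clashes are ignored, and attributes are left out.
It only shows declarations being re-created on the adopted element so that
nothing in the subtree refers to the old tree anymore.

```python
class Ns:
    def __init__(self, prefix, href):
        self.prefix = prefix
        self.href = href


class Elem:
    def __init__(self, name, ns=None):
        self.name = name
        self.ns = ns
        self.ns_def = []      # declarations owned by this element
        self.children = []
        self.parent = None

    def add(self, child):
        self.children.append(child)
        child.parent = self
        return child


def declared_in_scope(elem, ns):
    # Walk up from elem and look for the very same declaration object.
    while elem is not None:
        if any(d is ns for d in elem.ns_def):
            return True
        elem = elem.parent
    return False


def reconcile(root):
    """Re-declare on `root` each namespace the subtree uses but cannot reach."""
    created = {}  # (prefix, href) -> declaration re-created on root

    def visit(elem):
        ns = elem.ns
        if ns is not None and not declared_in_scope(elem, ns):
            key = (ns.prefix, ns.href)
            if key not in created:
                created[key] = Ns(ns.prefix, ns.href)
                root.ns_def.append(created[key])
            elem.ns = created[key]
        for child in elem.children:
            visit(child)

    visit(root)


URI = "http://namespaces.somewhere.org/ns1"
old_root = Elem("old")
old_ns = Ns("a", URI)
old_root.ns_def.append(old_ns)

sub = old_root.add(Elem("foo", old_ns))
leaf = sub.add(Elem("bar", old_ns))

# Detach the subtree, then reconcile before the old tree is freed.
old_root.children.remove(sub)
sub.parent = None
reconcile(sub)
```

After the call, both "foo" and "bar" share one new declaration that lives
on "foo", so freeing the old tree can no longer invalidate them.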

Regards,

Kasimier
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
