On Friday, February 22, 2013 08:05:51 PM Daniel Veillard wrote:
> On Fri, Feb 22, 2013 at 12:11:33AM -0800, Alexey Neyman wrote:
> > [Adding back the list - I accidentally hit "reply" instead of "reply all"
> > in previous email]
> >
> > On Friday, February 22, 2013 01:38:51 PM Daniel Veillard wrote:
> > > [sending again since mail was apparently lost]
> > >
> > > On Thu, Feb 21, 2013 at 10:21:18AM -0800, Alexey Neyman wrote:
> > > > On Thursday, February 21, 2013 04:01:45 PM you wrote:
> > > > > On Wed, Feb 20, 2013 at 06:28:25PM -0800, Alexey Neyman wrote:
> > > > > > Hi libxml developers,
> > > > > >
> > > > > > [BTW, is this list alive?]
> > > > > >
> > > > > yes, why ?
> > > >
> > > > I've sent another patch 3 days ago, and there was no response at all.
> > > > Here
> > > > it is:
> > > >
> > > > https://mail.gnome.org/archives/xml/2013-February/msg00013.html
> > > >
> > > yep, I saw it, but it's more complex, I'm worried about what happens
> > >
> > > if there is some kind of deallocation like calling popInputCallbacks()
> > > from the python bindings themselves, while said input is still in use
> > > by a parser, I smell some reference counting would have to be added to
> > > take care of this, i.e. the single
> > >
> > > Py_INCREF(pythonInputOpenCallbackObject);
> > >
> > > on registration may not be sufficient to cover that case. The problem
> > > is that the new use is done directly from C and there is no mechanism we
> > > can use to increment/decrement the counter when a parser creates an
> > > input based on those callbacks.
> >
> > I see your point... If I understand it correctly, I need to increment the
> > refcount each time open callback succeeds (i.e. in
> > pythonInputOpenCallback(), when the return value is not Py_None) and
> > decrement it whenever the close callback is called (i.e., wrap
> > xmlPythonFileCloseRaw() into a new function, pythonInputCloseCallback()).
>
> yes, that's it basically, we need to catch new open and closes and
> adjust ref counts accordingly.
Actually, I reviewed it again and I don't think any additional refcounts are
necessary:
- Python method (registerInputCallback) passes a Python object,
findOpenCallback, to C code (to libxml2mod.xmlRegisterInputCallback)
- C code adds a reference to findOpenCallback
- Whenever an entity is loaded, Python object is called. It returns *another*
Python object (file, StringIO, whatever) that will be used as 'context' for
read/close callbacks, and PyObject_CallFunction creates a new reference on the
returned object.
- The read callback will use this newly returned Python object, and the close
callback will drop the reference created by PyObject_CallFunction - releasing
this object.
- When there is no more Python callbacks registered, popInputCallbacks() will
call libxml2mod.xmlUnregisterInputCallback() - which will drop the reference
on findOpenCallback.
So, even if popInputCallbacks() is called in the middle of reading an already
opened entity - it will only affect new attempts to load entity, but not
already opened ones - which use other objects as 'context'.
I did find a minor bug in my code, though, where I didn't drop a reference on
Py_None when PyObject_CallFunction returned it with a new reference. Since
Py_None is a singleton and always exists, this bug shouldn't have had any
runtime effect, though.
Updated patch, with a new test case, attached.
Also, I split the fixes to setEntityLoader() into a separate patch, also
attached.
Regards,
Alexey.
>
> > Am I missing anything?
>
> no, hopefully that will be sufficient :-)
>
> > > So i'm afraid we are stuck with a known
> > > problem we can't fix, unless we never release the python reference,
> > > which creates a leak (but a leak is better than a crash)
> > >
> > > in any case tests suite addition would be welcome, yes :-)
> >
> > Yes, I just wanted to first settle on the API for the binding :)
>
> Having a test which exercise the above refcount small problem would
> be a good thing too,
>
> thanks a lot !
>
> Daniel
diff --git a/python/generator.py b/python/generator.py
index 767c4bb..83d100c 100755
--- a/python/generator.py
+++ b/python/generator.py
@@ -339,6 +339,8 @@ def skip_function(name):
return 1
if name == "xmlValidateAttributeDecl":
return 1
+ if name == "xmlPopInputCallbacks":
+ return 1
return 0
diff --git a/python/libxml.c b/python/libxml.c
index 831b070..8c66fed 100644
--- a/python/libxml.c
+++ b/python/libxml.c
@@ -731,6 +731,94 @@ libxml_xmlSetEntityLoader(ATTRIBUTE_UNUSED PyObject *self, PyObject *args) {
return(py_retval);
}
+/************************************************************************
+ * *
+ * Input callback registration *
+ * *
+ ************************************************************************/
+static PyObject *pythonInputOpenCallbackObject;
+static int pythonInputCallbackID = -1;
+
+static int
+pythonInputMatchCallback(ATTRIBUTE_UNUSED const char *URI)
+{
+ /* Always return success, real decision whether URI is supported will be
+ * made in open callback. */
+ return 1;
+}
+
+static void *
+pythonInputOpenCallback(const char *URI)
+{
+ PyObject *ret;
+
+ ret = PyObject_CallFunction(pythonInputOpenCallbackObject,
+ (char *)"s", URI);
+ if (ret == Py_None) {
+ Py_DECREF(Py_None);
+ return NULL;
+ }
+ return ret;
+}
+
+PyObject *
+libxml_xmlRegisterInputCallback(ATTRIBUTE_UNUSED PyObject *self,
+ PyObject *args) {
+ PyObject *cb;
+
+ if (!PyArg_ParseTuple(args,
+ (const char *)"O:libxml_xmlRegisterInputCallback", &cb))
+ return(NULL);
+
+ if (!PyCallable_Check(cb)) {
+ PyErr_SetString(PyExc_ValueError, "input callback is not callable");
+ return(NULL);
+ }
+
+ /* Python module registers a single callback and manages the list of
+ * all callbacks internally. This is necessitated by xmlInputMatchCallback
+ * API, which does not allow for passing of data objects to discriminate
+ * different Python methods. */
+ if (pythonInputCallbackID == -1) {
+ pythonInputCallbackID = xmlRegisterInputCallbacks(
+ pythonInputMatchCallback, pythonInputOpenCallback,
+ xmlPythonFileReadRaw, xmlPythonFileCloseRaw);
+ if (pythonInputCallbackID == -1)
+ return PyErr_NoMemory();
+ pythonInputOpenCallbackObject = cb;
+ Py_INCREF(pythonInputOpenCallbackObject);
+ }
+
+ Py_INCREF(Py_None);
+ return(Py_None);
+}
+
+PyObject *
+libxml_xmlUnregisterInputCallback(ATTRIBUTE_UNUSED PyObject *self,
+ ATTRIBUTE_UNUSED PyObject *args) {
+ int ret;
+
+ ret = xmlPopInputCallbacks();
+ if (pythonInputCallbackID != -1) {
+ /* Assert that the right input callback was popped. libxml's API does not
+ * allow removal by ID, so all that could be done is an assert. */
+ if (pythonInputCallbackID == ret) {
+ pythonInputCallbackID = -1;
+ Py_DECREF(pythonInputOpenCallbackObject);
+ pythonInputOpenCallbackObject = NULL;
+ } else {
+ PyErr_SetString(PyExc_AssertionError, "popped non-python input callback");
+ return(NULL);
+ }
+ } else if (ret == -1) {
+ /* No more callbacks to pop */
+ PyErr_SetString(PyExc_IndexError, "no input callbacks to pop");
+ return(NULL);
+ }
+
+ Py_INCREF(Py_None);
+ return(Py_None);
+}
/************************************************************************
* *
@@ -3700,6 +3788,8 @@ static PyMethodDef libxmlMethods[] = {
{(char *) "getObjDesc", libxml_getObjDesc, METH_VARARGS, NULL},
{(char *) "compareNodesEqual", libxml_compareNodesEqual, METH_VARARGS, NULL},
{(char *) "nodeHash", libxml_nodeHash, METH_VARARGS, NULL},
+ {(char *) "xmlRegisterInputCallback", libxml_xmlRegisterInputCallback, METH_VARARGS, NULL},
+ {(char *) "xmlUnregisterInputCallback", libxml_xmlUnregisterInputCallback, METH_VARARGS, NULL},
{NULL, NULL, 0, NULL}
};
diff --git a/python/libxml.py b/python/libxml.py
index 013d65c..193f97a 100644
--- a/python/libxml.py
+++ b/python/libxml.py
@@ -719,11 +719,35 @@ class xmlTextReaderCore:
return arg
#
-# The cleanup now goes though a wrappe in libxml.c
+# The cleanup now goes though a wrapper in libxml.c
#
def cleanupParser():
libxml2mod.xmlPythonCleanupParser()
+#
+# The interface to xmlRegisterInputCallbacks.
+# Since this API does not allow to pass a data object along with
+# match/open callbacks, it is necessary to maintain a list of all
+# Python callbacks.
+#
+__input_callbacks = []
+def registerInputCallback(func):
+ def findOpenCallback(URI):
+ for cb in reversed(__input_callbacks):
+ o = cb(URI)
+ if o is not None:
+ return o
+ libxml2mod.xmlRegisterInputCallback(findOpenCallback)
+ __input_callbacks.append(func)
+
+def popInputCallbacks():
+ # First pop python-level callbacks, when no more available - start
+ # popping built-in ones.
+ if len(__input_callbacks) > 0:
+ __input_callbacks.pop()
+ if len(__input_callbacks) == 0:
+ libxml2mod.xmlUnregisterInputCallback()
+
# WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
#
# Everything before this line comes from libxml.py
diff --git a/python/libxml_wrap.h b/python/libxml_wrap.h
index eaa5e96..ac5a626 100644
--- a/python/libxml_wrap.h
+++ b/python/libxml_wrap.h
@@ -247,3 +247,5 @@ PyObject * libxml_xmlSchemaValidCtxtPtrWrap(xmlSchemaValidCtxtPtr valid);
#endif /* LIBXML_SCHEMAS_ENABLED */
PyObject * libxml_xmlErrorPtrWrap(xmlErrorPtr error);
PyObject * libxml_xmlSchemaSetValidErrors(PyObject * self, PyObject * args);
+PyObject * libxml_xmlRegisterInputCallback(PyObject *self, PyObject *args);
+PyObject * libxml_xmlUnregisterInputCallback(PyObject *self, PyObject *args);
diff --git a/python/tests/Makefile.am b/python/tests/Makefile.am
index ab079bb..95ebead 100644
--- a/python/tests/Makefile.am
+++ b/python/tests/Makefile.am
@@ -19,6 +19,7 @@ PYTESTS= \
xpath.py \
outbuf.py \
inbuf.py \
+ input_callback.py \
resolver.py \
regexp.py \
reader.py \
diff --git a/python/tests/input_callback.py b/python/tests/input_callback.py
new file mode 100644
index 0000000..982f940
--- /dev/null
+++ b/python/tests/input_callback.py
@@ -0,0 +1,128 @@
+#!/usr/bin/python -u
+#
+# This tests custom input callbacks
+#
+import sys
+import StringIO
+import libxml2
+
+# We implement a new scheme, py://strings/ that will reference this dictionary
+pystrings = {
+ 'catalogs/catalog.xml' :
+'''<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
+<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
+ <rewriteSystem systemIdStartString="http://example.com/dtds/" rewritePrefix="../dtds/"/>
+</catalog>''',
+
+ 'xml/sample.xml' :
+'''<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE root SYSTEM "http://example.com/dtds/sample.dtd">
+<root>&sample.entity;</root>''',
+
+ 'dtds/sample.dtd' :
+'''
+<!ELEMENT root (#PCDATA)>
+<!ENTITY sample.entity "replacement text">'''
+}
+
+def verify_doc(doc):
+ e = doc.getRootElement()
+ if e.name != 'root':
+ raise ValueError("name")
+ if e.content != 'replacement text':
+ raise ValueError("content")
+
+prefix = "py://strings/"
+def my_input_cb(URI):
+ idx = URI.startswith(prefix)
+ if idx == -1:
+ return None
+ path = URI[len(prefix):]
+ if path not in pystrings:
+ print "my_input_cb: path does not exist, '%s'" % path
+ return None
+ print "my_input_cb: loading '%s'" % URI
+ return StringIO.StringIO(pystrings[path])
+
+opts = libxml2.XML_PARSE_DTDLOAD | libxml2.XML_PARSE_NONET | libxml2.XML_PARSE_COMPACT
+startURL = prefix + "xml/sample.xml"
+catURL = prefix + "catalogs/catalog.xml"
+
+# Check that we cannot read custom schema without custom callback
+print
+print "Test 1: Expecting failure to load (custom scheme not handled)"
+try:
+ doc = libxml2.readFile(startURL, None, opts)
+ print "Read custom scheme without registering handler succeeded?"
+ sys.exit(1)
+except libxml2.treeError, e:
+ pass
+
+# Register handler and try to load the same entity
+print
+print "Test 2: Expecting failure to load (no catalog - cannot load DTD)"
+libxml2.registerInputCallback(my_input_cb)
+doc = libxml2.readFile(startURL, None, opts)
+try:
+ verify_doc(doc)
+ print "Doc was loaded?"
+except ValueError, e:
+ if str(e) != "content":
+ print "Doc verify failed"
+doc.freeDoc()
+
+# Register a catalog (also accessible via pystr://) and retry
+print
+print "Test 3: Expecting successful loading"
+parser = libxml2.createURLParserCtxt(startURL, opts)
+parser.addLocalCatalog(catURL)
+parser.parseDocument()
+doc = parser.doc()
+verify_doc(doc)
+doc.freeDoc()
+
+# Unregister custom callback when parser is already created
+print
+print "Test 4: Expect failure to read (custom callback unregistered during read)"
+parser = libxml2.createURLParserCtxt(startURL, opts)
+libxml2.popInputCallbacks()
+parser.addLocalCatalog(catURL)
+parser.parseDocument()
+doc = parser.doc()
+try:
+ verify_doc(doc)
+ print "Doc was loaded?"
+except ValueError, e:
+ if str(e) != "content":
+ print "Doc verify failed"
+doc.freeDoc()
+
+# Try to load the document again
+print
+print "Test 5: Expect failure to load (callback unregistered)"
+try:
+ doc = libxml2.readFile(startURL, None, opts)
+ print "Read custom scheme without registering handler succeeded?"
+ sys.exit(1)
+except libxml2.treeError, e:
+ pass
+
+# But should be able to read standard I/O yet...
+print
+print "Test 6: Expect successful loading using standard I/O"
+doc = libxml2.readFile("tst.xml", None, opts)
+doc.freeDoc()
+
+# Now pop ALL input callbacks, should fail to load even standard I/O
+print
+print "Test 7: Remove all input callbacks, expect failure to load using standard I/O"
+try:
+ while True:
+ libxml2.popInputCallbacks()
+except IndexError, e:
+ print "Popped all input callbacks: " + str(e)
+try:
+ doc = libxml2.readFile("tst.xml", None, opts)
+except libxml2.treeError, e:
+ pass
diff --git a/python/libxml.c b/python/libxml.c
index a556160..831b070 100644
--- a/python/libxml.c
+++ b/python/libxml.c
@@ -665,7 +665,7 @@ pythonExternalEntityLoader(const char *URL, const char *ID,
Py_XDECREF(ctxtobj);
#ifdef DEBUG_LOADER
printf("pythonExternalEntityLoader: result ");
- PyObject_Print(ret, stderr, 0);
+ PyObject_Print(ret, stdout, 0);
printf("\n");
#endif
@@ -711,13 +711,20 @@ libxml_xmlSetEntityLoader(ATTRIBUTE_UNUSED PyObject *self, PyObject *args) {
&loader))
return(NULL);
+ if (!PyCallable_Check(loader)) {
+ PyErr_SetString(PyExc_ValueError, "entity loader is not callable");
+ return(NULL);
+ }
+
#ifdef DEBUG_LOADER
printf("libxml_xmlSetEntityLoader\n");
#endif
if (defaultExternalEntityLoader == NULL)
defaultExternalEntityLoader = xmlGetExternalEntityLoader();
+ Py_XDECREF(pythonExternalEntityLoaderObjext);
pythonExternalEntityLoaderObjext = loader;
+ Py_XINCREF(pythonExternalEntityLoaderObjext);
xmlSetExternalEntityLoader(pythonExternalEntityLoader);
py_retval = PyInt_FromLong(0);
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml