[Sorry if this message appears twice on the list; I wasn't subscribed when I 
sent the first one, so it apparently didn't get through.]

Hi libxml2 developers,

I am implementing a simple XSLT processor that operates on data straight out 
of a Subversion repository; that is, it needs to handle svn:// URIs. I am 
writing it in Python.

For that purpose, libxml2.setEntityLoader() almost works. The problem is that 
the catalog itself is not loaded through the external entity loader, so the 
handler supplied for svn:// URIs is bypassed. If I use a catalog on a local 
filesystem, the locations are resolved relative to that catalog's location 
(as per the specification). If I use a catalog on a local filesystem which 
rewrites public/system IDs into svn:// URIs, these URIs go directly to the 
xmlIO layer, again bypassing the entity loader, and xmlIO cannot handle 
svn:// URIs (only http:// and ftp://).
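For reference, a loader of the kind libxml2.setEntityLoader() accepts might 
look like the minimal sketch below. FAKE_REPO is a hypothetical in-memory 
stand-in for the Subversion repository; a real processor would fetch the 
bytes through a Subversion client instead. Such a loader serves the svn:// 
URIs it is handed directly, but, as described above, it is never consulted 
when the catalog is loaded or when the catalog rewrites an ID into an svn:// 
URI.

```python
import io

# Hypothetical stand-in for a Subversion repository: maps svn:// URIs to
# document bytes. A real processor would use a Subversion client library.
FAKE_REPO = {
    "svn://example.org/repo/doc.xml": b"<doc/>",
}


def svn_entity_loader(URL, ID, ctxt):
    """Callable with the (URL, ID, ctxt) signature that
    libxml2.setEntityLoader() expects.

    Returns a file-like object (anything with a read() method) for svn://
    URIs it knows about, and None otherwise so that libxml2 falls back to
    its default entity loader.
    """
    if URL is None or not URL.startswith("svn://"):
        return None
    data = FAKE_REPO.get(URL)
    if data is None:
        return None
    return io.BytesIO(data)
```

It would be installed with a single call, 
libxml2.setEntityLoader(svn_entity_loader).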

It is possible to make xmlIO handle any protocol by means of 
xmlRegisterInputCallbacks(). However, that function is currently only 
available in the C API, so the natural solution seems to be implementing 
Python bindings for xmlRegisterInputCallbacks(). The attached patch implements 
such bindings.
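To show why registering input callbacks is enough to teach xmlIO a new 
scheme, here is a simplified pure-Python model of the callback table as I 
understand it: entries are tried newest first, an entry is used only if its 
match callback accepts the URI, and a NULL (here None) result from its open 
callback makes the search continue with older entries. This is a model, not 
the C code, and the handlers in it are hypothetical.

```python
class InputCallbackTable:
    """Simplified, hypothetical model of libxml2's xmlIO input-callback table."""

    def __init__(self):
        self._entries = []

    def register(self, match, open_):
        # Stand-in for xmlRegisterInputCallbacks(): returns the entry's index.
        self._entries.append((match, open_))
        return len(self._entries) - 1

    def pop(self):
        # Stand-in for xmlPopInputCallbacks(): returns the popped index, or -1.
        if not self._entries:
            return -1
        self._entries.pop()
        return len(self._entries)

    def open(self, uri):
        # Newer entries take precedence over older (built-in) ones; an open
        # callback returning None passes the URI on to the next entry.
        for match, open_ in reversed(self._entries):
            if match(uri):
                handle = open_(uri)
                if handle is not None:
                    return handle
        return None


table = InputCallbackTable()
# Hypothetical built-in handler that only understands http:// URIs.
table.register(lambda uri: uri.startswith("http://"),
               lambda uri: "http handle for " + uri)
# Handler registered later for svn:// URIs, so it is consulted first.
table.register(lambda uri: uri.startswith("svn://"),
               lambda uri: "svn handle for " + uri)

print(table.open("svn://example.org/repo/doc.xml"))
# svn handle for svn://example.org/repo/doc.xml
print(table.open("gopher://example.org/doc.xml"))  # None
```

Because a declining open callback simply passes the URI on, a single 
registered entry whose match callback accepts everything can defer the real 
decision to its open callback, which is the approach the patch takes.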

If there are no objections to the patch, I could augment it with test cases 
for these bindings.
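The Python layer in the patch (registerInputCallback() and 
popInputCallbacks() in libxml.py) funnels any number of Python callbacks 
through a single C-level registration. The behaviour such tests would need to 
pin down can be modelled without libxml2; the class below is a hypothetical 
stand-in for that bookkeeping, not the patch code itself.

```python
class PythonInputCallbacks:
    """Hypothetical model of the patch's bookkeeping: many Python callbacks
    share one C-level xmlIO registration."""

    def __init__(self):
        self.callbacks = []         # models __input_callbacks
        self.c_registered = False   # models pythonInputCallbackID != -1
        self.builtin_pops = 0       # built-in C entries popped (model only)

    def register(self, func):
        # libxml_xmlRegisterInputCallback() registers with xmlIO only once.
        self.c_registered = True
        self.callbacks.append(func)

    def pop(self):
        # Same order as popInputCallbacks(): Python callbacks go first.
        if self.callbacks:
            self.callbacks.pop()
            if not self.callbacks:
                self.c_registered = False  # last one gone: drop the C entry
        else:
            self.builtin_pops += 1         # real code pops a built-in entry

    def open(self, uri):
        # Mirrors findOpenCallback(): newest Python callback first, and
        # None means "not mine, ask the next one".
        for cb in reversed(self.callbacks):
            handle = cb(uri)
            if handle is not None:
                return handle
        return None


cbs = PythonInputCallbacks()
cbs.register(lambda uri: "generic svn" if uri.startswith("svn://") else None)
cbs.register(lambda uri: "special" if uri.endswith("/special.xml") else None)
```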

The attached patch also fixes a few problems with setEntityLoader():

1. Setting the entity loader does not increment the refcount on the Python 
object passed in, so it only works as long as something else keeps the object 
alive. For example, the following code results in a segmentation fault in the 
Python interpreter when attempting to process any document:

[[[ 
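import libxml2

def register_entity_loader():
    # entity_loader is a local closure: once this function returns,
    # libxml2 holds the only (uncounted) pointer to it.
    def entity_loader(URL, ID, ctxt):
        return None  # loader body elided; None defers to the default loader
    libxml2.setEntityLoader(entity_loader)

register_entity_loader()
# Without the patch, parsing any document after this point crashes.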
]]]

2. setEntityLoader() does not check whether the passed object is callable. If 
it is not, the current implementation attempts to call it anyway and, when 
that fails, silently falls back to the default entity loader. The attached 
patch makes setEntityLoader() raise a ValueError exception if a non-callable 
object is passed.

3. In debug mode (DEBUG_LOADER), pythonExternalEntityLoader() prints the 
result object to stderr, while the messages before and after it (the 
description and the trailing newline) go to stdout. The attached patch sends 
all of them to stdout.
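Fixes 1 and 2 can be illustrated without libxml2 at all. Below is a 
hypothetical pure-Python model of the slot that remembers the loader: in the 
old mode it keeps only a weak reference, standing in for the uncounted C 
pointer (in C the pointer dangles rather than turning into None, hence the 
crash); in the new mode it keeps a strong reference and rejects non-callables 
with ValueError, as the patch does.

```python
import weakref


class EntityLoaderSlot:
    """Hypothetical model of the C global that remembers the Python entity
    loader (pythonExternalEntityLoaderObjext in libxml.c)."""

    def __init__(self, owns_reference):
        self.owns_reference = owns_reference
        self._loader = None

    def set(self, loader):
        # Fix 2: reject non-callables up front, as PyCallable_Check() does.
        if not callable(loader):
            raise ValueError("entity loader is not callable")
        if self.owns_reference:
            # Fix 1: keep a strong reference, like Py_XINCREF() in the patch.
            self._loader = loader
        else:
            # Old behaviour: a borrowed pointer, modelled by a weakref.
            self._loader = weakref.ref(loader)

    def get(self):
        if self.owns_reference:
            return self._loader
        return self._loader() if self._loader is not None else None


def register_entity_loader(slot):
    # Same shape as the crashing example: the loader is a local closure.
    def entity_loader(URL, ID, ctxt):
        return None
    slot.set(entity_loader)


old = EntityLoaderSlot(owns_reference=False)
register_entity_loader(old)
print(old.get())             # None on CPython: the closure is already freed

new = EntityLoaderSlot(owns_reference=True)
register_entity_loader(new)
print(callable(new.get()))   # True

try:
    new.set("not a function")
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)               # entity loader is not callable
```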

Regards,
Alexey.
diff --git a/python/generator.py b/python/generator.py
index 767c4bb..83d100c 100755
--- a/python/generator.py
+++ b/python/generator.py
@@ -339,6 +339,8 @@ def skip_function(name):
         return 1
     if name == "xmlValidateAttributeDecl":
         return 1
+    if name == "xmlPopInputCallbacks":
+        return 1
 
     return 0
 
diff --git a/python/libxml.c b/python/libxml.c
index a556160..7522cea 100644
--- a/python/libxml.c
+++ b/python/libxml.c
@@ -665,7 +665,7 @@ pythonExternalEntityLoader(const char *URL, const char *ID,
 	Py_XDECREF(ctxtobj);
 #ifdef DEBUG_LOADER
 	printf("pythonExternalEntityLoader: result ");
-	PyObject_Print(ret, stderr, 0);
+	PyObject_Print(ret, stdout, 0);
 	printf("\n");
 #endif
 
@@ -711,19 +711,110 @@ libxml_xmlSetEntityLoader(ATTRIBUTE_UNUSED PyObject *self, PyObject *args) {
 		&loader))
 	return(NULL);
 
+    if (!PyCallable_Check(loader)) {
+	PyErr_SetString(PyExc_ValueError, "entity loader is not callable");
+	return(NULL);
+    }
+
 #ifdef DEBUG_LOADER
     printf("libxml_xmlSetEntityLoader\n");
 #endif
     if (defaultExternalEntityLoader == NULL) 
 	defaultExternalEntityLoader = xmlGetExternalEntityLoader();
 
+    Py_XDECREF(pythonExternalEntityLoaderObjext);
     pythonExternalEntityLoaderObjext = loader;
+    Py_XINCREF(pythonExternalEntityLoaderObjext);
     xmlSetExternalEntityLoader(pythonExternalEntityLoader);
 
     py_retval = PyInt_FromLong(0);
     return(py_retval);
 }
 
+/************************************************************************
+ *									*
+ *		Input callback registration				*
+ *									*
+ ************************************************************************/
+static PyObject *pythonInputOpenCallbackObject;
+static int pythonInputCallbackID = -1;
+
+static int
+pythonInputMatchCallback(ATTRIBUTE_UNUSED const char *URI)
+{
+    /* Always report a match; the real decision on whether the URI is
+     * supported is made in the open callback.  */
+    return 1;
+}
+
+static void *
+pythonInputOpenCallback(const char *URI)
+{
+    PyObject *ret = PyObject_CallFunction(pythonInputOpenCallbackObject,
+	    (char *)"s", URI);
+    if (ret == Py_None)
+	Py_CLEAR(ret);	/* drop the None reference; NULL means "not handled" */
+    return ret;
+}
+
+PyObject *
+libxml_xmlRegisterInputCallback(ATTRIBUTE_UNUSED PyObject *self,
+                                PyObject *args) {
+    PyObject *cb;
+
+    if (!PyArg_ParseTuple(args,
+		(const char *)"O:libxml_xmlRegisterInputCallback", &cb))
+	return(NULL);
+
+    if (!PyCallable_Check(cb)) {
+	PyErr_SetString(PyExc_ValueError, "input callback is not callable");
+	return(NULL);
+    }
+
+    /* The Python module registers a single callback and manages the list of
+     * all callbacks internally. This is necessitated by the
+     * xmlInputMatchCallback API, which does not allow passing data objects
+     * to discriminate between different Python callbacks.  */
+    if (pythonInputCallbackID == -1) {
+	pythonInputCallbackID = xmlRegisterInputCallbacks(
+		pythonInputMatchCallback, pythonInputOpenCallback,
+		xmlPythonFileReadRaw, xmlPythonFileCloseRaw);
+	if (pythonInputCallbackID == -1)
+	    return PyErr_NoMemory();
+	pythonInputOpenCallbackObject = cb;
+	Py_INCREF(pythonInputOpenCallbackObject);
+    }
+
+    Py_INCREF(Py_None);
+    return(Py_None);
+}
+
+PyObject *
+libxml_xmlUnregisterInputCallback(ATTRIBUTE_UNUSED PyObject *self,
+                                ATTRIBUTE_UNUSED PyObject *args) {
+    int ret;
+
+    ret = xmlPopInputCallbacks();
+    if (pythonInputCallbackID != -1) {
+	/* Assert that the right input callback was popped. libxml's API does not
+	 * allow removal by ID, so all that could be done is an assert.  */
+	if (pythonInputCallbackID == ret) {
+	    pythonInputCallbackID = -1;
+	    Py_DECREF(pythonInputOpenCallbackObject);
+	    pythonInputOpenCallbackObject = NULL;
+	} else {
+	    PyErr_SetString(PyExc_AssertionError, "popped non-python input callback");
+	    return(NULL);
+	}
+    } else if (ret == -1) {
+	/* No more callbacks to pop */
+	PyErr_SetString(PyExc_IndexError, "no input callbacks to pop");
+	return(NULL);
+    }
+
+    Py_INCREF(Py_None);
+    return(Py_None);
+}
 
 /************************************************************************
  *									*
@@ -3693,6 +3784,8 @@ static PyMethodDef libxmlMethods[] = {
     {(char *) "getObjDesc", libxml_getObjDesc, METH_VARARGS, NULL},
     {(char *) "compareNodesEqual", libxml_compareNodesEqual, METH_VARARGS, NULL},
     {(char *) "nodeHash", libxml_nodeHash, METH_VARARGS, NULL},
+    {(char *) "xmlRegisterInputCallback", libxml_xmlRegisterInputCallback, METH_VARARGS, NULL},
+    {(char *) "xmlUnregisterInputCallback", libxml_xmlUnregisterInputCallback, METH_VARARGS, NULL},
     {NULL, NULL, 0, NULL}
 };
 
diff --git a/python/libxml.py b/python/libxml.py
index c861a70..45c8223 100644
--- a/python/libxml.py
+++ b/python/libxml.py
@@ -719,11 +719,35 @@ class xmlTextReaderCore:
             return arg
 
 #
-# The cleanup now goes though a wrappe in libxml.c
+# The cleanup now goes through a wrapper in libxml.c
 #
 def cleanupParser():
     libxml2mod.xmlPythonCleanupParser()
 
+#
+# The interface to xmlRegisterInputCallbacks.
+# Since this API does not allow passing a data object along with
+# match/open callbacks, it is necessary to maintain a list of all
+# Python callbacks.
+#
+__input_callbacks = []
+def registerInputCallback(func):
+    def findOpenCallback(URI):
+        for cb in reversed(__input_callbacks):
+            o = cb(URI)
+            if o is not None:
+                return o
+    libxml2mod.xmlRegisterInputCallback(findOpenCallback)
+    __input_callbacks.append(func)
+
+def popInputCallbacks():
+    # Pop Python-level callbacks first; once none are left, start
+    # popping the built-in ones.
+    if len(__input_callbacks) > 0:
+        __input_callbacks.pop()
+    if len(__input_callbacks) == 0:
+        libxml2mod.xmlUnregisterInputCallback()
+
 # WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
 #
 # Everything before this line comes from libxml.py 
diff --git a/python/libxml_wrap.h b/python/libxml_wrap.h
index eaa5e96..ac5a626 100644
--- a/python/libxml_wrap.h
+++ b/python/libxml_wrap.h
@@ -247,3 +247,5 @@ PyObject * libxml_xmlSchemaValidCtxtPtrWrap(xmlSchemaValidCtxtPtr valid);
 #endif /* LIBXML_SCHEMAS_ENABLED */
 PyObject * libxml_xmlErrorPtrWrap(xmlErrorPtr error);
 PyObject * libxml_xmlSchemaSetValidErrors(PyObject * self, PyObject * args);
+PyObject * libxml_xmlRegisterInputCallback(PyObject *self, PyObject *args);
+PyObject * libxml_xmlUnregisterInputCallback(PyObject *self, PyObject *args);
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml
