On Tue, 2005-08-23 at 12:49 -0400, Gary Poster wrote:
> Michel (and anyone else with experience with RDFLib on the list), I
> recently looked at RDFLib (http://rdflib.net/) and came away (after
> an hour or so) with a good first impression.

Great. I've cc:ed Dan Krech, the lead rdflib developer, on this mail. For
his benefit I might explain things that you obviously know.

> My biggest disappointment was that, from the perspective of a Zope 3
> developer, using it alongside other Zope 3 indexes (and other intid-
> based data structures) meant that I would have to externally convert
> to and from RDF in order to merge results and convert the RDF URIs to
> objects.

Correct. A specific and important optimization in Zope-style cataloging is
that objects have a cheap unique integer to reduce catalog footprint and
significantly improve result merging and joining. These integers are
exposed as a utility component in Zope.

> It would be much more efficient if I could have an RDF
> resource class that represented an intid, and even more efficient if
> I could get IFBTrees back directly from searches that somehow
> included the intids.

Yes, this is a problem that needs to be solved, and your suggestion is one
way to solve it. I've discussed this a few times with Florent at the Paris
and EUpy sprints and he had a similar suggestion.

I'm uncomfortable with it for a few reasons: 1) intids are such a
Zope-catalog-optimization-specific thing. I know why they are exposed, so
that catalog results can be efficiently merged, but they don't have
anything to do with RDF, so 2) rdflib can't really change its interface to
accommodate them. Also, 3) they are backend specific. For example, rdflib
has a URI -> integer mapping for its in-memory and ZODB backends to reduce
footprint, but a SQL backend would need no such integer; you would in fact
have to *add* a column to hold that value just so the data would merge
efficiently with a catalog. This seems antithetical to Zope 3's philosophy
in general, as it violates the concept of not requiring third-party libs
and data to change themselves significantly just to work with Zope.
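
To make the optimization above concrete, here is a minimal sketch of a URI -> integer mapping and the cheap set intersection it allows. It is only an illustration: the class `UriIds` and the `urn:doc:*` URIs are made up for this example and are not rdflib's or Zope's actual code, and plain Python sets stand in for BTrees structures.

```python
# Hypothetical sketch: none of these names come from rdflib or Zope.

class UriIds:
    """Assign a small, stable integer to each URI."""

    def __init__(self):
        self._to_id = {}    # URI -> integer
        self._to_uri = []   # integer -> URI (list index is the id)

    def get_id(self, uri):
        # Reuse the existing integer, or hand out the next free one.
        if uri not in self._to_id:
            self._to_id[uri] = len(self._to_uri)
            self._to_uri.append(uri)
        return self._to_id[uri]

    def get_uri(self, int_id):
        return self._to_uri[int_id]


ids = UriIds()

# Two result sets from different searches, both held as integers.
graph_hits = {ids.get_id(u) for u in ("urn:doc:1", "urn:doc:2", "urn:doc:3")}
catalog_hits = {ids.get_id(u) for u in ("urn:doc:2", "urn:doc:3", "urn:doc:4")}

# Merging is an integer-set intersection; URIs are only looked up
# again for the final, already-narrowed result.
merged = sorted(ids.get_uri(i) for i in graph_hits & catalog_hits)
print(merged)
```

The point of the sketch is that the mapping lives inside one component. A backend that has no such integers (a SQL store, say) would have to grow one just to take part in the intersection.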

Of course, this isn't a problem of the catalog; it's a general problem of
merging search results from anywhere. I'd like to make the optimization
available so that searches on a graph can be efficiently merged with
searches on a catalog, but I don't think it can be done by pushing intids
down into rdflib, or for that matter into any other third-party component
you want to play with the catalog efficiently.

Perhaps instead of pushing the integers down we could push URIs up: Zope's
cataloging could grow another layer of indirection on top of intids and
provide a URI utility that maps to intids. Of course you might object to
that for the same reasons I'm objecting to this. ;) But at least URIs are
a well-known standard.

Somewhat at right angles to this, I think Zope needs to grow another
search interface, a higher-level one that hides all of this integer id
stuff from the user. I proposed something incomplete along these lines to
the z3labs site: an interface that could aggregate searches across
multiple registered search sources, whether catalogs, rdflib Graphs,
relational databases, remote systems, Google, etc. With something like
this, there is no need to worry about intersecting two integer result sets
efficiently; the underlying search framework performs that optimization if
it is available. Note that the primary benefit of such an interface is not
necessarily merging results across multiple sources, but instead providing
a consistent interface regardless of the search source.

> Then I could leverage the relationship and
> keyword capabilities of RDFLib while also merging results efficiently
> with other index-like data structures in Zope 3. The intid-specific
> resources could even have stable URI representations without too much
> trouble, so that they could be exported and imported with RDFLib, if
> desired.

Hmm, so these resource objects you are suggesting, they would be
persistent objects? I don't quite have the picture of what you suggest.
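
The higher-level search interface described above could look roughly like the following. This is a sketch under loud assumptions, not the z3labs proposal: `UriSource`, `IntIdSource`, `search_all`, and the `uri_for_id` mapping are hypothetical names, and every integer-based source is assumed to share one id space.

```python
# Hypothetical sketch: not the z3labs proposal, not a Zope or rdflib API.

class UriSource:
    """A source (e.g. a graph) that reports matches only as URIs."""

    def __init__(self, index):
        self._index = index  # term -> set of URIs

    def search(self, term):
        return set(self._index.get(term, ()))


class IntIdSource:
    """A catalog-like source that stores integer ids internally."""

    def __init__(self, index):
        self._index = index  # term -> set of integer ids

    def search_ids(self, term):
        return set(self._index.get(term, ()))


def search_all(sources, term, uri_for_id):
    """Intersect the hits of every source and return URIs.

    Sources offering search_ids() are intersected as integers first
    (assumes they share one id space); the rest are merged by URI.
    """
    int_sources = [s for s in sources if hasattr(s, "search_ids")]
    uri_sources = [s for s in sources if not hasattr(s, "search_ids")]

    result = None
    if int_sources:
        ids = set.intersection(*(s.search_ids(term) for s in int_sources))
        result = {uri_for_id[i] for i in ids}
    for source in uri_sources:
        hits = source.search(term)
        result = hits if result is None else result & hits
    return result if result is not None else set()


uri_for_id = {1: "urn:doc:1", 2: "urn:doc:2", 3: "urn:doc:3"}
catalog = IntIdSource({"zope": {1, 2}})
keywords = IntIdSource({"zope": {2, 3}})
graph = UriSource({"zope": {"urn:doc:2", "urn:doc:9"}})

found = search_all([catalog, keywords, graph], "zope", uri_for_id)
print(found)
```

The caller only ever sees URIs. Whether two sources were merged as integers or as URIs is a detail the framework decides.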
Perhaps these resource classes can be managed by a utility?

> Have you thought about that use case? If one used a variation of
> your back end that assigned intids to non-intid-based resources like
> URIs and Literals and stored the relationships via intids,

One doesn't need a variation; this is exactly the way the in-memory and
ZODB backends work now as an optimization. But they are internal details
of the implementation of those backends.

> you could
> store the data as IFBTrees and offer up an API to get "raw" IFBTree
> results. Any obvious ways that would be a problem? Does it feel
> reasonable to you? Any suggestions?

Well, not any good ones yet, although I know it's an important problem.
I'll have to think about it a bit more. Do you understand my objections?
Does anyone else have any suggestions out there? This is probably worth
solving in the general case, since it's going to come up anytime you're
going to want to merge catalog results with anything.

> I'm generally interested in RDFLib, your use of it, and your hopes
> for it, if you feel like holding forth. :-)

Great! And I didn't even have to feed you any kool aid or buy you a bottle
of aquavit. ;)

-Michel