[Zope3-dev] KeywordIndex, TopicIndex should implement IIndexSearch
I've just discovered that KeywordIndex and TopicIndex don't implement IIndexSearch, so these indexes don't work with the Catalog. Is there any reason for this? I think the "apply" method in both cases can be equivalent to "search(query, 'and')" without any problems.

--
Dmitry Vasiliev
http://hlabs.spb.ru
___
Zope3-dev mailing list Zope3-dev@zope.org Unsub: http://mail.zope.org/mailman/options/zope3-dev/archive%40mail-archive.com
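To make the proposal concrete, here is a minimal sketch of an "apply" that simply delegates to an "and" search. ToyKeywordIndex is a hypothetical pure-Python stand-in written for this illustration, not the zope.index class; only the method names index_doc, search and apply are taken from the thread, and plain Python sets stand in for BTrees sets.

```python
class ToyKeywordIndex:
    """Hypothetical stand-in for a keyword index (not zope.index code)."""

    def __init__(self):
        self._fwd = {}  # keyword -> set of document ids

    def index_doc(self, docid, keywords):
        for word in keywords:
            self._fwd.setdefault(word, set()).add(docid)

    def search(self, query, operator="and"):
        # Accept a single keyword or a sequence of keywords.
        if isinstance(query, str):
            query = [query]
        sets = [self._fwd.get(word, set()) for word in query]
        if not sets:
            return set()
        if operator == "and":
            return set.intersection(*sets)
        if operator == "or":
            return set.union(*sets)
        raise ValueError("operator must be 'and' or 'or'")

    def apply(self, query):
        # The catalog-facing entry point delegates to an "and" search,
        # which is the equivalence suggested in the message above.
        return self.search(query, "and")


index = ToyKeywordIndex()
index.index_doc(1, ["zope", "python"])
index.index_doc(2, ["python"])
print(sorted(index.apply(["zope", "python"])))  # documents carrying both keywords
```

With this shape, a catalog only ever needs to call apply(), and the choice of "and" semantics stays a one-line policy decision inside the index.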
Re: [Zope3-dev] KeywordIndex
On Jul 22, 2005, at 2:30 PM, Michel Pelletier wrote:

> Awesome! I like the idea of set indexes and am always looking for new
> indexing techniques. Having looked only briefly at the code myself, what
> is the relation between zc.catalog and zope.app.catalog?

The stuff in zc.catalog covers code that would live in both zope.index and zope.app.catalog. The current plan is to divide the code into those two packages once Zope 3.1 is released. Specifically, the extent catalog extends the capabilities of zope.app.catalog and might go there; the indexes in index.py might go in zope.index; the subclasses in catalogindex might go in zope.app.catalog; the globber will disappear, never to be seen again; and the other stuff I'm not sure about. :-)

Again, the sandbox README touched on this and described the rest of the code; I think it's a decent intro.

http://svn.zope.org/Sandbox/zc/README.txt?rev=37377&view=auto

Gary
Re: [Zope3-dev] KeywordIndex
On Thu, 2005-07-21 at 22:49 -0400, Gary Poster wrote:
> On Jul 19, 2005, at 1:06 AM, Jeff Shell wrote:
>
> > Hi Gary! I'd be very interested in this. It's not critical for me
> > right now, so there's no need to rush making something available. I
> > have an inefficient but fun solution for my system that can be
> > replaced when this comes along. I primarily wanted to know the state
> > of the indexes.
>
> Hey again Jeff (and Michel, since you said you were interested
> too :-) ). I got snapshots of the catalog stuff up tonight (along
> with 'listcontainer', which is not as general interest) on zope.org
> as ZPL. Here's the README:
>
> http://svn.zope.org/Sandbox/zc/README.txt?rev=37377&view=auto
>
> And, of course, check them out from
> svn.zope.org/repos/main/Sandbox/zc/catalog and
> svn.zope.org/repos/main/Sandbox/zc/listcontainer, respectively.

Awesome! I like the idea of set indexes and am always looking for new indexing techniques. Having looked only briefly at the code myself, what is the relation between zc.catalog and zope.app.catalog?

-Michel
Re: [Zope3-dev] KeywordIndex
On Jul 19, 2005, at 1:06 AM, Jeff Shell wrote:

> Hi Gary! I'd be very interested in this. It's not critical for me
> right now, so there's no need to rush making something available. I
> have an inefficient but fun solution for my system that can be
> replaced when this comes along. I primarily wanted to know the state
> of the indexes.

Hey again Jeff (and Michel, since you said you were interested too :-) ). I got snapshots of the catalog stuff up tonight (along with 'listcontainer', which is not as general interest) on zope.org as ZPL. Here's the README:

http://svn.zope.org/Sandbox/zc/README.txt?rev=37377&view=auto

And, of course, check them out from svn.zope.org/repos/main/Sandbox/zc/catalog and svn.zope.org/repos/main/Sandbox/zc/listcontainer, respectively.

Gary
Re: [Zope3-dev] KeywordIndex
On Jul 19, 2005, at 1:06 AM, Jeff Shell wrote:

> Hi Gary! I'd be very interested in this. It's not critical for me
> right now, so there's no need to rush making something available. I
> have an inefficient but fun solution for my system that can be
> replaced when this comes along. I primarily wanted to know the state
> of the indexes.

OK, cool. I'll try to get it up in the next few days (maybe even today).

> Is what's there right now going to be what ships with Zope 3.1 final?

To my knowledge and understanding, yes: 3.1 is feature frozen, and the code I wrote about is all about new features.

I'm tempted to try putting this out as another package, the way zope.formlib was done: that way you could install it in 3.1 if you desired. The (possibly) big downside is that then we have an unnecessary division of the code in Zope 3 proper (zope.index and zope.app.catalog) versus this new stuff. Dunno.

Gary
Re: [Zope3-dev] KeywordIndex
Hi Gary! I'd be very interested in this. It's not critical for me right now, so there's no need to rush making something available. I have an inefficient but fun solution for my system that can be replaced when this comes along. I primarily wanted to know the state of the indexes.

Is what's there right now going to be what ships with Zope 3.1 final?

On 7/18/05, Gary Poster <[EMAIL PROTECTED]> wrote:
>
> On Jul 18, 2005, at 11:14 AM, Jeff Shell wrote:
>
> > I'm working on a simple application which is the first time I get to
> > use the catalog in Zope 3. I'm writing against Zope 3.1b1. I was
> > dismayed not to see KeywordIndex in the main catalog set, but then I
> > found it in zope.index.keyword. But it seems to be a bit behind.
>
> Hi. Yes, we needed it too. Here's another thing we want to open
> source. Look at the attached .txt file; if you want it then tell me
> and I'll make it available in a sandbox. We'll move it over into the
> Zope repo (probably with a new name, or rearranged on the appropriate
> locations: zope.index, zope.app.catalog, etc.) RSN.
>
> Downsides:
>
> - Note that some functionality requires that you use an extent
>   catalog, another goodie in the package.
>
> - We have some refactoring of this that we want to do. We'll have
>   legacy issues ourselves, then.
>
> Additional upside:
>
> - This package also includes a replacement for the field index
>   (called a value index) and customizations of the value and set
>   indexes specific to timezone-aware datetimes, as well as a few other
>   things.
Re: [Zope3-dev] KeywordIndex
On Mon, 2005-07-18 at 12:00 -0400, [EMAIL PROTECTED] wrote:
> Date: Mon, 18 Jul 2005 09:14:16 -0600
> From: Jeff Shell <[EMAIL PROTECTED]>
> Subject: [Zope3-dev] KeywordIndex
> To: zope3-dev@zope.org
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=ISO-8859-1
>
> I'm working on a simple application which is the first time I get to
> use the catalog in Zope 3. I'm writing against Zope 3.1b1. I was
> dismayed not to see KeywordIndex in the main catalog set, but then I
> found it in zope.index.keyword. But it seems to be a bit behind. I
> have it somewhat working through subclassing, etc, but it's been
> purely guesswork on my part to get things this far. In my product
> package, I have the following:

I'm unable to help you directly with your problem, although Gary's post about the SetIndex looks very promising and I would like to see that code as well.

As you said, something like a keyword index is exactly what your application is designed around, but if I could digress from the topic a little I'd like to suggest another solution, rdflib.

rdflib covers some of the same use cases as a keyword index. Your objects (content, whatever) and your keywords would be assigned unique identifiers. You then add relations to an rdflib.Graph that associate the keyword with your objects:

    >>> dc = rdflib.Namespace('http://purl.org/dc/elements/1.1/')
    >>> blue = rdflib.BNode() # creates a URI for you
    >>> g = rdflib.Graph()
    >>> g.add((object_uri, dc.keywords, blue))

and then query the data back out with either a low-level g.triples((s, p, o)) pattern or a sparql query. For example, print a list of all object URIs that have the blue keyword:

    >>> print [s for (s, p, o) in g.triples((rdflib.Any, dc.keywords, blue))]

or sparql:

    >>> sg = rdflib.sparqlGraph(g)
    >>> select = ("?object_uri",)
    >>> where = rdflib.GraphPattern([("?object_uri", dc.keywords, blue)])
    >>> for object_uri in sg.query(select, where):
    ...     print object_uri

is the same query, but longer. With sparql, however, you can run more complex, SQL-like queries against other relations.

Hope this helps,

-Michel
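The triple-pattern idea behind g.triples((s, p, o)) can be sketched without rdflib at all. The following is a pure-Python illustration, not rdflib code: the triples() helper, the use of None as the wildcard, and the "doc:N" identifiers and plain-string keywords are all assumptions made for this example.

```python
def triples(graph, pattern):
    """Yield every (s, p, o) in graph that matches pattern.

    A None entry in the pattern acts as a wildcard for that position.
    """
    for triple in graph:
        if all(want is None or want == got
               for want, got in zip(pattern, triple)):
            yield triple


# Hypothetical predicate URI, built from the namespace used in the message.
KEYWORDS = "http://purl.org/dc/elements/1.1/keywords"

# A graph is just a set of (subject, predicate, object) tuples here.
graph = {
    ("doc:1", KEYWORDS, "blue"),
    ("doc:2", KEYWORDS, "red"),
    ("doc:3", KEYWORDS, "blue"),
}

# All subjects tagged with the "blue" keyword.
blue_docs = sorted(s for s, p, o in triples(graph, (None, KEYWORDS, "blue")))
print(blue_docs)  # ['doc:1', 'doc:3']
```

This linear scan is only meant to show the query shape; a real store would index triples by subject, predicate and object rather than scanning them all.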
Re: [Zope3-dev] KeywordIndex
On Jul 18, 2005, at 11:14 AM, Jeff Shell wrote:

> I'm working on a simple application which is the first time I get to
> use the catalog in Zope 3. I'm writing against Zope 3.1b1. I was
> dismayed not to see KeywordIndex in the main catalog set, but then I
> found it in zope.index.keyword. But it seems to be a bit behind.

Hi. Yes, we needed it too. Here's another thing we want to open source. Look at the attached .txt file; if you want it then tell me and I'll make it available in a sandbox. We'll move it over into the Zope repo (probably with a new name, or rearranged on the appropriate locations: zope.index, zope.app.catalog, etc.) RSN.

Downsides:

- Note that some functionality requires that you use an extent catalog, another goodie in the package.

- We have some refactoring of this that we want to do. We'll have legacy issues ourselves, then.

Additional upside:

- This package also includes a replacement for the field index (called a value index) and customizations of the value and set indexes specific to timezone-aware datetimes, as well as a few other things.

Gary

The setindex is an index similar to, but more general than, a traditional keyword index. The values indexed are expected to be iterables; the index allows searches for documents that contain any of a set of values; all of a set of values; or between a set of values.

Additionally, the index supports an interface that allows examination of the indexed values.

It is as policy-free as possible, and is intended to be the engine for indexes with more policy, as well as being useful itself.

On creation, the index has no wordCount, no documentCount, and is, as expected, fairly empty.

    >>> from zc.catalog.index import SetIndex
    >>> index = SetIndex()
    >>> index.documentCount()
    0
    >>> index.wordCount()
    0
    >>> index.maxValue() # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...
    >>> index.minValue() # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError:...
    >>> list(index.values())
    []
    >>> len(index.apply({'any_of': (5,)}))
    0

The index supports indexing any value. All values within a given index must sort consistently across Python versions. In our example, we hope that strings and integers will sort consistently; this may not be a reasonable hope.

    >>> data = {1: ['a', 1],
    ...         2: ['b', 'a', 3, 4, 7],
    ...         3: [1],
    ...         4: [1, 4, 'c'],
    ...         5: [7],
    ...         6: [5, 6, 7],
    ...         7: ['c'],
    ...         8: [1, 6],
    ...         9: ['a', 'c', 2, 3, 4, 6,],
    ...         }
    >>> for k, v in data.items():
    ...     index.index_doc(k, v)
    ...

After indexing, the statistics and values match the newly entered content.

    >>> list(index.values())
    [1, 2, 3, 4, 5, 6, 7, 'a', 'b', 'c']
    >>> index.documentCount()
    9
    >>> index.wordCount()
    10
    >>> index.maxValue()
    'c'
    >>> index.minValue()
    1
    >>> list(index.ids())
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

The index supports five types of query. The first is 'any_of'. It takes an iterable of values, and returns an iterable of document ids that contain any of the values. The results are weighted.

    >>> list(index.apply({'any_of': ('b', 1, 5)}))
    [1, 2, 3, 4, 6, 8]
    >>> list(index.apply({'any_of': (42,)}))
    []
    >>> index.apply({'any_of': ('a', 3, 7)})
    BTrees._IFBTree.IFBucket([(1, 1.0), (2, 3.0), (5, 1.0), (6, 1.0), (9, 2.0)])

Another query is 'any'. If the key is None, all indexed document ids with any values are returned. If the key is an extent, the intersection of the extent and all document ids with any values is returned.

    >>> list(index.apply({'any': None}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

    >>> from zc.catalog.extentcatalog import FilterExtent
    >>> extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(15):
    ...     extent.add(i, i)
    ...
    >>> list(index.apply({'any': extent}))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> limited_extent = FilterExtent(lambda extent, uid, obj: True)
    >>> for i in range(5):
    ...     limited_extent.add(i, i)
    ...
    >>> list(index.apply({'any': limited_extent}))
    [1, 2, 3, 4]

The 'all_of' argument also takes an iterable of values, but returns an iterable of document ids that contain all of the values. The results are not weighted.

    >>> list(index.apply({'all_of': ('a',)}))
    [1, 2, 9]
    >>> list(index.apply({'all_of': (3, 4)}))
    [2, 9]

The 'between' argument takes from one to four values. The first is the minimum, and defaults to None, indicating no minimum; the second is the maximum, and defaults to None, indicating no maximum; the next is a boolean for whether the minimum value should be excluded, and defaults to False; and the last is a boolean for whether the maximum value sh
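The 'any_of' and 'all_of' semantics described above can be mimicked in a few lines of plain Python. This is a sketch under stated assumptions, not the zc.catalog implementation: ToySetIndex and its method names are hypothetical, ordinary dicts and sets replace BTrees, and the weighting rule (one point per queried value a document contains) is inferred from the sample output in the README rather than confirmed by it.

```python
from collections import Counter


class ToySetIndex:
    """Hypothetical pure-Python imitation of the set index queries."""

    def __init__(self):
        self._docs_for = {}  # indexed value -> set of document ids

    def index_doc(self, docid, values):
        for value in values:
            self._docs_for.setdefault(value, set()).add(docid)

    def any_of(self, values):
        """Map each matching doc id to the number of queried values it holds."""
        weights = Counter()
        for value in set(values):
            for docid in self._docs_for.get(value, ()):
                weights[docid] += 1.0
        return dict(sorted(weights.items()))

    def all_of(self, values):
        """Return sorted doc ids that contain every queried value (unweighted)."""
        sets = [self._docs_for.get(value, set()) for value in set(values)]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


# Same sample data as in the README excerpt above.
data = {1: ['a', 1], 2: ['b', 'a', 3, 4, 7], 3: [1], 4: [1, 4, 'c'],
        5: [7], 6: [5, 6, 7], 7: ['c'], 8: [1, 6],
        9: ['a', 'c', 2, 3, 4, 6]}
index = ToySetIndex()
for docid, values in data.items():
    index.index_doc(docid, values)

print(index.any_of(('a', 3, 7)))
print(index.all_of((3, 4)))
```

Because the sketch never sorts the indexed values themselves, it sidesteps the mixed string/integer ordering concern raised in the README; the real index keeps values ordered so that 'between', minValue and maxValue can work.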
[Zope3-dev] KeywordIndex
I'm working on a simple application which is the first time I get to use the catalog in Zope 3. I'm writing against Zope 3.1b1. I was dismayed not to see KeywordIndex in the main catalog set, but then I found it in zope.index.keyword. But it seems to be a bit behind. I have it somewhat working through subclassing, etc, but it's been purely guesswork on my part to get things this far. In my product package, I have the following:

--
from zope.interface import implements  # needed by implements() below; not in the original excerpt
from zope.app.catalog.interfaces import IAttributeIndex, ICatalogIndex
from zope.app.catalog.attribute import AttributeIndex
from zope.app.container.contained import Contained
from zope.index.keyword.index import KeywordIndex as KeywordIndexBase
from zope.proxy import removeAllProxies
from BTrees.IFBTree import IFTreeSet, IFSet, multiunion

class IKeywordIndex(IAttributeIndex, ICatalogIndex):
    """ Interface-based catalog keyword index. """

class KeywordIndex(AttributeIndex, KeywordIndexBase, Contained):
    implements(IKeywordIndex)

    def index_doc(self, docid, value):
        # All security proxies need to be removed from the value.
        value = removeAllProxies(value)
        return super(KeywordIndex, self).index_doc(docid, value)

    def apply(self, query):
        # Keyword index doesn't implement apply(query) either.
        return self.search(removeAllProxies(query))

    def _insert_forward(self, docid, words):
        """insert a sequence of words into the forward index """
        # Replaces parent _insert_forward because apply() claims to want IFSets
        idx = self._fwd_index
        has_key = idx.has_key
        for word in words:
            if not has_key(word):
                idx[word] = IFSet()
            idx[word].insert(docid)
--

I first overrode index_doc because the base KeywordIndex does an isinstance(value, (ListType, TupleType)), which failed on a security proxy guarded value. Then I added 'apply()' when I noticed that the base KeywordIndex didn't implement apply.
Looking at the other supported indexes and at the index interfaces in zope.index, I noticed that IFSets were what was desired as the output of apply(), and that's when I replaced _insert_forward with a near identical copy that uses IFSet.

This works... so long as I only search for one keyword. If I search for more than one through the catalog interface (and I imagine I would get the same result manually), I get the following traceback:

--
  File "/Users/jshell/Documents/Programming/kbase/lib/python/br/kbase/browser/search.py", line 22, in search
    results = catalog.searchResults(tags=query)
  File "/Library/ZopeX3/3.1/lib/python/zope/app/catalog/catalog.py", line 105, in searchResults
    results = self.apply(searchterms)
  File "/Library/ZopeX3/3.1/lib/python/zope/app/catalog/catalog.py", line 84, in apply
    r = index.apply(index_query)
  File "/Users/jshell/Documents/Programming/kbase/lib/python/br/kbase/catalog.py", line 36, in apply
    return self.search(removeAllProxies(query))
  File "/Library/ZopeX3/3.1/lib/python/zope/index/keyword/index.py", line 139, in search
    rs = f(rs, docids)
TypeError: invalid argument
--

'f' is IISet.intersection()

The implementation of search() in the base KeywordIndex uses IISets for default values. I don't know if this is conflicting with the IFSets I set up in my subclass. I tried quickly editing zope.index.keyword.index to use IFSets instead, but I got the same traceback and then quickly reverted back to leaving the code untouched.

It's been *years* since I've even touched simple indexing code, so I don't really know what's going on here or what's required. I would really like to have Keyword Index. In fact, such an index is the core of my application. I can throw together my own, I'm sure, that's a bit more brute force for my own purposes if necessary. I don't claim to have a solid understanding of how indexes and the catalog work (although it's been much easier to figure out in Zope 3, thanks!)
Is there any reason why KeywordIndex seems half-abandoned? I guess it's not exposed to zope.app because of this. What would it take to make it catch up to FieldIndex?

Thanks
--
Jeff Shell