[Zope3-dev] KeywordIndex, TopicIndex should implement IIndexSearch

2007-06-22 Thread Dmitry Vasiliev


I've just discovered that KeywordIndex and TopicIndex don't implement
IIndexSearch, so the indexes don't work with the Catalog. Is there any
reason for this? I think the apply() methods in this case could simply be
equivalent to search(query, 'and') without any problems.
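A minimal sketch of what that delegation could look like. `ToyKeywordIndex` is a hypothetical, pure-Python stand-in (plain Python sets instead of BTrees), not the class in zope.index; it only assumes a `search(query, operator)` method shaped like the one mentioned above:

```python
class ToyKeywordIndex(object):
    """Hypothetical keyword index for illustration; not zope.index code."""

    def __init__(self):
        self._fwd = {}  # keyword -> set of docids

    def index_doc(self, docid, keywords):
        for word in keywords:
            self._fwd.setdefault(word, set()).add(docid)

    def search(self, query, operator='and'):
        # Accept either a single keyword or a sequence of keywords.
        if isinstance(query, str):
            query = [query]
        postings = [self._fwd.get(word, set()) for word in query]
        if not postings:
            return set()
        if operator == 'and':
            return set.intersection(*postings)
        if operator == 'or':
            return set.union(*postings)
        raise ValueError("operator must be 'and' or 'or'")

    def apply(self, query):
        # IIndexSearch-style entry point: the catalog query is treated
        # as a conjunction, i.e. exactly search(query, 'and').
        return self.search(query, 'and')


idx = ToyKeywordIndex()
idx.index_doc(1, ['zope', 'catalog'])
idx.index_doc(2, ['zope'])
print(idx.apply(['zope', 'catalog']))  # {1}
```

Keeping apply() as a one-line wrapper means the catalog path and direct search() calls cannot drift apart.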


--
Dmitry Vasiliev dima at hlabs.spb.ru
http://hlabs.spb.ru
___
Zope3-dev mailing list
Zope3-dev@zope.org
Unsub: http://mail.zope.org/mailman/options/zope3-dev/archive%40mail-archive.com



Re: [Zope3-dev] KeywordIndex

2005-07-22 Thread Gary Poster


On Jul 22, 2005, at 2:30 PM, Michel Pelletier wrote:

> Awesome!  I like the idea of set indexes and am always looking for new
> indexing techniques.  Having only looked briefly at the code myself,
> what is the relation between zc.catalog and zope.app.catalog?


The stuff in zc.catalog covers code that would live in both  
zope.index and zope.app.catalog.  The current plan is to divide the  
code into those two packages once Zope 3.1 is released.


Specifically, the extent catalog extends the capabilities of  
zope.app.catalog and might go there; the indexes in index.py might go  
in zope.index; the subclasses in catalogindex might go in  
zope.app.catalog; the globber will disappear, never to be seen again;  
and the other stuff I'm not sure about. :-)


Again, the sandbox README touched on this and described the rest of  
the code; I think it's a decent intro.


http://svn.zope.org/Sandbox/zc/README.txt?rev=37377&view=auto

Gary



Re: [Zope3-dev] KeywordIndex

2005-07-21 Thread Gary Poster


On Jul 19, 2005, at 1:06 AM, Jeff Shell wrote:

> Hi Gary! I'd be very interested in this. It's not critical for me
> right now, so there's no need to rush making something available. I
> have an inefficient but fun solution for my system that can be
> replaced when this comes along. I primarily wanted to know the state
> of the indexes.


Hey again Jeff (and Michel, since you said you were interested  
too :-) ).  I got snapshots of the catalog stuff up tonight (along  
with 'listcontainer', which is of less general interest) on zope.org  
as ZPL.  Here's the README:


http://svn.zope.org/Sandbox/zc/README.txt?rev=37377&view=auto

And, of course, check them out from
svn.zope.org/repos/main/Sandbox/zc/catalog and
svn.zope.org/repos/main/Sandbox/zc/listcontainer, respectively.


Gary



[Zope3-dev] KeywordIndex

2005-07-18 Thread Jeff Shell
I'm working on a simple application which is the first time I get to
use the catalog in Zope 3. I'm writing against Zope 3.1b1. I was
dismayed not to see KeywordIndex in the main catalog set, but then I
found it in zope.index.keyword. But it seems to be a bit behind. I
have it somewhat working through subclassing, etc, but it's been
purely guess work on my part to get things this far. In my product
package, I have the following:

--
from zope.interface import implements
from zope.app.catalog.interfaces import IAttributeIndex, ICatalogIndex
from zope.app.catalog.attribute import AttributeIndex
from zope.app.container.contained import Contained
from zope.index.keyword.index import KeywordIndex as KeywordIndexBase
from zope.proxy import removeAllProxies
from BTrees.IFBTree import IFTreeSet, IFSet, multiunion

class IKeywordIndex(IAttributeIndex, ICatalogIndex):
    """Interface-based catalog keyword index."""

class KeywordIndex(AttributeIndex, KeywordIndexBase, Contained):
    implements(IKeywordIndex)

    def index_doc(self, docid, value):
        # All security proxies need to be removed from the value.
        value = removeAllProxies(value)
        return super(KeywordIndex, self).index_doc(docid, value)

    def apply(self, query):
        # Keyword index doesn't implement apply(query) either.
        return self.search(removeAllProxies(query))

    def _insert_forward(self, docid, words):
        """insert a sequence of words into the forward index"""
        # Replaces parent _insert_forward because apply() claims to want IFSets
        idx = self._fwd_index
        has_key = idx.has_key
        for word in words:
            if not has_key(word):
                idx[word] = IFSet()
            idx[word].insert(docid)
--
I first overrode index_doc because the base KeywordIndex does an
isinstance(value, (ListType, TupleType)) check, which failed on a
security-proxied value. Then I added 'apply()' when I noticed that the
base KeywordIndex didn't implement apply. Looking at the other
supported indexes and at the index interfaces in zope.index, I noticed
that IFSets were what was desired as the output of apply(), and that's
when I replaced _insert_forward with a near-identical copy that uses
IFSet.

This works... so long as I only search for one keyword. If I search
for more than one through the catalog interface (and I imagine I would
get the same result manually), I get the following traceback:

--
  File "/Users/jshell/Documents/Programming/kbase/lib/python/br/kbase/browser/search.py", line 22, in search
    results = catalog.searchResults(tags=query)
  File "/Library/ZopeX3/3.1/lib/python/zope/app/catalog/catalog.py", line 105, in searchResults
    results = self.apply(searchterms)
  File "/Library/ZopeX3/3.1/lib/python/zope/app/catalog/catalog.py", line 84, in apply
    r = index.apply(index_query)
  File "/Users/jshell/Documents/Programming/kbase/lib/python/br/kbase/catalog.py", line 36, in apply
    return self.search(removeAllProxies(query))
  File "/Library/ZopeX3/3.1/lib/python/zope/index/keyword/index.py", line 139, in search
    rs = f(rs, docids)
TypeError: invalid argument
--
'f' is IISet.intersection()

The implementation of search() in the base KeywordIndex uses IISets
for default values. I don't know whether this conflicts with the
IFSets I set up in my subclass. I tried quickly editing
zope.index.keyword.index to use IFSets instead, but I got the same
traceback, so I reverted the change and left the code untouched.

It's been *years* since I've even touched simple indexing code, so I
don't really know what's going on here or what's required.

I would really like to have a keyword index. In fact, such an index is
the core of my application. I'm sure I can throw together a more
brute-force one of my own if necessary. I don't claim to have a solid
understanding of how indexes and the catalog work (although it's been
much easier to figure out in Zope 3, thanks!)

Is there any reason why KeywordIndex seems half-abandoned? I guess
it's not exposed to zope.app because of this. What would it take to
make it catch up to FieldIndex?

Thanks
--
Jeff Shell



Re: [Zope3-dev] KeywordIndex

2005-07-18 Thread Gary Poster


On Jul 18, 2005, at 11:14 AM, Jeff Shell wrote:

> I'm working on a simple application which is the first time I get to
> use the catalog in Zope 3. I'm writing against Zope 3.1b1. I was
> dismayed not to see KeywordIndex in the main catalog set, but then I
> found it in zope.index.keyword. But it seems to be a bit behind.


Hi.  Yes, we needed it too.  Here's another thing we want to open  
source.  Look at the attached .txt file; if you want it, tell me  
and I'll make it available in a sandbox.  We'll move it over into the  
Zope repo (probably with a new name, or rearranged into the appropriate  
locations: zope.index, zope.app.catalog, etc.) RSN.


Downsides:

- Note that some functionality requires that you use an extent  
catalog, another goodie in the package.


- We have some refactoring of this that we want to do.  We'll have  
legacy issues ourselves, then.


Additional upside:

-  This package also includes a replacement for the field index  
(called a value index) and customizations of the value and set  
indexes specific to timezone-aware datetimes, as well as a few other  
things.


Gary

The setindex is an index similar to, but more general than, a traditional
keyword index.  The values indexed are expected to be iterables; the index
allows searches for documents that contain any of a set of values, all of a
set of values, or values between given bounds.

Additionally, the index supports an interface that allows examination of the
indexed values.

It is as policy-free as possible, and is intended to be the engine for indexes
with more policy, as well as being useful itself.

On creation, the index has no wordCount, no documentCount, and is, as
expected, fairly empty.

>>> from zc.catalog.index import SetIndex
>>> index = SetIndex()
>>> index.documentCount()
0
>>> index.wordCount()
0
>>> index.maxValue() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
>>> index.minValue() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError:...
>>> list(index.values())
[]
>>> len(index.apply({'any_of': (5,)}))
0

The index supports indexing any value.  All values within a given index must
sort consistently across Python versions.  In our example, we hope that strings
and integers will sort consistently; this may not be a reasonable hope.

>>> data = {1: ['a', 1],
...         2: ['b', 'a', 3, 4, 7],
...         3: [1],
...         4: [1, 4, 'c'],
...         5: [7],
...         6: [5, 6, 7],
...         7: ['c'],
...         8: [1, 6],
...         9: ['a', 'c', 2, 3, 4, 6,],
...         }
>>> for k, v in data.items():
...     index.index_doc(k, v)
...

After indexing, the statistics and values match the newly entered content. 

>>> list(index.values())
[1, 2, 3, 4, 5, 6, 7, 'a', 'b', 'c']
>>> index.documentCount()
9
>>> index.wordCount()
10
>>> index.maxValue()
'c'
>>> index.minValue()
1
>>> list(index.ids())
[1, 2, 3, 4, 5, 6, 7, 8, 9]
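To make the bookkeeping behind these statistics concrete, here is a small pure-Python model. `ToySetIndex` and `py2_like_key` are hypothetical names, not zc.catalog's implementation: the model keeps a forward mapping (value to docids) and a reverse mapping (docid to values), and uses a sort key that imitates Python 2 placing numbers before strings, because Python 3 refuses to compare `int` with `str` directly.

```python
def py2_like_key(value):
    # Assumption: imitate Python 2, where numbers sort before strings.
    # Python 3 raises TypeError when comparing int with str directly.
    return (isinstance(value, str), value)


class ToySetIndex(object):
    """Hypothetical model of a set index; not zc.catalog's code."""

    def __init__(self):
        self.fwd = {}  # value -> set of docids
        self.rev = {}  # docid -> set of values

    def unindex_doc(self, docid):
        # Drop the document from every value it was filed under.
        for value in self.rev.pop(docid, ()):
            docids = self.fwd[value]
            docids.discard(docid)
            if not docids:
                del self.fwd[value]

    def index_doc(self, docid, values):
        self.unindex_doc(docid)  # re-indexing replaces the old values
        values = set(values)
        if not values:
            return  # assumption: documents without values are not recorded
        self.rev[docid] = values
        for value in values:
            self.fwd.setdefault(value, set()).add(docid)

    def values(self):
        return sorted(self.fwd, key=py2_like_key)

    def ids(self):
        return sorted(self.rev)

    def documentCount(self):
        return len(self.rev)

    def wordCount(self):
        return len(self.fwd)

    def minValue(self):
        if not self.fwd:
            raise ValueError('empty index')
        return min(self.fwd, key=py2_like_key)

    def maxValue(self):
        if not self.fwd:
            raise ValueError('empty index')
        return max(self.fwd, key=py2_like_key)


data = {1: ['a', 1], 2: ['b', 'a', 3, 4, 7], 3: [1], 4: [1, 4, 'c'],
        5: [7], 6: [5, 6, 7], 7: ['c'], 8: [1, 6],
        9: ['a', 'c', 2, 3, 4, 6]}
index = ToySetIndex()
for docid, values in data.items():
    index.index_doc(docid, values)
print(index.values())  # [1, 2, 3, 4, 5, 6, 7, 'a', 'b', 'c']
print(index.documentCount(), index.wordCount())  # 9 10
print(index.minValue(), index.maxValue())  # 1 c
```

The reverse mapping is what makes unindexing cheap: without it, removing a document would mean scanning every value's docid set.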

The index supports five types of query.  The first is 'any_of'.  It
takes an iterable of values, and returns an iterable of document ids that
contain any of the values.  The results are weighted.

>>> list(index.apply({'any_of': ('b', 1, 5)}))
[1, 2, 3, 4, 6, 8]
>>> list(index.apply({'any_of': (42,)}))
[]
>>> index.apply({'any_of': ('a', 3, 7)})
BTrees._IFBTree.IFBucket([(1, 1.0), (2, 3.0), (5, 1.0), (6, 1.0), (9, 2.0)])
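The weights in the last result appear to count how many of the requested values each document holds (document 2 contains 'a', 3 and 7, hence 3.0). A hypothetical pure-Python sketch of that weighted union, using plain dicts and sets rather than BTrees; `build_forward` and `any_of` are illustrative names only:

```python
from collections import defaultdict


def build_forward(data):
    """Map each value to the set of docids that contain it."""
    fwd = defaultdict(set)
    for docid, values in data.items():
        for value in values:
            fwd[value].add(docid)
    return fwd


def any_of(fwd, query_values):
    """Weighted union: docid -> number of query values the document has."""
    weights = defaultdict(float)
    for value in query_values:
        for docid in fwd.get(value, ()):
            weights[docid] += 1.0
    return dict(sorted(weights.items()))


data = {1: ['a', 1], 2: ['b', 'a', 3, 4, 7], 3: [1], 4: [1, 4, 'c'],
        5: [7], 6: [5, 6, 7], 7: ['c'], 8: [1, 6],
        9: ['a', 'c', 2, 3, 4, 6]}
fwd = build_forward(data)
print(any_of(fwd, ('a', 3, 7)))  # {1: 1.0, 2: 3.0, 5: 1.0, 6: 1.0, 9: 2.0}
print(sorted(any_of(fwd, ('b', 1, 5))))  # [1, 2, 3, 4, 6, 8]
```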

Another query is 'any'.  If its value is None, all indexed document ids with any
values are returned.  If its value is an extent, the intersection of the extent
and all document ids with any values is returned.

>>> list(index.apply({'any': None}))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> from zc.catalog.extentcatalog import FilterExtent
>>> extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(15):
...     extent.add(i, i)
...
>>> list(index.apply({'any': extent}))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> limited_extent = FilterExtent(lambda extent, uid, obj: True)
>>> for i in range(5):
...     limited_extent.add(i, i)
...
>>> list(index.apply({'any': limited_extent}))
[1, 2, 3, 4]
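In other words, 'any' is the set of documents that have at least one indexed value, optionally intersected with an extent. A hypothetical sketch (`any_values` is an illustrative name) that models the extent as any iterable of document ids; zc.catalog's FilterExtent does more than this. A document 10 with no values is added to show that it is excluded:

```python
def any_values(rev, extent=None):
    """Docids with at least one value, optionally limited to an extent."""
    docids = set(docid for docid, values in rev.items() if values)
    if extent is not None:
        # Assumption: an extent behaves like a collection of docids.
        docids &= set(extent)
    return sorted(docids)


data = {1: ['a', 1], 2: ['b', 'a', 3, 4, 7], 3: [1], 4: [1, 4, 'c'],
        5: [7], 6: [5, 6, 7], 7: ['c'], 8: [1, 6],
        9: ['a', 'c', 2, 3, 4, 6], 10: []}  # doc 10 has no values
print(any_values(data))            # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(any_values(data, range(5)))  # [1, 2, 3, 4]
```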

The 'all_of' argument also takes an iterable of values, but returns an
iterable of document ids that contain all of the values.  The results are not
weighted.

>>> list(index.apply({'all_of': ('a',)}))
[1, 2, 9]
>>> list(index.apply({'all_of': (3, 4)}))
[2, 9]
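'all_of' amounts to an unweighted intersection of the per-value document sets. A hypothetical pure-Python sketch with plain sets instead of BTrees; returning an empty result for an empty query is an assumption, not documented behaviour:

```python
def build_forward(data):
    """Map each value to the set of docids that contain it."""
    fwd = {}
    for docid, values in data.items():
        for value in values:
            fwd.setdefault(value, set()).add(docid)
    return fwd


def all_of(fwd, query_values):
    """Docids whose values include every one of query_values."""
    postings = [fwd.get(value, set()) for value in query_values]
    if not postings:
        return []  # assumption: an empty query matches nothing
    return sorted(set.intersection(*postings))


data = {1: ['a', 1], 2: ['b', 'a', 3, 4, 7], 3: [1], 4: [1, 4, 'c'],
        5: [7], 6: [5, 6, 7], 7: ['c'], 8: [1, 6],
        9: ['a', 'c', 2, 3, 4, 6]}
fwd = build_forward(data)
print(all_of(fwd, ('a',)))  # [1, 2, 9]
print(all_of(fwd, (3, 4)))  # [2, 9]
```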

The 'between' argument takes from one to four values.  The first is the 
minimum, and defaults to None, indicating no minimum; the second is the 
maximum, and defaults to None, indicating no maximum; the next is a boolean for
whether the minimum value should be excluded, and defaults to False; and the
last is a boolean for whether the maximum value should be excluded, and also
defaults to False.  The results are weighted.
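A hypothetical pure-Python sketch of those range semantics. The text does not say how 'between' results are weighted, so this sketch assumes, by analogy with 'any_of', that the weight counts how many of a document's values fall inside the range. `py2_like_key` imitates Python 2's numbers-before-strings ordering so that mixed int/str values can be compared under Python 3:

```python
from collections import defaultdict


def py2_like_key(value):
    # Assumption: numbers sort before strings, as in Python 2.
    return (isinstance(value, str), value)


def in_range(value, minimum, maximum, exclude_min, exclude_max):
    key = py2_like_key(value)
    if minimum is not None:
        low = py2_like_key(minimum)
        if key < low or (exclude_min and key == low):
            return False
    if maximum is not None:
        high = py2_like_key(maximum)
        if key > high or (exclude_max and key == high):
            return False
    return True


def between(data, minimum=None, maximum=None,
            exclude_min=False, exclude_max=False):
    """Weighted docids whose values fall in the range (assumed weighting)."""
    weights = defaultdict(float)
    for docid, values in data.items():
        for value in values:
            if in_range(value, minimum, maximum, exclude_min, exclude_max):
                weights[docid] += 1.0
    return dict(sorted(weights.items()))


data = {1: ['a', 1], 2: ['b', 'a', 3, 4, 7], 3: [1], 4: [1, 4, 'c'],
        5: [7], 6: [5, 6, 7], 7: ['c'], 8: [1, 6],
        9: ['a', 'c', 2, 3, 4, 6]}
print(between(data, 2, 4))              # {2: 2.0, 4: 1.0, 9: 3.0}
print(between(data, 2, 4, True, True))  # {2: 1.0, 9: 1.0}
print(between(data, 'b'))               # {2: 1.0, 4: 1.0, 7: 1.0, 9: 1.0}
```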

 

Re: [Zope3-dev] KeywordIndex

2005-07-18 Thread Michel Pelletier
On Mon, 2005-07-18 at 12:00 -0400, [EMAIL PROTECTED] wrote:

> Date: Mon, 18 Jul 2005 09:14:16 -0600
> From: Jeff Shell [EMAIL PROTECTED]
> Subject: [Zope3-dev] KeywordIndex
> To: zope3-dev@zope.org
> Message-ID: [EMAIL PROTECTED]
> Content-Type: text/plain; charset=ISO-8859-1
>
> I'm working on a simple application which is the first time I get to
> use the catalog in Zope 3. I'm writing against Zope 3.1b1. I was
> dismayed not to see KeywordIndex in the main catalog set, but then I
> found it in zope.index.keyword. But it seems to be a bit behind. I
> have it somewhat working through subclassing, etc, but it's been
> purely guess work on my part to get things this far. In my product
> package, I have the following:

<snip>
I'm unable to help you directly with your problem, although Gary's post
about the SetIndex looks very promising and I would like to see that
code as well.  As you said, something like a keyword index is exactly
what your application is designed around, but if I could digress from
the topic a little I'd like to suggest another solution, rdflib.

rdflib covers some of the same use cases as a keyword index.  Your
objects (content, whatever) and your keywords would be assigned unique
identifiers.  You then add relations to an rdflib.Graph that associate
the keyword with your objects:

>>> import rdflib
>>> dc = rdflib.Namespace('http://purl.org/dc/elements/1.1/')
>>> blue = rdflib.BNode()  # creates a blank node with a generated ID
>>> g = rdflib.Graph()
>>> g.add((object_uri, dc.keywords, blue))

and then query the data back out with either a low-level g.triples((s,
p, o)) pattern or a SPARQL query.  For example, print a list of all
object URIs that have the blue keyword:

>>> print [s for (s, p, o) in g.triples((rdflib.Any, dc.keywords, blue))]

or SPARQL:

>>> sg = rdflib.sparqlGraph(g)
>>> select = ("?object_uri",)
>>> where = rdflib.GraphPattern([("?object_uri", dc.keywords, blue)])
>>> for object_uri in sg.query(select, where): print object_uri

is the same query, only longer.  With SPARQL, however, you can run more
complex SQL-like queries against other relations.
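Conceptually, `g.triples((s, p, o))` is pattern matching over a set of triples, with a wildcard allowed in any slot. A toy, hypothetical stand-in (`ToyGraph` is not rdflib; it uses plain string URIs, a plain string keyword rather than a blank node, and `None` as the wildcard, which is also how current rdflib releases spell it) shows the keyword lookup:

```python
class ToyGraph(object):
    """Hypothetical in-memory triple store for illustration; not rdflib."""

    def __init__(self):
        self._triples = set()

    def add(self, triple):
        self._triples.add(triple)

    def triples(self, pattern):
        # A None in the pattern matches any term in that position.
        for triple in sorted(self._triples):
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                yield triple


KEYWORDS = 'http://purl.org/dc/elements/1.1/keywords'
g = ToyGraph()
g.add(('urn:doc:1', KEYWORDS, 'blue'))
g.add(('urn:doc:2', KEYWORDS, 'red'))
g.add(('urn:doc:3', KEYWORDS, 'blue'))
print([s for (s, p, o) in g.triples((None, KEYWORDS, 'blue'))])
# ['urn:doc:1', 'urn:doc:3']
```

A real triple store indexes each position so that this lookup does not scan every triple; the linear scan here only illustrates the matching rule.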

Hope this helps,

-Michel




Re: [Zope3-dev] KeywordIndex

2005-07-18 Thread Jeff Shell
Hi Gary! I'd be very interested in this. It's not critical for me
right now, so there's no need to rush making something available. I
have an inefficient but fun solution for my system that can be
replaced when this comes along. I primarily wanted to know the state
of the indexes.

Is what's there right now going to be what ships with Zope 3.1 final?
