Hi guys,

Since there is no full-text search available in GAE/j, and I really
need it for a new app I am writing, I have made a prototype
implementation of an inverted index on top of the GAE datastore.

Each term is stored as a key, with the term itself as the key name (only
the key is needed).
Below each term I've added document references as child keys, like
this: Term("term")/DocRef("10"), where 10 is the internal document
number.
An example:

Term("stuff")
  DocRef("1")
  DocRef("2")

Term("more")
  DocRef("1")
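
To make the layout concrete, here is a rough in-memory sketch of the same structure. It does not use the datastore API at all: a sorted map of sorted sets stands in for the Term/DocRef keys. The class and method names (InvertedIndexSketch, addDocument, docRefs, docCount) are made up for illustration, splitting on whitespace is only a placeholder tokenizer, and numeric doc numbers are an assumption (string key names such as "10" and "2" would sort lexicographically instead).

```java
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for the Term/DocRef key layout; not the datastore API. */
class InvertedIndexSketch {
    // term -> sorted doc numbers, mirroring Term("term")/DocRef("n") keys
    private final Map<String, NavigableSet<Long>> postings = new TreeMap<>();

    /** Adds one doc-ref under each term found in the text (naive whitespace split). */
    public void addDocument(long docId, String text) {
        for (String term : text.split("\\s+")) {
            if (term.isEmpty()) {
                continue; // leading whitespace produces an empty first token
            }
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Doc-refs stored under a term; empty if the term is unknown. */
    public NavigableSet<Long> docRefs(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    /** Number of children under a term (the count that would be cached). */
    public int docCount(String term) {
        return docRefs(term).size();
    }
}
```

Adding document 1 with the text "more stuff" and document 2 with the text "stuff" reproduces the example above: "stuff" holds doc-refs 1 and 2, and "more" holds doc-ref 1.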

When searching for e.g. "more stuff" (which is a boolean AND), I do this:

Query the DocRefs under the term with the fewest doc-refs (children; this
count is cached) and load the keys into a sorted set.
Then query for doc-refs under the second term, filtering between the min.
doc-id and the max. doc-id in the sorted set (meaning we only get
possible matches among the docs we know contain the first term).
Merge (intersect) the sets.
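
The search steps can be sketched roughly as below, again purely in memory. TreeSet.subSet stands in for the key-range query on the second term, and retainAll does the merge. AndSearchSketch, add and search are hypothetical names, and numeric doc numbers are an assumption; nothing here touches the real datastore.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

/** In-memory sketch of the AND search; subSet stands in for a key-range query. */
class AndSearchSketch {
    private final Map<String, NavigableSet<Long>> postings = new HashMap<>();

    /** Stores one doc-ref under a term. */
    public void add(String term, long docId) {
        postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
    }

    /** Returns the doc numbers that contain every query term. */
    public NavigableSet<Long> search(String... terms) {
        List<NavigableSet<Long>> lists = new ArrayList<>();
        for (String term : terms) {
            NavigableSet<Long> refs = postings.get(term);
            if (refs == null) {
                return new TreeSet<>(); // unknown term: AND can never match
            }
            lists.add(refs);
        }
        if (lists.isEmpty()) {
            return new TreeSet<>();
        }
        // Start from the term with the fewest doc-refs.
        lists.sort((a, b) -> Integer.compare(a.size(), b.size()));
        NavigableSet<Long> result = new TreeSet<>(lists.get(0));
        for (int i = 1; i < lists.size() && !result.isEmpty(); i++) {
            // Only look at doc-refs between the min and max candidate so far.
            NavigableSet<Long> window =
                    lists.get(i).subSet(result.first(), true, result.last(), true);
            result.retainAll(window); // keep docs present under both terms
        }
        return result;
    }
}
```

With the example index from above (stuff: 1, 2 and more: 1), search("more", "stuff") starts from "more", looks at "stuff" only in the range 1..1, and ends up with document 1.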

What do you think? Is this a fair way to implement this (I'm working on
scoring using tf-idf), and do you think it's possible to get it to
perform well?

/Lars Borup
