Hi Christopher,
I am working my way through implementing SpanQueries in Solr
(svn trunk). Judging by my lack of progress, I doubt I can help
much, but I would be happy to try.
I imagine you have already found (either before or after posting your
message) Grant's overview of Lucene, SpanQuery, and WindowTermVectorMapper:
http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/
I'd be interested in hearing about your progress.
Good luck
Sean
On 10/26/2010 08:26 AM, Christopher Ball wrote:
I am about to implement a custom query that is a sort of mash-up of faceting,
highlighting, and SpanQuery, but thought I'd first see if anyone has done
anything similar.
In simple terms, I need to facet on the word that follows a given target word.
For example, suppose my index contained only the following 5 documents (each
consisting of a single sentence):
Doc 1 - The quick brown fox jumped over the fence.
Doc 2 - The sly fox skipped over the fence.
Doc 3 - The fat fox skipped his afternoon class.
Doc 4 - A brown duck and red fox, crashed the party.
Doc 5 - Charles Brown! Fox! Crashed my damn car.
The query should return the frequency of each distinct term that follows the
word "fox":
skipped - 2
crashed - 2
jumped - 1
Longer term, I would also like the opposite: the frequency of each distinct
term that precedes the word "fox":
brown - 2
sly - 1
fat - 1
red - 1
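To pin down the expected numbers, here is a minimal sketch of the counting in plain Java. It does not use the Lucene/Solr API at all: each sentence is lowercased and split on anything that is not a letter or digit (a rough, assumed stand-in for a real analyzer), and the term at position target + offset is tallied, so an offset of +1 gives the "after fox" facet and -1 the "before fox" facet. NeighborFacet, countNeighbors, and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class NeighborFacet {

    // Hypothetical stand-in for an analyzer: lowercase, split on anything that
    // is not a letter or digit, and drop empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counts the terms found at (position of target + offset) across all documents.
    // offset = +1 -> "next word" facet, offset = -1 -> "previous word" facet.
    static Map<String, Integer> countNeighbors(List<String> docs, String target, int offset) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            for (int i = 0; i < tokens.size(); i++) {
                int j = i + offset;
                if (tokens.get(i).equals(target) && j >= 0 && j < tokens.size()) {
                    counts.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
        return sortByCountDesc(counts);
    }

    // Orders facet values by descending count; the stable sort keeps ties in
    // the alphabetical order inherited from the TreeMap.
    static Map<String, Integer> sortByCountDesc(Map<String, Integer> counts) {
        Map<String, Integer> sorted = new LinkedHashMap<>();
        counts.entrySet().stream()
              .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
              .forEachOrdered(e -> sorted.put(e.getKey(), e.getValue()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "The quick brown fox jumped over the fence.",
            "The sly fox skipped over the fence.",
            "The fat fox skipped his afternoon class.",
            "A brown duck and red fox, crashed the party.",
            "Charles Brown! Fox! Crashed my damn car.");
        // prints after fox:  {crashed=2, skipped=2, jumped=1}
        System.out.println("after fox:  " + countNeighbors(docs, "fox", 1));
        // prints before fox: {brown=2, fat=1, red=1, sly=1}
        System.out.println("before fox: " + countNeighbors(docs, "fox", -1));
    }
}
```

In Solr the same loop would presumably run only over the documents matching the target term, using the field's real analyzed tokens instead of the regex split; the sketch is only meant to make the expected facet counts concrete.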
My guess is that either the FastVectorHighlighter or SpanQuery would be a
reasonable starting point. I was hoping to take advantage of term vectors, since
I am storing termVectors, termPositions, and termOffsets for the field in
question.
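Since termPositions are stored, one option is to work from each matching document's term vector rather than re-analyzing the text. A term vector with positions lists, for each term, the positions where it occurs; inverting that into a position-to-term array turns "the word after fox" into a simple array lookup. The sketch below models one document's term vector as a plain Map<String, int[]>, which is an assumption for illustration; actually reading it out of Lucene (for example through a TermVectorMapper, as in Grant's post) is not shown, so check the Lucene javadocs for that part. TermVectorNeighbors, invert, and collect are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class TermVectorNeighbors {

    // Rebuilds one document's token order (position -> term) from a term vector
    // modeled as term -> positions. Positions never filled stay null.
    static String[] invert(Map<String, int[]> termPositions) {
        int maxPos = -1;
        for (int[] positions : termPositions.values()) {
            for (int p : positions) {
                maxPos = Math.max(maxPos, p);
            }
        }
        String[] byPosition = new String[maxPos + 1];
        for (Map.Entry<String, int[]> e : termPositions.entrySet()) {
            for (int p : e.getValue()) {
                byPosition[p] = e.getKey();
            }
        }
        return byPosition;
    }

    // For every occurrence of target, adds the term at (position + offset) to counts.
    // Empty slots (e.g. gaps left by removed stop words) are skipped, not bridged.
    static void collect(Map<String, int[]> termPositions, String target, int offset,
                        Map<String, Integer> counts) {
        int[] targetPositions = termPositions.get(target);
        if (targetPositions == null) {
            return; // target term does not occur in this document
        }
        String[] byPosition = invert(termPositions);
        for (int p : targetPositions) {
            int q = p + offset;
            if (q >= 0 && q < byPosition.length && byPosition[q] != null) {
                counts.merge(byPosition[q], 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        // Doc 1, "The quick brown fox jumped over the fence.", as term -> positions.
        Map<String, int[]> doc1 = new HashMap<>();
        doc1.put("the", new int[] {0, 6});
        doc1.put("quick", new int[] {1});
        doc1.put("brown", new int[] {2});
        doc1.put("fox", new int[] {3});
        doc1.put("jumped", new int[] {4});
        doc1.put("over", new int[] {5});
        doc1.put("fence", new int[] {7});

        Map<String, Integer> after = new TreeMap<>();
        Map<String, Integer> before = new TreeMap<>();
        collect(doc1, "fox", 1, after);
        collect(doc1, "fox", -1, before);
        System.out.println(after);  // prints {jumped=1}
        System.out.println(before); // prints {brown=1}
    }
}
```

Passing the same counts map to collect for every matching document accumulates the facet across the result set. One design choice worth noting: if the slot at target + offset is empty, this sketch records nothing rather than skipping ahead to the next real term; whether that is right depends on how you want stop words to affect the facet.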
Grateful for any thoughts . . . reference implementations . . . words of
encouragement . . . free beer - whatever you can offer.
Gracias,
Christopher