I am indexing documents with Solr 4.0 and also processing the text
using a set of custom entity extractors (people, places, dates, times,
etc). The text field in Solr has Term Vectors enabled as well as term
positions and offsets, primarily so we can make use of Fast Vector
Highlighting. So far, it's working very well.

The entity extractors run on the source text prior to indexing and
record the position and offsets into the source text. The entity data
is stored in an external datasource (which happens to be indexed in a
separate Solr core). Given a numeric character range in the source
document, I can easily look up all of the extracted entities that fall
within that range (e.g. all of the organizations and dates mentioned
between offsets 100 and 200).
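
To make that lookup concrete, here is a minimal sketch of how such a range query string could be built. The field names start_offset and end_offset are hypothetical stand-ins, not the actual schema of the entity core, and the helper name is made up for illustration.

```python
def entity_range_query(start, end,
                       start_field="start_offset",  # hypothetical field name
                       end_field="end_offset"):     # hypothetical field name
    """Build a Solr range query for entities lying entirely inside [start, end]."""
    if start > end:
        raise ValueError("start must not exceed end")
    # An entity falls within the range when it begins at or after `start`
    # and finishes at or before `end`.
    return f"{start_field}:[{start} TO *] AND {end_field}:[* TO {end}]"


print(entity_range_query(100, 200))
# prints: start_offset:[100 TO *] AND end_offset:[* TO 200]
```

Swapping the two bounds (start field up to `end`, end field from `start`) would instead match entities that merely overlap the range.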

What I'd like to do is issue a query against the source text and get
back (in addition to the highlight fragments) the position
information of the query matches within the text field, so that I can
issue a secondary query to find co-mentioned entities within n
characters/terms of each match. Alternatively, the start/end positions
of the highlight fragments would be sufficient as well.
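
To illustrate the second step, here is a rough sketch of what I would do with the match offsets if I could get them. It assumes the matches arrive as (start, end) character offsets, which is exactly the part I don't know how to obtain; the function name and the sample numbers are made up.

```python
def context_windows(matches, n, text_length):
    """Expand each (start, end) match by n characters and merge overlaps.

    `matches` is assumed to be a list of character-offset pairs for the
    query hits in one document (hypothetical input).
    """
    # Widen every match by n characters on both sides, clipped to the text.
    expanded = sorted((max(0, s - n), min(text_length, e + n))
                      for s, e in matches)
    merged = []
    for start, end in expanded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(context_windows([(150, 156), (170, 175), (900, 905)], 50, 1000))
# prints: [(100, 225), (850, 955)]
```

Each resulting window would then become one range lookup against the entity data, so merging overlapping windows keeps the number of secondary queries down.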

I'm familiar with the simpler aspects of Solr, but am quite stumped on
this one. Is this possible to do with "out-of-the-box" Solr 4.0?

Regards,
Mike
