Folks,

I have, like many people I'm sure, implemented field
highlighting/summarizing in Solr, and I'm interested in contributing
it back as a patch.  There are various ways highlighting could be
integrated into Solr, so I'd like to open up discussion on this front
before proceeding.

Current status: Highlighting is implemented as part of SolrPluginUtils
and integrated into StandardRequestHandler and DisMax.  It can
highlight an arbitrary number of stored fields for a given query.  It
uses term vectors, if present, to speed up highlighting (otherwise the
stored field must be re-analyzed).  Because the document cache is
used, the performance impact is relatively minimal.
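For anyone less familiar with the moving parts, here is a rough,
dependency-free sketch of the re-analysis path: split the stored text
into fragments (Fragmenter), score each fragment by query-term hits
(Scorer), and wrap the hits in markup (Formatter).  It is not the
Lucene Highlighter API or the code in SolrPluginUtils; the class name
SimpleHighlighter, the whitespace tokenization, the character-based
fragment size and the <em> tags are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, dependency-free sketch of a Fragmenter/Scorer/Formatter
// pipeline. It is NOT the Lucene Highlighter API.
final class SimpleHighlighter {

    // Returns up to maxFragments highlighted fragments, in document order.
    // queryTerms are assumed to be lowercase.
    static List<String> highlight(String text, Set<String> queryTerms,
                                  int fragSize, int maxFragments) {
        // Fragmenter: pack whitespace-separated words into fragments of at
        // most fragSize characters (a single longer word gets its own fragment).
        List<List<String>> fragments = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int len = 0;
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            int newLen = current.isEmpty() ? word.length() : len + 1 + word.length();
            if (!current.isEmpty() && newLen > fragSize) {
                fragments.add(current);
                current = new ArrayList<>();
                newLen = word.length();
            }
            current.add(word);
            len = newLen;
        }
        if (!current.isEmpty()) fragments.add(current);

        // Scorer: one point per word whose normalized form is a query term.
        int[] scores = new int[fragments.size()];
        for (int i = 0; i < fragments.size(); i++) {
            for (String w : fragments.get(i)) {
                if (queryTerms.contains(normalize(w))) scores[i]++;
            }
        }

        // Keep the best-scoring fragments that have at least one hit
        // (the stable sort lets ties favor earlier fragments),
        // then restore document order.
        List<Integer> ranked = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > 0) ranked.add(i);
        }
        ranked.sort(Comparator.comparingInt((Integer i) -> -scores[i]));
        List<Integer> chosen = new ArrayList<>(
                ranked.subList(0, Math.min(maxFragments, ranked.size())));
        chosen.sort(Comparator.naturalOrder());

        // Formatter: wrap each matching word (punctuation included) in <em> tags.
        List<String> result = new ArrayList<>();
        for (int i : chosen) {
            StringBuilder sb = new StringBuilder();
            for (String w : fragments.get(i)) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(queryTerms.contains(normalize(w)) ? "<em>" + w + "</em>" : w);
            }
            result.add(sb.toString());
        }
        return result;
    }

    // Stand-in for re-analysis: lowercase and strip non-alphanumeric characters.
    private static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }
}
```

For example, highlighting "The quick brown fox jumps over the lazy
dog. A quick recap follows." with terms {quick, fox}, a fragment size
of 20 and one fragment yields "The <em>quick</em> brown <em>fox</em>".
Term vectors that store offsets would let a real implementation skip
the re-tokenizing step.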

Issues:

Output: The summary data is currently output as a separate element
inside the <response> element (as the debug data is).  This is not
hard to parse, but it might be more consistent to attach it to the
individual <doc> elements (though that seems to require a bit of
hackery).
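To make the trade-off concrete: assuming the separate summary element
is keyed by each document's unique key (an assumption), a client that
wants per-document snippets has to join the two.  A hypothetical
sketch of that join, with documents and summaries modeled as plain
maps and a made-up "_highlighting" entry name:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side join: copies each document's snippets (from a
// separate summary section keyed by unique key) into the document itself.
final class HighlightMerge {

    static List<Map<String, Object>> attach(
            List<Map<String, Object>> docs,
            Map<String, Map<String, List<String>>> summaries,
            String uniqueKeyField) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Map<String, Object> copy = new LinkedHashMap<>(doc); // leave input untouched
            Object key = doc.get(uniqueKeyField);
            Map<String, List<String>> snippets =
                    key == null ? null : summaries.get(key.toString());
            if (snippets != null) {
                copy.put("_highlighting", snippets); // field name -> snippets
            }
            out.add(copy);
        }
        return out;
    }
}
```

Emitting the snippets inside each <doc> on the server side would make
this client-side step unnecessary.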

Customization: Currently, the fields summarized, the number of
fragments, and the Formatter can be customized as RequestHandler
parameters.  This isn't really optimal: a field that is
summarized/highlighted is usually handled the same way every time,
while different fields require different Formatter/Fragmenter/Scorer
criteria.  Ideally, the customization would be done in the FieldType,
and the only RequestHandler customization would be the selection of
which fields to highlight.
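One way that split could look, sketched with hypothetical classes
(HighlightParams, HighlightSchema) rather than Solr's actual FieldType
API: highlighting settings are declared once per field type, and a
request contributes only the list of field names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-field-type highlighting settings (Fragmenter size,
// snippet count, Formatter tags). Names are illustrative, not Solr's.
final class HighlightParams {
    final int fragmentSize;
    final int maxSnippets;
    final String preTag;
    final String postTag;

    HighlightParams(int fragmentSize, int maxSnippets, String preTag, String postTag) {
        this.fragmentSize = fragmentSize;
        this.maxSnippets = maxSnippets;
        this.preTag = preTag;
        this.postTag = postTag;
    }
}

final class HighlightSchema {
    private final Map<String, HighlightParams> typeParams = new HashMap<>();
    private final Map<String, String> fieldTypes = new HashMap<>();

    // Schema-level configuration: declared once per field type.
    void defineType(String typeName, HighlightParams params) {
        typeParams.put(typeName, params);
    }

    void defineField(String fieldName, String typeName) {
        fieldTypes.put(fieldName, typeName);
    }

    // Request-level configuration is only the list of fields to highlight.
    // Unknown fields, or fields whose type has no settings, are skipped.
    Map<String, HighlightParams> resolve(List<String> requestedFields) {
        Map<String, HighlightParams> out = new LinkedHashMap<>();
        for (String field : requestedFields) {
            String type = fieldTypes.get(field);
            HighlightParams p = (type == null) ? null : typeParams.get(type);
            if (p != null) out.put(field, p);
        }
        return out;
    }
}
```

With this shape, a request naming "title" and "body" picks up whatever
Fragmenter/Formatter settings their types declare, and a field whose
type declares none is simply ignored.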

Highlighter issues: The Highlighter behaves badly with analyzers that
emit multiple tokens at the same position (e.g., WordDelimiterFilter).
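As an illustration of why stacked tokens are awkward, suppose
(hypothetically) a WordDelimiterFilter-style analyzer emits "Power"
[0,5), "Shot" [5,9) and a catenated "PowerShot" [0,9) that shares a
position with one of its parts.  If both "Power" and "PowerShot"
match, a formatter that wraps each match independently receives
overlapping ranges and can produce nested or overlapping tags.  The
sketch below merges overlapping (or touching) offset ranges before
inserting markup; it is one possible mitigation, not a description of
what the Lucene Highlighter does internally, and SpanMerger is a
made-up name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: merge overlapping match spans (e.g. "Power" [0,5)
// and the stacked token "PowerShot" [0,9)) before inserting markup, so
// stacked tokens yield one well-formed highlight instead of nested tags.
final class SpanMerger {

    // A matched token's character offsets: [start, end).
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Sorts spans by start offset and unions any that overlap or touch.
    static List<Span> merge(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.start));
        List<Span> merged = new ArrayList<>();
        for (Span s : sorted) {
            if (!merged.isEmpty() && s.start <= merged.get(merged.size() - 1).end) {
                Span last = merged.remove(merged.size() - 1);
                merged.add(new Span(last.start, Math.max(last.end, s.end)));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    // Wraps each span in pre/post tags; overlapping spans are merged first.
    static String format(String text, List<Span> spans, String pre, String post) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (Span s : merge(spans)) {
            sb.append(text, pos, s.start).append(pre)
              .append(text, s.start, s.end).append(post);
            pos = s.end;
        }
        return sb.append(text.substring(pos)).toString();
    }
}
```

Here, formatting "PowerShot camera" with the spans [0,5) and [0,9)
gives a single "<em>PowerShot</em> camera" rather than two
interleaved tag pairs.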

Thoughts? Plans?
-Mike
