Folks, like many people, I'm sure, I have implemented field highlighting/summarizing in Solr and am interested in contributing a patch back. There are various ways highlighting could be integrated into Solr, so I'd like to open up some discussion on this front before proceeding.

Current status:

Highlighting is implemented as part of SolrPluginUtils and integrated into StandardRequestHandler and DisMax... It can highlight an arbitrary number of stored fields for a given query. It uses term vectors, if present, to speed up highlighting (otherwise the stored field has to be re-analyzed). The doc cache is used, so the performance impact is (relatively) minimal.

Issues:

Output: Currently, the summary data is output as a separate element inside the <response> element (as the debug data is). This is not hard to parse, but it might be more consistent to add it to the <doc> elements (which seems like it would require a bit of hackery).

Customization: Currently, the fields summarized, the number of fragments, and the Formatter can be customized via RequestHandler parameters. This isn't really optimal: a field that is summarized/highlighted is usually handled the same way every time, while different fields require different Formatter/Fragmenter/Scorer criteria. Ideally, the customization would be done in the FieldType, and the only RequestHandler customization would be the selection of which fields to highlight.

Highlighter issues: The Highlighter behaves badly with analyzers that emit multiple tokens at the same position (e.g. WordDelimiterFilter).

Thoughts? Plans?

-Mike
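
To make the same-position issue above a bit more concrete, here is a minimal, self-contained sketch of one possible way to cope with overlapping tokens: merge the character offsets of all matching tokens before inserting any markup, so that each stretch of text is wrapped only once. This is purely illustrative and is not how the Lucene Highlighter works; the Span class, the method names, and the <em> markup are all hypothetical, and the offsets in main are simply what a delimiter-splitting filter might plausibly emit for "Wi-Fi".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class OverlapHighlightSketch {

    /** Hypothetical stand-in for a matched token: character offsets into the original text. */
    static final class Span {
        final int start; // inclusive
        final int end;   // exclusive
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Merges overlapping or touching spans so that no text is wrapped twice. */
    static List<Span> merge(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.start));
        List<Span> merged = new ArrayList<>();
        for (Span s : sorted) {
            if (!merged.isEmpty() && s.start <= merged.get(merged.size() - 1).end) {
                // Overlaps (or touches) the previous span: extend it instead of adding a new one.
                Span last = merged.remove(merged.size() - 1);
                merged.add(new Span(last.start, Math.max(last.end, s.end)));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    /** Wraps each merged span in <em>...</em>, copying the text in between unchanged. */
    static String highlight(String text, List<Span> matches) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span s : merge(matches)) {
            out.append(text, cursor, s.start);
            out.append("<em>").append(text, s.start, s.end).append("</em>");
            cursor = s.end;
        }
        out.append(text, cursor, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed token offsets for "Wi-Fi": the whole word plus its two parts,
        // with the first two tokens sharing a position.
        String text = "Wi-Fi router";
        List<Span> matches = List.of(new Span(0, 5), new Span(0, 2), new Span(3, 5));
        System.out.println(highlight(text, matches));
    }
}
```

Merging offsets is only one option, and it loses per-token scoring information; whether something like it belongs in the Highlighter, the Fragmenter, or the analyzer chain is part of what is up for discussion.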