Re: highlighting/summarizing and solr

Chris Hostetter Thu, 22 Jun 2006 14:57:48 -0700

: > Output: Currently, the summary data is output as a separate element in
: > the <response> element (like the debug data is currently).  This is
: > not hard to parse, but perhaps it would be more consistent to add it
: > to the <doc> elements (seems like that would require a bit of
: > hackery).
:
: It does seem like it would be easier for clients to parse document
: associated data if it is included directly in the <doc> element.


I actually like the idea that it's included separately ... it's really
not that much harder to get at than if it's in the individual documents,
and it makes it really easy to differentiate between "stored fields" of the
document and "highlighted" info about the document ... especially if
highlighting can be applied to non-stored fields using TermVectors.

It also allows the highlighting section of the response to include a lot
of extra data about the highlighted snippets, that would be cumbersome to
try and fit into the <doc>.  I started hypothesizing down this road in
this old message...
        http://www.nabble.com/Re%3A-highlighting-p3954083.html
...but didn't really get to some of the crazier things you could do with
it (like reporting back where in the document a snippet starts)
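
To make that concrete, here is a rough sketch of what a separate
highlighting section could carry: snippets grouped by document key, each
with the offset where it starts in the original field text. Every name in
it (HighlightSketch, Snippet, HighlightSection, startOffset) is made up for
illustration, and none of it is existing Solr code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these class names exist in Solr.
class HighlightSketch {

    // One highlighted fragment, plus where it starts in the original field text.
    static final class Snippet {
        final String field;
        final String text;
        final int startOffset;

        Snippet(String field, String text, int startOffset) {
            this.field = field;
            this.text = text;
            this.startOffset = startOffset;
        }
    }

    // The separate highlighting section: document key -> snippets for that document.
    static final class HighlightSection {
        private final Map<String, List<Snippet>> byDoc = new LinkedHashMap<>();

        void add(String docKey, Snippet snippet) {
            byDoc.computeIfAbsent(docKey, k -> new ArrayList<>()).add(snippet);
        }

        // Returns an empty list for documents that have no highlighting info.
        List<Snippet> forDoc(String docKey) {
            return byDoc.getOrDefault(docKey, Collections.<Snippet>emptyList());
        }
    }

    public static void main(String[] args) {
        HighlightSection section = new HighlightSection();
        section.add("doc1", new Snippet("body", "a <em>solr</em> snippet", 120));
        for (Snippet s : section.forDoc("doc1")) {
            System.out.println(s.field + " @" + s.startOffset + ": " + s.text);
        }
    }
}
```

Because the section is keyed by document and sits outside the <doc>
elements, the stored fields stay untouched and extra per-snippet data can be
added later without changing how documents are parsed.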

: > summarized/highlighted, it is usually done in the same manner (and
: > different fields require different Formatter/Fragmenter/Scorer
: > criteria).  Ideally, the customization should be done in the
: > FieldType, and the only RequestHandler customization is the selection
: > of which fields to highlight.
:
: I'm not sure if this is really the property of a field.
: Another possibility is using init params in the request handler
: defined in solrconfig.xml, with the possibility of overriding them in
: a request.

I agree with Yonik ... it might be useful if there was a "suggested
highlighter configuration" at the Field/FieldType level ... but this
really seems like a RequestHandler config option to me (where the
RequestHandler can decide whether to have a query time option to override
its behavior).  That way you can have one instance of the
XyzRequestHandler which does highlighting on the "title" field, and
another instance with different init params that does highlighting on both
the "title" and "summary" fields, and another with different init params
that does summarizing/highlighting across the title, summary, and body
fields, only returning the most relevant snippets (where there can be
snippet weighting based on field importance or something).

Those should all be up to the person configuring the way the queries work
-- not the guy designing the schema.
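
As a rough sketch of that idea: each handler instance gets its default
highlight fields from init params, and decides for itself whether a request
may override them. The class name HighlightingHandlerSketch and the
"highlightFields" request parameter are assumptions made up for this
example, not an actual Solr API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not a real Solr RequestHandler.
class HighlightingHandlerSketch {
    private final List<String> defaultFields; // from this instance's init params
    private final boolean allowOverride;      // may a request replace the defaults?

    HighlightingHandlerSketch(List<String> defaultFields, boolean allowOverride) {
        this.defaultFields = defaultFields;
        this.allowOverride = allowOverride;
    }

    // Resolve which fields to highlight for a single request.
    // "highlightFields" is an invented parameter name holding a comma-separated list.
    List<String> fieldsFor(Map<String, String> requestParams) {
        String requested = requestParams.get("highlightFields");
        if (allowOverride && requested != null && !requested.trim().isEmpty()) {
            return Arrays.asList(requested.trim().split("\\s*,\\s*"));
        }
        return defaultFields;
    }
}
```

One instance could be built with only "title" and no override, another with
"title" and "summary" and overrides allowed; the schema never needs to know
about either.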

: > Highlighter issues: Highlighter behaves badly with analyzers which
: > emit multiple tokens in the same position (ie. WordDelimiterFilter).
:
: File a Lucene bug?

Assuming that's an invariant, you could add an option to the request
handler to use a custom analyzer for the purposes of highlighting stored
fields (independent of the field type) ... that doesn't really help the
TermVectors situation, but assuming that invariant, the only thing that
can help you here is using an indexing analyzer that doesn't produce
multiple tokens at the same position.


-Hoss
