: > Output: Currently, the summary data is output as a separate element in
: > the <response> element (like the debug data is currently). This is
: > not hard to parse, but perhaps it would be more consistent to add it
: > to the <doc> elements (seems like that would require a bit of
: > hackery).
:
: It does seem like it would be easier for clients to parse document
: associated data if it is included directly in the <doc> element.

I actually like the idea that it's included separately ... it's really not that much harder to get at than if it's in the individual documents, and it makes it really easy to differentiate between "stored fields" of the document and "highlighted" info about the document ... especially if highlighting can be applied to non-stored fields using TermVectors.

It also allows the highlighting section of the response to include a lot of extra data about the highlighted snippets that would be cumbersome to try and fit into the <doc>. I started hypothesizing down this road in this old message...

http://www.nabble.com/Re%3A-highlighting-p3954083.html

...but didn't really get to some of the crazier things you could do with it (like reporting back where in the document a snippet starts).

: > summarized/highlighted, it is usually done in the same manner (and
: > different fields require different Formatter/Fragmenter/Scorer
: > criteria). Ideally, the customization should be done in the
: > FieldType, and the only RequestHandler customization is the selection
: > of which fields to highlight.
:
: I'm not sure if this is really the property of a field.
: Another possibility is using init params in the request handler
: defined in solrconfig.xml, with the possibility of overriding them in
: a request.

I agree with Yonik ... it might be useful if there was a "suggested highlighter configuration" at the Field/FieldType level ... but this really seems like a RequestHandler config option to me (where the RequestHandler can decide whether to have a query-time option to override its behavior).
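
To make that a bit more concrete, here's a rough sketch (in Python, purely for illustration) of handler-level defaults with an optional per-request override. Everything in it is made up for the example: the HighlightingHandler class, the "highlight_fields" and "snippets" parameter names, and the allow_override flag are assumptions, not Solr APIs or real solrconfig.xml options.

```python
from dataclasses import dataclass, field


@dataclass
class HighlightingHandler:
    """Toy stand-in for one configured request handler instance (hypothetical)."""

    name: str
    # Defaults that would come from the handler's init params.
    defaults: dict = field(default_factory=dict)
    # The handler itself decides whether a request may override its defaults.
    allow_override: bool = True

    def effective_params(self, request_params: dict) -> dict:
        """Return the params used for one request, without mutating the defaults."""
        params = dict(self.defaults)
        if self.allow_override:
            params.update(request_params)
        return params


# Two instances of the same handler class, configured differently.
title_only = HighlightingHandler(
    "title_hl", {"highlight_fields": ["title"], "snippets": 1}
)
locked = HighlightingHandler(
    "locked_hl",
    {"highlight_fields": ["title", "summary"], "snippets": 2},
    allow_override=False,
)

print(title_only.effective_params({"snippets": 3}))
print(locked.effective_params({"snippets": 3}))
```

The point is only that the schema never appears here: which fields get highlighted, and whether a request can change that, lives entirely with the handler instance.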
That way you can have one instance of the XyzRequestHandler which does highlighting on the "title" field, another instance with different init params that does highlighting on both the "title" and "summary" fields, and another with different init params that does summarizing/highlighting across the title, summary, and body fields, only returning the most relevant snippets (where there can be snippet weighting based on field importance or something). Those should all be up to the person configuring the way the queries work -- not the guy designing the schema.

: > Highlighter issues: Highlighter behaves badly with analyzers which
: > emit multiple tokens in the same position (ie. WordDelimiterFilter).
:
: File a Lucene bug?

Assuming that's an invariant, you could add an option to the request handler to use a custom analyzer for the purposes of highlighting stored fields (independent of the field type) ... that doesn't really help the TermVectors situation, but assuming that invariant, the only thing that can help you here is using an indexing analyzer that doesn't produce multiple tokens at the same position.

-Hoss