[jira] Commented: (SOLR-553) Highlighter does not match phrase queries correctly

Mark Miller (JIRA) Tue, 20 May 2008 14:51:27 -0700

    [ https://issues.apache.org/jira/browse/SOLR-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598494#action_12598494 ]


Mark Miller commented on SOLR-553:
----------------------------------

>Probably best to create a new ticket (if necessary) about the <span>ax</span> 
><span>bx</span> instead of <span>ax bx</span> problem. That highlights have 
>incorrect matches is far worse. I'll adjust the problem description.

If I remember correctly, this was an ease-of-implementation issue. Part of it 
was fitting into the current Highlighter framework (individual tokens are 
scored and highlighted), and part of it was ease in general, I think. I am not 
sure it would be easy to alter.

It's very easy to do with the new Highlighter I have been working on, the 
LargeDocHighlighter. It breaks from the current API and makes this type of 
highlight markup quite easy. It may never see the light of day, though: to do 
what I want, all parts of the query need to be located with the MemoryIndex, 
and the time this takes on non-position-sensitive query clauses is almost 
equal to the savings I get from not iterating through and scoring each token in 
a TokenStream. I do still have hopes I can pull something off, though, and it 
may end up being useful for something else.

For now, though, highlighting each token separately seems a small 
inconvenience in exchange for retaining all of the old Highlighter's tests, 
corner cases, and speed in non-position-sensitive scoring. That's not to say 
you won't find a way if you take a look at the code, though.
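
In the meantime, the <span>ax</span> <span>bx</span> markup can be tidied up 
on the client side after the highlighter has run. The following is only a 
minimal sketch of that idea, not part of the Highlighter or of any patch on 
this issue: the class name SpanMerger and its merge method are hypothetical, 
and it assumes the pre and post tags are the plain strings passed in 
hl.simple.pre and hl.simple.post. It only merges adjacent highlights that are 
separated by whitespace; it does nothing about incorrect matches.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or Solr.
final class SpanMerger {

    // Collapses "post + whitespace + pre" seams so that adjacent highlighted
    // tokens end up inside one pair of tags. The whitespace itself is kept.
    static String merge(String snippet, String pre, String post) {
        Pattern seam = Pattern.compile(
                Pattern.quote(post) + "(\\s+)" + Pattern.quote(pre));
        return seam.matcher(snippet).replaceAll("$1");
    }

    public static void main(String[] args) {
        String in = "<span>ax</span> <span>bx</span>";
        // prints <span>ax bx</span>
        System.out.println(merge(in, "<span>", "</span>"));
    }
}
```

Tokens separated by an unhighlighted word (for example the "But" in the 
snippet from the issue description) stay in separate spans, since the seam 
only matches when nothing but whitespace sits between two highlights.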

> Highlighter does not match phrase queries correctly
> ---------------------------------------------------
>
>                 Key: SOLR-553
>                 URL: https://issues.apache.org/jira/browse/SOLR-553
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Brian Whitman
>            Assignee: Otis Gospodnetic
>         Attachments: highlighttest.xml, Solr-553.patch, Solr-553.patch, 
> Solr-553.patch
>
>
> http://www.nabble.com/highlighting-pt2%3A-returning-tokens-out-of-order-from-PhraseQuery-to16156718.html
> Say we search for the band "I Love You But I've Chosen Darkness"
> .../select?rows=100&q=%22I%20Love%20You%20But%20I\'ve%20Chosen%20Darkness%22&fq=type:html&hl=true&hl.fl=content&hl.fragsize=500&hl.snippets=5&hl.simple.pre=%3Cspan%3E&hl.simple.post=%3C/span%3E
> The highlight returns a snippet that does have the name altogether:
> Lights (Live) : <span>I</span> <span>Love</span> <span>You</span> But 
> <span>I've</span> <span>Chosen</span> <span>Darkness</span> :
> But also returns unrelated snips from the same page:
> Black Francis Shop "<span>I</span> Think <span>I</span> <span>Love</span> 
> <span>You</span>"
> A correct highlighter should not return snippets that do not match the phrase 
> exactly.
> LUCENE-794 (not yet committed, but seems to be ready) fixes up the problem 
> from the Lucene end. Solr should get it too.
> Related: SOLR-575 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
