This problem went away when I updated to the latest nightly release
(2009-02-04).
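
For anyone who wants to check their own build before and after upgrading, here is a rough sketch of how the returned snippets could be scanned for highlighting that covers more than the query terms. It assumes the snippets use the default <em>...</em> markers seen elsewhere in this thread. The function name `suspicious_highlights` is made up for illustration and is not part of Solr. It compares words literally, so stemmed or otherwise analyzed matches (for example "cats" for "cat") would be flagged as well.

```python
import re

# Matches each highlighted span, assuming <em>...</em> markers.
EM_SPAN = re.compile(r"<em>(.*?)</em>", re.DOTALL)


def suspicious_highlights(snippet, query_terms):
    """Return highlighted spans containing any word that is not a query term.

    Hypothetical helper: a plain string comparison, with no knowledge of
    the analyzers configured in the schema.
    """
    wanted = {term.lower() for term in query_terms}
    flagged = []
    for span in EM_SPAN.findall(snippet):
        words = re.findall(r"\w+", span.lower())
        # Flag empty spans and spans with at least one non-query word.
        if not words or any(word not in wanted for word in words):
            flagged.append(span)
    return flagged


if __name__ == "__main__":
    snippet = (
        'and between Cape <em>Fear Bank Corporation, a North Carolina '
        'corporation (the "Company"), and Cape Fear</em>'
    )
    for span in suspicious_highlights(snippet, ["fear"]):
        print("over-wide highlight:", span)
```

Running this over the snippets from a handful of single-word queries should show whether terms outside the query are still being highlighted.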

- ashok

ashokc wrote:
> 
> I have seen some of these oddities that Chris is referring to. In my case,
> terms that are NOT in the query get highlighted. For example searching for
> 'Intel' highlights 'Microsoft Corp' as well. I do not have them as synonyms
> either. Do these filter factories add some extra intelligence to the index
> in that if you search for 'Samsung' even 'LG' is considered a
> highlightable term?
> 
> I believe this was not the case when I was working with an earlier
> development version (from Nov or early Dec). Right now I am using
> solr-2008-12-29.war.
> 
> - ashok
> 
> 
> 
> ryguasu wrote:
>> 
>> I'm testing out the default (gap) fragmenter with some simple,
>> single-word queries on a patched 1.3.0 release populated with some
>> real-world data. (I think the primary quirk in my setup is that I'm
>> using ShingleFilterFactory to put word bigrams (aka shingles) into my
>> index. I was worried that this might mess up highlighting, but
>> highlighting is *mostly* working.) There are some oddities here, and
>> I'm wondering if people have any suggestions for debugging my setup
>> and/or trying to make a good, reproducible test case.
>> 
>> 1. The main weird thing is that, the vast majority of the time, the
>> highlighted term is the last term in the fragment. For example, if I
>> search for "cat", then almost all my fragments look like this:
>> 
>> fragment 1: "to the *cat*"
>> fragment 2: "with the *cat*"
>> fragment 3: "it's what the *cat*"
>> fragment 4: "Once upon a time the *cat*"
>> 
>> (My actual fragments are longer. The key to note is that all of these
>> examples end in "cat".)
>> 
>> Sometimes "cat" will appear somewhere other than the last position,
>> but this is rare. My expectation, in contrast, is that "cat" would
>> tend to be more or less evenly distributed throughout fragment
>> positions.
>> 
>> Note: I tried to reproduce this on 1.3.0 with my patches applied but
>> using the example dataset/schema from the Solr source tree rather than
>> my own dataset/schema. With the example dataset this didn't seem to be
>> an issue.
>> 
>> I've experienced three other highlighting issues, which may or may not
>> be related:
>> 
>> 2. Sometimes, if a term appears multiple times in a fragment, not just
>> the term but all the words in between the two appearances will get
>> highlighted too. For example, I searched for "fear", and got this as
>> one of the snippets:
>> 
>>     SETTLEMENT AGREEMENT This Agreement ("the Agreement") is entered
>>     into this 18th day of August, 2008, by
>>     and between Cape <em>Fear Bank Corporation, a North Carolina
>>     corporation (the "Company"), and Cape Fear</em>
>> 
>> In contrast, I would have expected
>> 
>>     SETTLEMENT AGREEMENT This Agreement ("the Agreement") is entered
>>     into this 18th day of August, 2008, by
>>     and between Cape <em>Fear</em> Bank Corporation, a North Carolina
>>     corporation (the "Company"), and Cape <em>Fear</em>
>> 
>> 3. My install seems to have a curiously liberal interpretation of
>> hl.fragsize. Now if I put hl.fragsize=0, then things are as expected,
>> i.e. it highlights the whole field. And it also seems more or less
>> true (as it should) that as I increase hl.fragsize, the fragments get
>> longer. However, I was surprised to see that when I put hl.fragsize=1
>> or hl.fragsize=5, I can get fragments as long as this one:
>> 
>>     addition, we believe the wireless feature for our controller will
>>     facilitate exceptional customer services and
>>     response time." About GpsLatitude GpsLatitude, a Montreal-based
>>     company, is a provider of security
>>     solutions and tracking for mobile assets. It is also a developer
>>     of advanced " Videlocalisation" , a cost-effective,
>>     integrated mobile digital <em>video</em>
>> 
>> That seems shockingly long for something of size "five".
>> 
>> 4. Very rarely I'll get a fragment that doesn't actually contain any
>> of the search terms. For example, maybe I'll search for "cat", and
>> I'll get back "three ounces of milk" as a snippet. I need to explore
>> this more, though the last time this happened I opened the document,
>> located "three ounces of milk" in the document text, and found that
>> the word "cat" did appear nearby; so maybe the document did contain
>> "three ounces of milk for the cat".
>> 
>> Obviously I'm not describing my setup in much detail. Let me know what
>> you think would be helpful to know more about.
>> 
>> Thanks,
>> Chris
>> 
>> 
> 
> 

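For a reproducible test case of the kind Chris asks about, it may help to pin down the exact highlighting parameters in the request. The sketch below only builds the request URLs and does not contact a server. The base URL (the stock example server address) and the field name `text` are assumptions and will differ per install. `highlight_url` is a made-up helper name. The parameters themselves (hl, hl.fl, hl.fragmenter, hl.fragsize, hl.snippets) are the standard Solr highlighting parameters.

```python
from urllib.parse import urlencode


def highlight_url(query, fragsize,
                  base="http://localhost:8983/solr/select",  # assumed address
                  field="text"):                              # assumed field name
    """Build a select URL that requests gap-fragmenter highlighting."""
    params = [
        ("q", query),
        ("hl", "true"),
        ("hl.fl", field),
        ("hl.fragmenter", "gap"),
        ("hl.fragsize", str(fragsize)),
        ("hl.snippets", "4"),
    ]
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Same query at several fragment sizes, to compare fragment lengths.
    for size in (0, 1, 5, 100):
        print(highlight_url("video", size))
```

Fetching the same query at several hl.fragsize values and comparing the fragment lengths would make the fragsize behaviour described in item 3 easy to show to others.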
-- 
View this message in context: 
http://www.nabble.com/Highlighting-Oddities-tp20351015p21843092.html
Sent from the Solr - User mailing list archive at Nabble.com.
