This problem went away when I updated to use the latest nightly release (2009-02-04)
- ashok

ashokc wrote:
>
> I have seen some of these oddities that Chris is referring to. In my case,
> terms that are NOT in the query get highlighted. For example, searching for
> 'Intel' highlights 'Microsoft Corp' as well. I do not have them as synonyms
> either. Do these filter factories add some extra intelligence to the index
> in that if you search for 'Samsung' even 'LG' is considered a
> highlightable term?
>
> I believe this was not the case when I was working with an earlier
> development version (from Nov or early Dec). Right now I am using
> solr-2008-12-29.war.
>
> - ashok
>
>
>
> ryguasu wrote:
>>
>> I'm testing out the default (gap) fragmenter with some simple,
>> single-word queries on a patched 1.3.0 release populated with some
>> real-world data. (I think the primary quirk in my setup is that I'm
>> using ShingleFilterFactory to put word bigrams (aka shingles) into my
>> index. I was worried that this might mess up highlighting, but
>> highlighting is *mostly* working.) There are some oddities here, and
>> I'm wondering if people have any suggestions for debugging my setup
>> and/or trying to make a good, reproducible test case.
>>
>> 1. The main weird thing is that, the vast majority of the time, the
>> highlighted term is the last term in the fragment. For example, if I
>> search for "cat", then almost all my fragments look like this:
>>
>> fragment 1: "to the *cat*"
>> fragment 2: "with the *cat*"
>> fragment 3: "it's what the *cat*"
>> fragment 4: "Once upon a time the *cat*"
>>
>> (My actual fragments are longer. The key thing to note is that all of
>> these examples end in "cat".)
>>
>> Sometimes "cat" will appear somewhere other than the last position,
>> but this is rare. My expectation, in contrast, is that "cat" would
>> tend to be more or less evenly distributed throughout fragment
>> positions.
>>
>> Note: I tried to reproduce this on 1.3.0 with my patches applied but
>> using the example dataset/schema from the Solr source tree rather than
>> my own dataset/schema. With the example dataset this didn't seem to be
>> an issue.
>>
>> I've experienced three other highlighting issues, which may or may not
>> be related:
>>
>> 2. Sometimes, if a term appears multiple times in a fragment, not just
>> the term but all the words in between the two appearances will get
>> highlighted too. For example, I searched for "fear", and got this as
>> one of the snippets:
>>
>>   SETTLEMENT AGREEMENT This Agreement ("the Agreement") is entered
>>   into this 18th day of August, 2008, by
>>   and between Cape <em>Fear Bank Corporation, a North Carolina
>>   corporation (the "Company"), and Cape Fear</em>
>>
>> In contrast, I would have expected
>>
>>   SETTLEMENT AGREEMENT This Agreement ("the Agreement") is entered
>>   into this 18th day of August, 2008, by
>>   and between Cape <em>Fear</em> Bank Corporation, a North Carolina
>>   corporation (the "Company"), and Cape <em>Fear</em>
>>
>> 3. My install seems to have a curiously liberal interpretation of
>> hl.fragsize. Now if I put hl.fragsize=0, then things are as expected,
>> i.e. it highlights the whole field. And it also seems more or less
>> true (as it should) that as I increase hl.fragsize, the fragments get
>> longer. However, I was surprised to see that when I put hl.fragsize=1
>> or hl.fragsize=5, I can get fragments as long as this one:
>>
>>   addition, we believe the wireless feature for our controller will
>>   facilitate exceptional customer services and
>>   response time." About GpsLatitude GpsLatitude, a Montreal-based
>>   company, is a provider of security
>>   solutions and tracking for mobile assets. It is also a developer
>>   of advanced " Videlocalisation" , a cost-effective,
>>   integrated mobile digital <em>video</em>
>>
>> That seems shockingly long for something of size "five".
>>
>> 4. Very rarely I'll get a fragment that doesn't actually contain any
>> of the search terms. For example, maybe I'll search for "cat", and
>> I'll get back "three ounces of milk" as a snippet. I need to explore
>> this more, though the last time this happened, I opened the document
>> and found that, where "three ounces of milk" appeared in the document
>> text, the word "cat" did appear nearby; so maybe the document did
>> contain "three ounces of milk for the cat".
>>
>> Obviously I'm not describing my setup in much detail. Let me know what
>> you think would be helpful to know more about.
>>
>> Thanks,
>> Chris
>>
>>
>
>

--
View this message in context: http://www.nabble.com/Highlighting-Oddities-tp20351015p21843092.html
Sent from the Solr - User mailing list archive at Nabble.com.
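
For anyone trying to turn the oddities described in this thread into a reproducible test case, here is a rough Python sketch, not something taken from the posts above. It builds a select URL using the standard highlighting parameters (hl, hl.fl, hl.fragsize, hl.snippets) and labels a returned snippet with the symptoms listed in the thread. The base URL, the field name "text", the label names, and the max_ratio threshold are all assumptions made up for illustration. The term comparison is exact, so a stemming analyzer would produce false "unexpected-term" labels.

```python
import re
from urllib.parse import urlencode

# Assumptions for illustration only: point these at your own install and schema.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"
TEXT_FIELD = "text"

EM_SPAN = re.compile(r"<em>(.*?)</em>", re.DOTALL)
PUNCTUATION = ".,;:!?\"'()"


def build_highlight_url(term, fragsize, snippets=3):
    """Build a select URL that asks Solr to highlight `term` in TEXT_FIELD."""
    params = [
        ("q", TEXT_FIELD + ":" + term),
        ("hl", "true"),
        ("hl.fl", TEXT_FIELD),
        ("hl.fragsize", str(fragsize)),
        ("hl.snippets", str(snippets)),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


def audit_snippet(snippet, term, fragsize, max_ratio=3.0):
    """Return labels (hypothetical names) for the oddities seen in one snippet."""
    labels = []
    term_lc = term.lower()
    spans = EM_SPAN.findall(snippet)        # text inside each <em>...</em>
    plain = EM_SPAN.sub(r"\1", snippet)     # snippet with the markup removed

    if not spans:
        labels.append("no-highlight")       # fragment without any highlighted term
    for span in spans:
        if len(span.split()) > 1:
            labels.append("multi-word-highlight")   # highlight runs across words
        elif span.strip(PUNCTUATION).lower() != term_lc:
            labels.append("unexpected-term")        # a non-query term is highlighted

    words = plain.split()
    if words and words[-1].strip(PUNCTUATION).lower() == term_lc:
        labels.append("term-at-end")        # query term is the last word
    # max_ratio is an arbitrary tolerance; hl.fragsize=0 means "whole field".
    if fragsize > 0 and len(plain) > max_ratio * fragsize:
        labels.append("oversized")
    return labels
```

Running audit_snippet over every snippet returned for a handful of single-word queries, and counting the labels, gives a quick way to compare two builds (say, a patched 1.3.0 and a nightly) on the same index.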