Pierre, thank you very much. :)
You saved me a lot of time and headache.

> As I understand WordDelimiterFilter:
>
> "0176 R3 1.5 TO" should be tokenized with tokens "R3" overlapping with "R"
> and "3", and "15" overlapping with "1" and "5".
>
> These parameters are set to 0 for the query, but setting them to 1 should not
> correct your problem unless you search for "R3 1.5".

You are correct.

> I think you have to either
> - set these parameters to 0 in the index, but your query won't match anymore,
> - wait for the correction to be released in a new solr version,
> - use solr trunk,
> - or backport the modifications to the lucene-highlighter version you use.

For what I need, using 0 for the index should do the trick. I did not want my query to match.

> I did a backport for solr 1.4.1 since I won't move to 3.0 for some time,
> so please ask if you have questions about how to do this.

I don't anticipate the need for a backport, but is there any wiki out there that outlines this process?

Regards,
Phong

> -----Original Message-----
> From: Phong Dais [mailto:phong.gd...@gmail.com]
> Sent: Thursday, May 12, 2011 20:06
> To: solr-user@lucene.apache.org
> Subject: Re: Document match with no highlight
>
> Hi,
>
> I read the link provided and I'll need some time to digest what it is
> saying.
>
> Here's my "text" fieldtype.
>
> <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1"
>             catenateWords="1" catenateNumbers="1" catenateAll="0"
>             splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
> </fieldtype>
>
> Also, I figured out what value in DOC_TEXT causes this issue to occur.
> With a DOC_TEXT of (without the quotes):
> "0176 R3 1.5 TO "
>
> Searching for "3 1 15" returns a match with "empty" highlight.
> Searching for "3 1 15"~1 returns a match with highlight.
>
> Can anyone see anything that I'm missing?
>
> Thanks,
> P.
>
>
> On Thu, May 12, 2011 at 12:27 PM, Pierre GOSSE <pierre.go...@arisem.com> wrote:
>
> > > Since you're using the standard "text" field, this should NOT be your case.
> >
> > Sorry for the missing NOT in the previous sentence. You should not have the same
> > issue given what you said, but still, it sounds very similar.
> >
> > Are you sure your fieldtype "text" has nothing special ?
> > a tokenizer or filter that could add tokens to your indexed text but not to your query,
> > like for example a WordDelimiter present in <index> and not <query> ?
> >
> > Pierre
> >
> > -----Original Message-----
> > From: Pierre GOSSE [mailto:pierre.go...@arisem.com]
> > Sent: Thursday, May 12, 2011 18:21
> > To: solr-user@lucene.apache.org
> > Subject: RE: Document match with no highlight
> >
> > > In fact if I did "3 1 15"~1 I do get snippet also.
> >
> > Strange, I had a very similar problem, but with overlapping tokens. Since
> > you're using the standard "text" field, this should be your case.
> >
> > Maybe you could have a look at this issue, since it sounds very familiar to me:
> > https://issues.apache.org/jira/browse/LUCENE-3087
> >
> > Pierre
> >
> > -----Original Message-----
> > From: Phong Dais [mailto:phong.gd...@gmail.com]
> > Sent: Thursday, May 12, 2011 17:26
> > To: solr-user@lucene.apache.org
> > Subject: Re: Document match with no highlight
> >
> > Hi,
> >
> > <field name="DOC_TEXT" type="text" indexed="true" stored="true"/>
> >
> > The type "text" is the default one that came with the default solr 1.4
> > install, without any modifications.
> >
> > If I remove the quotes I do get snippets. In fact if I did "3 1 15"~1 I do
> > get a snippet also.
> >
> > Hope that helps.
> >
> > P.
> >
> > On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
> >
> > > > URL:
> > > >
> > > > http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0&rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> > > >
> > > > XML:
> > > > <?xml version="1.0" encoding="UTF-8"?>
> > > > <response>
> > > >   <lst name="responseHeader">
> > > >     <int name="status">0</int>
> > > >     <int name="QTime">19</int>
> > > >     <lst name="params">
> > > >       <str name="explainOther"/>
> > > >       <str name="indent">on</str>
> > > >       <str name="hl.fl">DOC_TEXT</str>
> > > >       <str name="wt">standard</str>
> > > >       <str name="hl.maxAnalyzedChars">-1</str>
> > > >       <str name="hl">on</str>
> > > >       <str name="rows">10</str>
> > > >       <str name="version">2.2</str>
> > > >       <str name="debugQuery">on</str>
> > > >       <str name="fl">DOC_TEXT,score</str>
> > > >       <str name="start">0</str>
> > > >       <str name="q">DOC_TEXT:"3 1 15"</str>
> > > >       <str name="qt">standard</str>
> > > >       <str name="fq"/>
> > > >     </lst>
> > > >   </lst>
> > > >   <result name="response" numFound="1" start="0" maxScore="0.035959315">
> > > >     <doc>
> > > >       <float name="score">0.035959315</float>
> > > >       <arr name="DOC_TEXT"><str>
> > > > ...
> > > > </str></arr>
> > > >     </doc>
> > > >   </result>
> > > >   <lst name="highlighting">
> > > >     <lst name="123456"/>
> > > >   </lst>
> > > >   <lst name="debug">
> > > >     <str name="rawquerystring">DOC_TEXT:"3 1 15"</str>
> > > >     <str name="querystring">DOC_TEXT:"3 1 15"</str>
> > > >     <str name="parsedquery">PhraseQuery(DOC_TEXT:"3 1 15")</str>
> > > >     <str name="parsedquery_toString">DOC_TEXT:"3 1 15"</str>
> > > >     <lst name="explain">
> > > >       <str name="123456">
> > > > 0.035959315 = fieldWeight(DOC_TEXT:"3 1 15" in 0), product of:
> > > >   1.0 = tf(phraseFreq=1.0)
> > > >   0.92055845 = idf(DOC_TEXT: 3=1 1=1 15=1)
> > > >   0.0390625 = fieldNorm(field=DOC_TEXT, doc=0)
> > > >       </str>
> > > >     </lst>
> > > >     <str name="QParser">LuceneQParser</str>
> > > >     <arr name="filter_queries">
> > > >       <str/>
> > > >     </arr>
> > > >     <arr name="parsed_filter_queries"/>
> > > >     <lst name="timing">
> > > >       ...
> > > >     </lst>
> > > >   </lst>
> > > > </response>
> > >
> > > Nothing looks suspicious.
> > >
> > > Can you provide two more things:
> > > the fieldType of DOC_TEXT
> > > and
> > > the field definition of DOC_TEXT.
> > >
> > > Also, do you get a snippet from the same doc when you remove the quotes from
> > > your query?
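
For reference, the index-time change settled on in the reply at the top of this thread (setting the parameters to 0 in the index) might look roughly like the fragment below. This is only a sketch: it assumes that "these parameters" means catenateWords and catenateNumbers, the two attributes that differ between the index and query analyzers of the quoted "text" fieldtype, and that everything else in the index analyzer stays as posted. Neither assumption is confirmed in the thread.

```xml
<!-- Sketch only: index analyzer of the "text" fieldtype with catenation turned off,
     so that it mirrors the WordDelimiterFilter settings of the query analyzer. -->
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt" enablePositionIncrements="true"/>
  <!-- Assumed change: catenateWords and catenateNumbers go from 1 to 0 -->
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0"
          splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English"
          protected="protwords.txt"/>
</analyzer>
```

Existing documents have to be reindexed before an analyzer change takes effect. As noted in the reply, the phrase query "3 1 15" is then expected to stop matching, which is the intended outcome here.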