Re: Document match with no highlight

Phong Dais Fri, 13 May 2011 03:17:48 -0700

Pierre,

Thank you very much, Pierre. :)


You saved me a lot of time and headache.

>As I understand WordDelimiterFilter:
> >"0176 R3 1.5 TO" should be tokenized with token "R3" overlapping with "R"
> and "3", and "15" overlapping with "1" and "5"
>

>These parameters are set to 0 for query, but setting them to 1 would not
> correct your problem unless you search for "R3 1.5".
>

You are correct.


>
> >I think you have to either
> > - set these parameters to 0 in index, but your query won't match anymore
> > - wait for correction to be released in a new solr version,
> > - use solr trunk,
> > - or backport the modifications in the lucene-highlighter version you
> use.
>

For what I need, using 0 for the index should do the trick.  I did not want
my query to match.
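
For anyone finding this thread later, here is a rough sketch of the index-side change I mean. It assumes the "text" fieldtype quoted further down, and the only attributes that change are catenateWords and catenateNumbers (1 becomes 0). The comment is mine, not from the stock schema.

```xml
<!-- Index-time analyzer only. Sketch based on the fieldtype quoted below;
     catenateWords and catenateNumbers are switched from 1 to 0 so the
     index analyzer no longer emits the concatenated, overlapping tokens. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0"
        splitOnCaseChange="1"/>
```

Existing documents would need to be re-indexed for an index analyzer change like this to take effect.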


> >I did a backport for solr 1.4.1 since I won't move to 3.0 for some time,
> so please ask if you have questions about how to do this.
>

I don't anticipate the need for a backport but is there any wiki out there
that outlines this process?

Regards,
Phong


>
> -----Original Message-----
> From: Phong Dais [mailto:phong.gd...@gmail.com]
> Sent: Thursday, 12 May 2011 20:06
> To: solr-user@lucene.apache.org
> Subject: Re: Document match with no highlight
>
> Hi,
>
> I read the link provided and I'll need some time to digest what it is
> saying.
>
> Here's my "text" fieldtype.
>
> <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
>  <analyzer type="index">
>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>    <filter class="solr.StopFilterFactory" ignoreCase="true"
>      words="stopwords.txt" enablePositionIncrements="true"/>
>    <filter class="solr.WordDelimiterFilterFactory"
>      generateWordParts="1" generateNumberParts="1"
>      catenateWords="1" catenateNumbers="1" catenateAll="0"
>      splitOnCaseChange="1"/>
>    <filter class="solr.LowerCaseFilterFactory"/>
>    <filter class="solr.SnowballPorterFilterFactory" language="English"
>      protected="protwords.txt"/>
>  </analyzer>
>  <analyzer type="query">
>    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>      ignoreCase="true" expand="true"/>
>    <filter class="solr.StopFilterFactory" ignoreCase="true"
>      words="stopwords.txt" enablePositionIncrements="true"/>
>    <filter class="solr.WordDelimiterFilterFactory"
>      generateWordParts="1" generateNumberParts="1"
>      catenateWords="0" catenateNumbers="0" catenateAll="0"
>      splitOnCaseChange="1"/>
>    <filter class="solr.LowerCaseFilterFactory"/>
>    <filter class="solr.SnowballPorterFilterFactory" language="English"
>      protected="protwords.txt"/>
>  </analyzer>
> </fieldtype>
>
> Also, I figured out what value in DOC_TEXT causes this issue to occur.
> With a DOC_TEXT of (without the quotes):
> "0176 R3 1.5 TO "
>
> Searching for "3 1 15" returns a match with "empty" highlight.
> Searching for "3 1 15"~1 returns a match with highlight.
>
> Can anyone see anything that I'm missing?
>
> Thanks,
> P.
>
>
> On Thu, May 12, 2011 at 12:27 PM, Pierre GOSSE <pierre.go...@arisem.com
> >wrote:
>
> > > Since you're using the standard "text" field, this should NOT be your case.
> >
> > Sorry for the missing NOT in the previous sentence. You should have the same
> > issue given what you said, but still, it sounds very similar.
> >
> > Are you sure your fieldtype "text" has nothing special? A tokenizer or
> > filter that could add some tokens to your indexed text but not to your
> > query, like for example a WordDelimiter present in <index> and not <query>?
> >
> > Pierre
> >
> > -----Original Message-----
> > From: Pierre GOSSE [mailto:pierre.go...@arisem.com]
> > Sent: Thursday, 12 May 2011 18:21
> > To: solr-user@lucene.apache.org
> > Subject: RE: Document match with no highlight
> >
> > > In fact if I did "3 1 15"~1 I do get a snippet also.
> >
> > Strange, I had a very similar problem, but with overlapping tokens. Since
> > you're using the standard "text" field, this should be your case.
> >
> > Maybe you could have a look at this issue, since it sounds very familiar
> > to me:
> > https://issues.apache.org/jira/browse/LUCENE-3087
> >
> > Pierre
> >
> > -----Original Message-----
> > From: Phong Dais [mailto:phong.gd...@gmail.com]
> > Sent: Thursday, 12 May 2011 17:26
> > To: solr-user@lucene.apache.org
> > Subject: Re: Document match with no highlight
> >
> > Hi,
> >
> > <field name="DOC_TEXT" type="text" indexed="true" stored="true"/>
> >
> > The type "text" is the default one that came with the default solr 1.4
> > install without any modifications.
> >
> > If I remove the quotes I do get snippets.  In fact if I did "3 1 15"~1 I
> > do get a snippet also.
> >
> > Hope that helps.
> >
> > P.
> >
> > On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
> >
> > > > URL:
> > > >
> > > > http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0&rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1
> > > >
> > > > XML:
> > > > <?xml version="1.0" encoding="UTF-8"?>
> > > > <response>
> > > >   <lst name="responseHeader">
> > > >     <int name="status">0</int>
> > > >     <int name="QTime">19</int>
> > > >     <lst name="params">
> > > >       <str name="explainOther"/>
> > > >       <str name="indent">on</str>
> > > >       <str name="hl.fl">DOC_TEXT</str>
> > > >       <str name="wt">standard</str>
> > > >       <str name="hl.maxAnalyzedChars">-1</str>
> > > >       <str name="hl">on</str>
> > > >       <str name="rows">10</str>
> > > >       <str name="version">2.2</str>
> > > >       <str name="debugQuery">on</str>
> > > >       <str name="fl">DOC_TEXT,score</str>
> > > >       <str name="start">0</str>
> > > >       <str name="q">DOC_TEXT:"3 1 15"</str>
> > > >       <str name="qt">standard</str>
> > > >       <str name="fq"/>
> > > >     </lst>
> > > >   </lst>
> > > >   <result name="response" numFound="1" start="0" maxScore="0.035959315">
> > > >     <doc>
> > > >       <float name="score">0.035959315</float>
> > > >       <arr name="DOC_TEXT"><str> ... </str></arr>
> > > >     </doc>
> > > >   </result>
> > > >   <lst name="highlighting">
> > > >     <lst name="123456"/>
> > > >   </lst>
> > > >   <lst name="debug">
> > > >     <str name="rawquerystring">DOC_TEXT:"3 1 15"</str>
> > > >     <str name="querystring">DOC_TEXT:"3 1 15"</str>
> > > >     <str name="parsedquery">PhraseQuery(DOC_TEXT:"3 1 15")</str>
> > > >     <str name="parsedquery_toString">DOC_TEXT:"3 1 15"</str>
> > > >     <lst name="explain">
> > > >       <str name="123456">
> > > > 0.035959315 = fieldWeight(DOC_TEXT:"3 1 15" in 0), product of:
> > > >   1.0 = tf(phraseFreq=1.0)
> > > >   0.92055845 = idf(DOC_TEXT: 3=1 1=1 15=1)
> > > >   0.0390625 = fieldNorm(field=DOC_TEXT, doc=0)
> > > >       </str>
> > > >     </lst>
> > > >     <str name="QParser">LuceneQParser</str>
> > > >     <arr name="filter_queries">
> > > >       <str/>
> > > >     </arr>
> > > >     <arr name="parsed_filter_queries"/>
> > > >     <lst name="timing">
> > > >       ...
> > > >     </lst>
> > > >   </lst>
> > > > </response>
> > >
> > >
> > > Nothing looks suspicious.
> > >
> > > Can you provide two more things:
> > > the fieldType of DOC_TEXT
> > > and
> > > the field definition of DOC_TEXT.
> > >
> > > Also, do you get a snippet from the same doc when you remove the quotes
> > > from your query?
> > >
> > >
> >
>
