Re: When searching for !@#$%^&*() all documents are matched incorrectly

Øystein F. Steimler Mon, 08 Jun 2009 23:53:58 -0700

On Monday 01 June 2009 16:50, Sam Michaels wrote:
> So the fix for this problem would be
>
> 1. Stop using WordDelimiterFilter for queries (what is the alternative?) OR
> 2. Not allow any search strings without any alphanumeric characters.


We ran into this same problem while replacing all characters using a 
PatternReplaceFilter. I've been working around this bug by using a 
LengthFilter to filter out tokens of zero length.
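
A minimal sketch of that workaround, assuming a field type along these
lines. The type name text_stripped, the regex, and max="255" are
placeholders for illustration, not the actual config from either setup
in this thread. The LengthFilter sits right after the
PatternReplaceFilter, so a token that has been reduced to the empty
string is dropped before it reaches anything else:

```xml
<!-- Hypothetical field type; name, pattern and max are assumptions. -->
<fieldType name="text_stripped" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Strips every non-alphanumeric character; a token such as
         "@#$" becomes a zero-length token here. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^a-zA-Z0-9]" replacement="" replace="all"/>
    <!-- Drops tokens shorter than one character, i.e. the empty ones. -->
    <filter class="solr.LengthFilterFactory" min="1" max="255"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This only removes the empty tokens from the analysis chain. Whether it
also avoids the leftover pure-negative query described below probably
depends on how the rest of the query is built, so check the result
with debugQuery=true.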

.øs

> Yonik Seeley-2 wrote:
> > OK, here's the deal:
> >
> > <str name="rawquerystring">-features:foo features:(\...@#$%\^&\*\(\))</str>
> > <str name="querystring">-features:foo features:(\...@#$%\^&\*\(\))</str>
> > <str name="parsedquery">-features:foo</str>
> > <str name="parsedquery_toString">-features:foo</str>
> >
> > The text analysis is throwing away non alphanumeric chars (probably
> > the WordDelimiterFilter).  The Lucene (and Solr) query parser throws
> > away term queries when the token is zero length (after analysis).
> > Solr then interprets the left over "-features:foo" as "all documents
> > not containing foo in the features field", so you get a bunch of
> > matches.
> >
> > -Yonik
> > http://www.lucidimagination.com
> >
> > On Mon, Jun 1, 2009 at 10:15 AM, Sam Michaels <mas...@yahoo.com> wrote:
> >> Walter,
> >>
> >> The analysis link does not produce any matches for either @ or
> >> !...@#$%^&*() strings when I try to match against bathing. I'm worried
> >> that this might be the symptom of another problem (which has not revealed
> >> itself yet) and want to get to the bottom of this...
> >>
> >> Thank you.
> >> sm
> >>
> >> Walter Underwood wrote:
> >>> Use the [analysis] link on the Solr admin UI to get more info on
> >>> how this is being interpreted.
> >>>
> >>> However, I am curious about why this is important. Do users enter
> >>> this query often? If not, maybe it is not something to spend time on.
> >>>
> >>> wunder
> >>>
> >>> On 5/31/09 2:56 PM, "Sam Michaels" <mas...@yahoo.com> wrote:
> >>>> Here is the output from the debug query when I'm trying to match the
> >>>> String @ against Bathing (should not match)
> >>>>
> >>>> <str name="GLOM-1">
> >>>> 3.2689073 = (MATCH) weight(activity_type:NAME in 0), product of:
> >>>>   0.99999994 = queryWeight(activity_type:NAME), product of:
> >>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
> >>>>     0.30591258 = queryNorm
> >>>>   3.2689075 = (MATCH) fieldWeight(activity_type:NAME in 0), product of:
> >>>>     1.0 = tf(termFreq(activity_type:NAME)=1)
> >>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
> >>>>     1.0 = fieldNorm(field=activity_type, doc=0)
> >>>> </str>
> >>>>
> >>>> Looks like the AND clause in the search string is ignored...
> >>>>
> >>>> SM.
> >>>>
> >>>> ryantxu wrote:
> >>>>> two key things to try (for anyone ever wondering why a query matches
> >>>>> documents)
> >>>>>
> >>>>> 1.  add &debugQuery=true and look at the explain text below --
> >>>>> anything that contributed to the score is listed there
> >>>>> 2.  check /admin/analysis.jsp -- this will let you see how analyzers
> >>>>> break text up into tokens.
> >>>>>
> >>>>> Not sure off hand, but I'm guessing the WordDelimiterFilterFactory
> >>>>> has something to do with it...
> >>>>>
> >>>>>
> >>>>> On Sat, May 30, 2009 at 5:59 PM, Sam Michaels <mas...@yahoo.com> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm running Solr 1.3/Java 1.6.
> >>>>>>
> >>>>>> When I run a query like  - (activity_type:NAME) AND
> >>>>>> title:(\...@#$%\^&\*\(\))
> >>>>>> all the documents are returned even though there is not a single match.
> >>>>>> There is no title that matches the string (which has been escaped).
> >>>>>>
> >>>>>> My document structure is as follows
> >>>>>>
> >>>>>> <doc>
> >>>>>> <str name="activity_type">NAME</str>
> >>>>>> <str name="title">Bathing</str>
> >>>>>> ....
> >>>>>> </doc>
> >>>>>>
> >>>>>>
> >>>>>> The title field is of type text_title which is described below.
> >>>>>>
> >>>>>> <fieldType name="text_title" class="solr.TextField"
> >>>>>>            positionIncrementGap="100">
> >>>>>>      <analyzer type="index">
> >>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>>        <!-- in this example, we will only use synonyms at query time
> >>>>>>        <filter class="solr.SynonymFilterFactory"
> >>>>>>                synonyms="index_synonyms.txt" ignoreCase="true"
> >>>>>>                expand="false"/>
> >>>>>>        -->
> >>>>>>        <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>                generateWordParts="1" generateNumberParts="1"
> >>>>>>                catenateWords="1" catenateNumbers="1" catenateAll="1"
> >>>>>>                splitOnCaseChange="1"/>
> >>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>>>>>      </analyzer>
> >>>>>>      <analyzer type="query">
> >>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>>        <filter class="solr.SynonymFilterFactory"
> >>>>>>                synonyms="synonyms.txt" ignoreCase="true"
> >>>>>>                expand="true"/>
> >>>>>>        <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>                generateWordParts="1" generateNumberParts="1"
> >>>>>>                catenateWords="1" catenateNumbers="1" catenateAll="1"
> >>>>>>                splitOnCaseChange="1"/>
> >>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>>>>>      </analyzer>
> >>>>>>    </fieldType>
> >>>>>>
> >>>>>> When I run the query against Luke, no results are returned. Any
> >>>>>> suggestions are appreciated.
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> View this message in context:
> >>>>>> http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23797731.html
> >>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23815688.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.

-- 
Øystein Steimler, Produktans, EasyConnect AS  -  http://opplysning1890.no
oystein.steim...@easyconnect.no  - GPG: 0x784a7dea - Mob: 90010882

