On Monday 01 June 2009 16:50, Sam Michaels wrote:
> So the fix for this problem would be
>
> 1. Stop using WordDelimiterFilter for queries (what is the alternative) OR
> 2. Not allow any search strings without any alphanumeric characters..

We ran into this same problem while replacing all characters using a PatternReplaceFilter. I've been working around this bug by using a LengthFilter to filter out tokens of zero length.

.øs

> Yonik Seeley-2 wrote:
> > OK, here's the deal:
> >
> > <str name="rawquerystring">-features:foo features:(\...@#$%\^&\*\(\))</str>
> > <str name="querystring">-features:foo features:(\...@#$%\^&\*\(\))</str>
> > <str name="parsedquery">-features:foo</str>
> > <str name="parsedquery_toString">-features:foo</str>
> >
> > The text analysis is throwing away non alphanumeric chars (probably
> > the WordDelimiterFilter). The Lucene (and Solr) query parser throws
> > away term queries when the token is zero length (after analysis).
> > Solr then interprets the left over "-features:foo" as "all documents
> > not containing foo in the features field", so you get a bunch of
> > matches.
> >
> > -Yonik
> > http://www.lucidimagination.com
> >
> > On Mon, Jun 1, 2009 at 10:15 AM, Sam Michaels <mas...@yahoo.com> wrote:
> >> Walter,
> >>
> >> The analysis link does not produce any matches for either @ or
> >> !...@#$%^&*() strings when I try to match against bathing. I'm worried
> >> that this might be the symptom of another problem (which has not
> >> revealed itself yet) and want to get to the bottom of this...
> >>
> >> Thank you.
> >> sm
> >>
> >> Walter Underwood wrote:
> >>> Use the [analysis] link on the Solr admin UI to get more info on
> >>> how this is being interpreted.
> >>>
> >>> However, I am curious about why this is important. Do users enter
> >>> this query often? If not, maybe it is not something to spend time on.
> >>>
> >>> wunder
> >>>
> >>> On 5/31/09 2:56 PM, "Sam Michaels" <mas...@yahoo.com> wrote:
> >>>> Here is the output from the debug query when I'm trying to match the
> >>>> String @ against Bathing (should not match)
> >>>>
> >>>> <str name="GLOM-1">
> >>>> 3.2689073 = (MATCH) weight(activity_type:NAME in 0), product of:
> >>>>   0.99999994 = queryWeight(activity_type:NAME), product of:
> >>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
> >>>>     0.30591258 = queryNorm
> >>>>   3.2689075 = (MATCH) fieldWeight(activity_type:NAME in 0), product of:
> >>>>     1.0 = tf(termFreq(activity_type:NAME)=1)
> >>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
> >>>>     1.0 = fieldNorm(field=activity_type, doc=0)
> >>>> </str>
> >>>>
> >>>> Looks like the AND clause in the search string is ignored...
> >>>>
> >>>> SM.
> >>>>
> >>>> ryantxu wrote:
> >>>>> two key things to try (for anyone ever wondering why a query matches
> >>>>> documents)
> >>>>>
> >>>>> 1. add &debugQuery=true and look at the explain text below --
> >>>>>    anything that contributed to the score is listed there
> >>>>> 2. check /admin/analysis.jsp -- this will let you see how analyzers
> >>>>>    break text up into tokens.
> >>>>>
> >>>>> Not sure off hand, but I'm guessing the WordDelimiterFilterFactory
> >>>>> has something to do with it...
> >>>>>
> >>>>>
> >>>>> On Sat, May 30, 2009 at 5:59 PM, Sam Michaels <mas...@yahoo.com> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm running Solr 1.3/Java 1.6.
> >>>>>>
> >>>>>> When I run a query like - (activity_type:NAME) AND
> >>>>>> title:(\...@#$%\^&\*\(\))
> >>>>>> all the documents are returned even though there is not a single
> >>>>>> match.
> >>>>>> There is no title that matches the string (which has been escaped).
> >>>>>>
> >>>>>> My document structure is as follows
> >>>>>>
> >>>>>> <doc>
> >>>>>>   <str name="activity_type">NAME</str>
> >>>>>>   <str name="title">Bathing</str>
> >>>>>>   ....
> >>>>>> </doc>
> >>>>>>
> >>>>>>
> >>>>>> The title field is of type text_title which is described below.
> >>>>>>
> >>>>>> <fieldType name="text_title" class="solr.TextField"
> >>>>>>            positionIncrementGap="100">
> >>>>>>   <analyzer type="index">
> >>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>>     <!-- in this example, we will only use synonyms at query time
> >>>>>>     <filter class="solr.SynonymFilterFactory"
> >>>>>>             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
> >>>>>>     -->
> >>>>>>     <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>>>>>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
> >>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>>>>>   </analyzer>
> >>>>>>   <analyzer type="query">
> >>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>>     <filter class="solr.SynonymFilterFactory"
> >>>>>>             synonyms="synonyms.txt"
> >>>>>>             ignoreCase="true" expand="true"/>
> >>>>>>     <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>>>>>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
> >>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>>>>>
> >>>>>>   </analyzer>
> >>>>>> </fieldType>
> >>>>>>
> >>>>>> When I run the query against Luke, no results are returned. Any
> >>>>>> suggestions are appreciated.
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> View this message in context:
> >>>>>> http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23797731.html
> >>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23815688.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.

--
Øystein Steimler, Produktans, EasyConnect AS - http://opplysning1890.no
oystein.steim...@easyconnect.no - GPG: 0x784a7dea - Mob: 90010882
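
For reference, the LengthFilter workaround described in the reply at the top of this message might look roughly like the analyzer chain below. This is only a sketch under assumptions: the field type name text_stripped, the regex pattern, and the max value are hypothetical and do not come from the thread. The thread also does not establish whether the same approach helps with the WordDelimiterFilter case in the original question.

```xml
<!-- Sketch only: the field type name and the regex pattern are hypothetical. -->
<fieldType name="text_stripped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumed replacement rule: strip every non-alphanumeric character.
         A token made only of symbols becomes an empty string here. -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^A-Za-z0-9]" replacement="" replace="all"/>
    <!-- Drop tokens shorter than one character, i.e. the empty ones.
         The max value is an arbitrary upper bound chosen for this sketch. -->
    <filter class="solr.LengthFilterFactory" min="1" max="255"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The LengthFilterFactory is placed directly after the replacement step so that empty tokens are removed before any later filter sees them. The resulting token stream can be checked on /admin/analysis.jsp, as suggested earlier in the thread.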