Re: Weird: Solr Search result and Analysis Result not match?

Erick Erickson Wed, 09 Nov 2011 10:15:45 -0800

Oh, one more thing. I wasn't suggesting that you *remove*
WordDelimiterFilterFactory from the query chain, just
that you should be more selective about the options. Look
at the differences in the options in the example schema for
a place to start....
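
For instance, going from memory of the example schema.xml that ships
with Solr (so check it against your own copy before relying on it), the
word-splitting text type there uses different WordDelimiterFilterFactory
options on the two sides, roughly like this. The catenate* options are
on at index time and off at query time:

```xml
<!-- index analyzer: emit the parts and the catenated variants -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1"/>

<!-- query analyzer: emit only the parts -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0"
        splitOnCaseChange="1"/>
```

Since the index already holds the catenated variants, the query side
only needs to produce the parts.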


Best
Erick

On Wed, Nov 9, 2011 at 12:33 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Regarding <1>. Take a look at admin/analysis and see the tokenization just
> to check.
>
> Oh, and one more thing...
> putting <LowerCaseFilterFactory> in front of <WordDelimiterFilterFactory>
> kind of defeats the purpose of WordDelimiterFilterFactory. One of the
> things WDFF does is split on case change, and you're removing the case
> changes before WDFF gets hold of it.
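>
> As a sketch of the ordering I mean (illustrative only, with your other
> filters and options left out), WordDelimiterFilterFactory would come
> first and the lowercasing after it:
>
> ```xml
> <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> ```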
>
> Best
> Erick
>
> On Tue, Nov 8, 2011 at 9:40 PM, Ellery Leung <elleryle...@be-o.com> wrote:
>> Thanks Erick, here are my responses:
>>
>> 1. Yes.  What I want to achieve is that, with the index filtered by
>> EdgeNGram and the query not filtered that way, I can search on a
>> partial string.
>> 2. Good suggestion, will test it.
>> 3. ok
>> 4. Thank you
>> 5/6. Will remove the synonyms and WordDelimiterFilterFactory from the query chain
>> 7. Will look at that using Luke.  By the way, this is the first time I have
>> heard that there is a tool for that.  Thank you.
>> 8. Yes.
>>
>> Will check that again, thank you.
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: November 8, 2011 9:52 PM
>> To: solr-user@lucene.apache.org; elleryle...@be-o.com
>> Subject: Re: Weird: Solr Search result and Analysis Result not match?
>>
>> Several things:
>>
>> 1> You don't have EdgeNGramFilterFactory in your query analysis chain,
>>     is this intentional?
>> 2> You have a LOT of stuff going on here, you might try making your
>>     analysis chain simpler and adding stuff back in until you see the
>>     error. Don't forget to re-index!
>> 3> Analysis doesn't take into account query *parsing*, so it's possible
>>     to get a false sense of assurance when the analysis page matches
>>     your expectations.
>> 4> Even though nothing jumps out at me except the Edge.... factory,
>>     nice job of including information.
>> 5> It's unusual to expand synonyms both at query and index time,
>>     usually one or the other with index time preferred.
>> 6> Same with WordDelimiterFilterFactory. If you put all the variants
>>     in the index, you don't need to put all the variants in the query
>>     and vice-versa.
>> 7> Take a look at your actual contents, perhaps using Luke to ensure
>>     that what you expect to be in your index actually is.
>> 8> You did re-index after your latest changes to your schema, right <G>?
>>
>> All of this is a way of saying that I don't quite see what the problem
>> is, but at least there are some avenues to explore.
>>
>> Best
>> Erick
>>
>> On Mon, Nov 7, 2011 at 9:29 PM, Ellery Leung <elleryle...@be-o.com> wrote:
>>> Hi all.
>>>
>>>
>>>
>>> I am using Solr 3.4 under Win 7.
>>>
>>>
>>>
>>> In schema there is a multivalue field indexed in this way:
>>>
>>> ==========================
>>>
>>> Schema:
>>>
>>> ==========================
>>>
>>> <field name="myEvent" type="myCustomText" multiValued="true" indexed="true"
>>>        stored="true" omitNorms="true"/>
>>>
>>> <fieldType name="myCustomText" class="solr.TextField"
>>>            positionIncrementGap="100">
>>>        <analyzer type="index">
>>>                <charFilter class="solr.MappingCharFilterFactory"
>>>                            mapping="../../filters/filter-mappings.txt"/>
>>>                <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>>                <tokenizer class="solr.StandardTokenizerFactory"/>
>>>                <filter class="solr.TrimFilterFactory"/>
>>>                <filter class="solr.LowerCaseFilterFactory"/>
>>>                <filter class="solr.SynonymFilterFactory"
>>>                        synonyms="../../filters/filter-synonyms.txt"
>>>                        ignoreCase="true" expand="true"/>
>>>                <filter class="solr.ASCIIFoldingFilterFactory"/>
>>>                <filter class="solr.WordDelimiterFilterFactory"
>>>                        splitOnCaseChange="1" splitOnNumerics="1"
>>>                        stemEnglishPossessive="1"
>>>                        generateWordParts="1" generateNumberParts="1"
>>>                        catenateWords="1" catenateNumbers="1"
>>>                        catenateAll="0" preserveOriginal="1"/>
>>>                <filter class="solr.PhoneticFilterFactory"
>>>                        encoder="DoubleMetaphone" inject="true"/>
>>>                <filter class="solr.PorterStemFilterFactory"/>
>>>                <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>>>                        maxGramSize="50" side="front"/>
>>>                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>        </analyzer>
>>>        <analyzer type="query">
>>>                <charFilter class="solr.MappingCharFilterFactory"
>>>                            mapping="../../filters/filter-mappings.txt"/>
>>>                <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>>                <tokenizer class="solr.StandardTokenizerFactory"/>
>>>                <filter class="solr.TrimFilterFactory"/>
>>>                <filter class="solr.LowerCaseFilterFactory"/>
>>>                <filter class="solr.SynonymFilterFactory"
>>>                        synonyms="../../filters/filter-synonyms.txt"
>>>                        ignoreCase="true" expand="true"/>
>>>                <filter class="solr.ASCIIFoldingFilterFactory"/>
>>>                <filter class="solr.WordDelimiterFilterFactory"
>>>                        splitOnCaseChange="1" splitOnNumerics="1"
>>>                        stemEnglishPossessive="1"
>>>                        generateWordParts="0" generateNumberParts="1"
>>>                        catenateWords="1" catenateNumbers="1"
>>>                        catenateAll="0" preserveOriginal="1"/>
>>>                <filter class="solr.PhoneticFilterFactory"
>>>                        encoder="DoubleMetaphone"/>
>>>                <filter class="solr.PorterStemFilterFactory"/>
>>>                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>        </analyzer>
>>> </fieldType>
>>>
>>> ==========================
>>>
>>> Actual index:
>>>
>>> ==========================
>>>
>>> <arr name="myEvent">
>>>
>>> <str>2284e2</str>
>>>
>>> <str>2284e4</str>
>>>
>>> <str>2284e5</str>
>>>
>>> <str>1911e2</str>
>>>
>>> </arr>
>>>
>>>
>>>
>>> ==========================
>>>
>>> Question:
>>>
>>> ==========================
>>>
>>> Now when I do a search like this:
>>>
>>>
>>>
>>> myEvent:1911e2
>>>
>>>
>>>
>>> This should match the 4th item.  Now on "Full Interface", it does not return
>>> any result.  But on "analysis", matches are highlighted.
>>>
>>>
>>>
>>> By using Debug: the parsedquery is:
>>>
>>>
>>>
>>> MultiPhraseQuery(myEvent:"(1911e2 1911) (A e) 2")
>>>
>>>
>>>
>>> Parsedquery_toString:
>>>
>>>
>>>
>>> myEvent:"(1911e2 1911) (A e) 2"
>>>
>>>
>>>
>>> Can anyone please help me on this?
>>>
>>>
>>
>>
>
