RE: Search differences between solr 1.4.0 and 3.6.1

Frederico Azeiteiro Wed, 28 Nov 2012 01:34:35 -0800

I just reloaded both indexes to make sure that all definitions are loaded.
In the Analysis tool I can see differences, even though the fields are defined
the same way:


Query Analyser for 3.6.1
org.apache.solr.analysis.WordDelimiterFilterFactory {protected=protwords.txt, 
splitOnCaseChange=1, generateNumberParts=0, catenateWords=0, 
luceneMatchVersion=LUCENE_36, generateWordParts=1, catenateAll=0, 
catenateNumbers=0}
term text: GAMES

Query Analyser for 1.4.0
org.apache.solr.analysis.WordDelimiterFilterFactory {protected=protwords.txt, 
splitOnCaseChange=1, generateNumberParts=0, catenateWords=0, 
generateWordParts=1, catenateAll=0, catenateNumbers=0}
term text: GAMES | 12
 
The "12" is lost at query time in 3.6.1.
The only difference I can see in the field definition is the
"luceneMatchVersion=LUCENE_36"... Could it cause this issue?

Thank you.
Frederico

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, 27 November 2012 12:26
To: solr-user@lucene.apache.org
Subject: Re: Search differences between solr 1.4.0 and 3.6.1

Using the definition you provided, I don't get the same output. Are you sure
you are doing what you think you are? With generateNumberParts=0, the '12'
does not make it through the filter in either 1.4 or 3.6, so I suspect you're
not doing something quite the same way in both.

Perhaps you are looking at index-time tokenization in one and query-time
tokenization in the other?
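For illustration only, a minimal sketch of a query analyzer that would let the numeric part through, assuming the goal is for GAMES12 to yield both GAMES and 12 at query time. It is the query analyzer from the original message with a single change, generateNumberParts="1"; this variant was not tested in this thread.

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" ignoreCase="true"
      expand="true" synonyms="synonyms.txt"/>
  <!-- Only change from the posted config: generateNumberParts="1"
       (was "0"), so numeric subwords such as "12" in GAMES12 are
       emitted instead of being dropped by the filter. -->
  <filter class="solr.WordDelimiterFilterFactory"
      protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
      catenateNumbers="0" catenateWords="0" generateNumberParts="1"
      generateWordParts="1"/>
</analyzer>
```

Whether this restores the 1.4.0 results also depends on what the index analyzer (catenateNumbers="1", generateNumberParts="0") stores for GAMES12, so comparing index-time and query-time output in the Analysis tool on both versions is a sensible first check.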

Best
Erick


On Mon, Nov 26, 2012 at 9:06 AM, Frederico Azeiteiro < 
frederico.azeite...@cision.com> wrote:

> Hi,
>
> While updating our Solr to 3.6.1, I noticed some differences in results
> when using search strings containing letters and numbers.
>
> For a text field defined as:
>
> <analyzer type="index">
>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   <charFilter class="solr.MappingCharFilterFactory"
>       mapping="mapping-ISOLatin1Accent.txt"/>
>   <filter class="solr.WordDelimiterFilterFactory"
>       protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
>       catenateNumbers="1" catenateWords="1" generateNumberParts="0"
>       generateWordParts="1" stemEnglishPossessive="0"/>
> </analyzer>
>
> <analyzer type="query">
>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   <filter class="solr.SynonymFilterFactory" ignoreCase="true"
>       expand="true" synonyms="synonyms.txt"/>
>   <filter class="solr.WordDelimiterFilterFactory"
>       protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
>       catenateNumbers="0" catenateWords="0" generateNumberParts="0"
>       generateWordParts="1"/>
> </analyzer>
>
> Searching for the string GAMES12 returns many results on 3.6.1 that
> are not returned on 1.4.0.
>
> It looks like WordDelimiterFilterFactory is behaving differently in
> 3.6.1: the numeric part of the keyword is ignored and the search
> is performed using only GAMES.
>
> Analysis returns for 1.4.0:
>
> org.apache.solr.analysis.WordDelimiterFilterFactory
> {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> catenateWords=0, generateWordParts=1, catenateAll=0,
> catenateNumbers=0}
>
> term position    | 1     | 2
> term text        | GAMES | 12
> term type        | word  | word
> source start,end | 0,5   | 5,7
> payload          |       |
>
> And for 3.6.1:
>
> org.apache.solr.analysis.WordDelimiterFilterFactory
> {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> catenateWords=0, luceneMatchVersion=LUCENE_36, generateWordParts=1,
> catenateAll=0, catenateNumbers=0}
>
> position         | 1
> term text        | GAMES
> startOffset      | 0
> endOffset        | 5
> type             | word
> positionLength   | 1
>
> Is this something that can be modified/fixed to return the same results?
>
> Thank you.
>
> Regards,
> Frederico