Re: fuzzy search issue with PatternTokenizer Factory

meghana Mon, 22 Apr 2013 00:26:12 -0700

Jack, 

The regex will split the text on any character other than letters, digits, '&', '-' and the apostrophe ('), and also on sequences of the form ns: (where n is a number from 0 to 9999, e.g. 4323s:).


Let's say, for example, my text is:

this is nice day & sun 53s: is risen.

Then the pattern tokenizer should create these tokens:

this is nice day & sun is risen
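To check the expected tokens outside Solr, here is a minimal Java sketch of the split. It assumes that PatternTokenizerFactory, with its default group of -1, splits the text at every match of the pattern and drops empty pieces. The class and method names (PatternSplitDemo, tokenize, lowercase) are hypothetical, lowercase() only stands in for LowerCaseFilterFactory, and the stop filter is not applied. Note that &amp; in the XML attribute is just & in the actual regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternSplitDemo {
    // Same regex as in the schema ("&amp;" in XML is "&" here).
    // Alternative 1: one char that is NOT a letter, digit, '&', '-' or apostrophe.
    // Alternative 2: up to four digits followed by "s:" (e.g. "53s:", "4323s:").
    static final Pattern DELIMITER = Pattern.compile("[^a-zA-Z0-9&\\-']|\\d{0,4}s:");

    // Splits at every delimiter match and skips empty pieces; this is our
    // assumption of how PatternTokenizer behaves with group = -1.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = DELIMITER.matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.start() > start) {
                tokens.add(text.substring(start, m.start()));
            }
            start = m.end();
        }
        if (start < text.length()) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }

    // Rough stand-in for LowerCaseFilterFactory.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("this is nice day & sun 53s: is risen."));
        // prints [this, is, nice, day, &, sun, is, risen]
        System.out.println(lowercase(tokenize("WORDS, WORDED....")));
        // prints [words, worded]
        System.out.println(tokenize("its: fine"));
        // prints [it, fine]
    }
}
```

Because \d{0,4} also matches zero digits, the second alternative matches a bare "s:" too, so text such as "its:" ends up as the single token "it". That may or may not be intended.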

The pattern seems to work fine with other text.

Also, for the fuzzy search worde~1, I have checked the results returned with PatternTokenizerFactory: they include words with adjacent punctuation marks, such as 'WORDS,', 'WORDED....', etc.
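A fuzzy query like worde~1 matches indexed terms that are at most one edit away from "worde". The sketch below computes that distance for a few lowercased terms. It assumes the query term reaches the index as "worde" and that, as we understand Lucene 4's FuzzyQuery default, swapping two adjacent characters also counts as one edit (optimal string alignment distance). The class name and the candidate terms are hypothetical examples, not taken from the real index.

```java
public class FuzzyDistanceDemo {
    // Optimal string alignment distance: insertion, deletion, substitution,
    // and a swap of two adjacent characters each cost 1.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "wo" <-> "ow".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String query = "worde"; // term from worde~1, after lowercasing
        String[] candidates = {"words", "worded", "word", "world", "wrode", "wordsmith"};
        for (String term : candidates) {
            int dist = distance(query, term);
            System.out.println(term + " -> " + dist + (dist <= 1 ? " (within ~1)" : ""));
        }
    }
}
```

Once punctuation is stripped and case is folded, 'WORDS,' and 'WORDED....' would be indexed as words and worded, both one edit from worde, which fits with them showing up in the results.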

One more weird thing: all the results are in uppercase letters, and no results in lowercase come back. Even so, it does not return all of the uppercase results either.

But I'm not sure why fuzzy search stopped working properly after this change.


Jack Krupansky-2 wrote
> Give us some examples of tokens that you are expecting that pattern to
> tokenize. And express the pattern in simple English as well. Show some
> actual input data.
> 
> I suspect that Solr is working fine - but you may not have precisely 
> specified your pattern. But we don't know what your pattern is supposed to 
> recognize.
> 
> Maybe some of your previous hits had punctuation adjacent to the terms
> that your pattern doesn't recognize.
> 
> And use the Solr Admin UI Analysis page to see how your sample input data
> is analyzed.
>
> One other thing... without a "group", the pattern specifies what delimiter 
> sequence will "split" the rest of the input into tokens. I suspect you 
> didn't mean this.
> 
> -- Jack Krupansky
> 
> -----Original Message----- 
> From: meghana
> Sent: Friday, April 19, 2013 9:01 AM
> To: solr-user@.apache
> Subject: fuzzy search issue with PatternTokenizer Factory
> 
> I'm using Solr 4.2. I have changed my text field definition to use
> solr.PatternTokenizerFactory instead of solr.StandardTokenizerFactory, and
> changed my schema definition as below:
> <fieldType name="text_token" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.PatternTokenizerFactory"
>                pattern="[^a-zA-Z0-9&amp;\-']|\d{0,4}s:" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="false" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.PatternTokenizerFactory"
>                pattern="[^a-zA-Z0-9&amp;\-']|\d{0,4}s:" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_extra_query.txt" enablePositionIncrements="false" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> After doing so, fuzzy search does not seem to work the way it did before.
>
> I'm searching with the search term: worde~1
>
> Before, this search returned around 300 records, but now it returns only 5.
> I'm not sure what the issue could be.
>
> Can anybody help me make it work?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275.html
> Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/fuzzy-search-issue-with-PatternTokenizer-Factory-tp4057275p4057831.html
Sent from the Solr - User mailing list archive at Nabble.com.
