Re: Flexible search field analyser/tokenizer configuration

Alexandre Rafalovitch Sun, 28 Sep 2014 20:06:07 -0700

WordDelimiterFilterFactory is a rather specialized and capricious
beast, and possibly not the most suitable for your needs (it is meant
for things like matching "iPhone 6" against "iphone6").


Things you may want to look at:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/ngram/EdgeNGramFilterFactory.html
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/core/StopFilterFactory.html
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/commongrams/CommonGramsFilterFactory.html
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/commongrams/CommonGramsQueryFilter.html
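
As a rough, untested sketch of how the first two could fit together for
your "The Royal Garden" case: the type name "searchtext_prefix", the
gram sizes, and a stopwords.txt that actually lists "the" and "a" are
all my assumptions, not something taken from your setup.

```xml
<!-- Hypothetical field type: prefix matching via edge n-grams, stop words dropped -->
<fieldType name="searchtext_prefix" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumes stopwords.txt contains "the" and "a" -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <!-- "royal" is indexed as ro, roy, roya, royal -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
            maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <!-- No n-gram filter here: the query term is matched as typed -->
  </analyzer>
</fieldType>
```

With something like this, "the roy" should be reduced to "roy" at query
time and match the "roy" gram indexed for "Royal", while "the" on its
own no longer matches anything. One likely catch: a query term longer
than maxGramSize would not match, so pick that value with your longest
expected word in mind. Check the result on the analysis screen before
relying on it.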

Regards,
    Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 28 September 2014 22:50, PeterKerk <petervdk...@hotmail.com> wrote:
> I have a site which lists companies.
>
> I'm looking to improve my search, but I want to know which available
> analysers and tokenizers I should use for which scenario, and if it's at all
> possible.
>
> I want users to be able to search on the company title, for example for a
> company called "The Royal Garden".
>
> The logic for this search should be as follows: "The Royal Garden" should
> be found by the queries:
> "the royal garden"
> "royal garden"
> "the roy"
> "The royal"
> "RoYAl"
> "garden"
>
> So case insensitive, matching on parts of words.
>
> However, a query "the royal" should not return companies like:
> "the wall"
> "the room"
> "the restaurant"
>
> So words like "the", and also "a", should be ignored if they are the only
> match in the search query.
>
> I now have this:
>
>     <fieldType name="searchtext" class="solr.TextField"
>                positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                 ignoreCase="true" expand="true"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="0"
>                 catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
>
>     <field name="title_search" type="searchtext" indexed="true"
>            stored="true"/>
>
> I'm testing on http://localhost:8983/solr/#/bm/analysis but I'm stuck.
>
> Also, I would think my scenario is pretty common and lots of users have
> already configured their Solr search to be flexible and powerful...any good
> search configurations would be welcome!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624.html
> Sent from the Solr - User mailing list archive at Nabble.com.
