Hi,

This is how we implement our autocomplete feature. The excerpt below is
from our schema.xml.

- First, accept the input as is, as a single token (KeywordTokenizerFactory).
- Lowercase the input and strip every character outside a-z0-9 to
normalize it.
- At index time, split it into multiple tokens with EdgeNGramFilterFactory,
up to a maximum of 100 chars, all starting from the beginning of the input,
e.g. hello becomes h, he, hel, hell, hello.
- For queries, we keep only the first 20 chars of the normalized input.

Hope this helps.


<fieldType name="autocomplete" class="solr.TextField">
    <analyzer type="index">
        <!-- Keep the whole input as a single token -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Strip everything except a-z and 0-9 -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z0-9])" replacement="" replace="all"/>
        <!-- Emit every prefix of the token, from 1 up to 100 chars -->
        <filter class="solr.EdgeNGramFilterFactory"
                maxGramSize="100" minGramSize="1"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z0-9])" replacement="" replace="all"/>
        <!-- Keep only the first 20 chars of the normalized query -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    </analyzer>
</fieldType>
...
<field name="ac" type="autocomplete" indexed="true" stored="true"
required="false" />
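
To see why this matches partial words, the two analyzer chains can be traced
by hand. Below is a rough Python sketch of what they do. It is not Solr code:
the function names (normalize, index_tokens, query_token, matches) are made up
for illustration, Python's lower() only approximates LowerCaseFilterFactory,
and it assumes the whole user input reaches the query analyzer as one string.

```python
import re

MIN_GRAM = 1      # minGramSize in the index analyzer
MAX_GRAM = 100    # maxGramSize in the index analyzer
QUERY_LIMIT = 20  # length kept by the query-side pattern ^(.{20})(.*)?


def normalize(text):
    """Approximate KeywordTokenizer + LowerCaseFilter + PatternReplaceFilter:
    one lowercased token with everything outside a-z0-9 removed."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def index_tokens(text):
    """Approximate the index analyzer: every prefix of the normalized token."""
    token = normalize(text)
    longest = min(len(token), MAX_GRAM)
    return [token[:n] for n in range(MIN_GRAM, longest + 1)]


def query_token(text):
    """Approximate the query analyzer: normalized input cut to 20 chars."""
    return normalize(text)[:QUERY_LIMIT]


def matches(indexed_value, user_input):
    """Hypothetical helper: True if the query token equals an indexed prefix."""
    token = query_token(user_input)
    return token != "" and token in index_tokens(indexed_value)


print(index_tokens("hello"))                # ['h', 'he', 'hel', 'hell', 'hello']
print(query_token("Word1 wor"))             # word1wor
print(matches("Word1 word2", "Word1 wor"))  # True
```

Because spaces are stripped before the n-grams are built, "Word1 wor" is
reduced to word1wor, which is one of the prefixes indexed for "Word1 word2".
Note that it only matches from the start of the value, so typing word2 alone
would not find it.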

Regards,
Dan




On Mon, 2008-07-07 at 17:12 +0000, sundar shankar wrote:
> Hi All,
> I have been using Solr for some time and am having trouble with an
> autocomplete feature that I have been trying to incorporate. I am indexing
> into Solr with a database column to Solr field mapping. I have tried various
> configs that were mentioned in the Solr user community suggestions and have
> tried a few options of my own too. Each of them either fails to return the
> exact data I want or returns excess data.
> 
> I have tried:
> text_ws,
> text,
> string,
> EdgeNGramTokenizerFactory,
> the subword example,
> textTight,
> and juggling around some of the filters and analyzers together.
> 
> I couldn't get dismax to work, as somehow it wasn't able to connect my field
> defined in the schema to the qf param that I was passing in the request.
> 
> textTight gave the best results I had, but the problem there was that it was
> searching for whole words and not partial words. For example:
> 
> If my query string was field1:Word1 word2* I was getting back results, but if
> my query string was field1: Word1 wor* I didn't get a result back.
> 
> I am a little perplexed about how to implement this. I don't know what has
> to be done.
> 
> The schema
> 
> 
>    <field name="institution.name" type="text_ws" indexed="true" stored="true" 
> termVectors="true"/>
>    <!--Sundar changed city to subword so that spaces are ignored-->
> 
>    <field name="instAlphaSort" type="alphaOnlySort" indexed="true" 
> stored="false" multiValued="true"/>
>    <!-- Tight text cos we want results to be much the same for this-->
>    <field name="instText" type="text" indexed="true" stored="true"  
> termVectors="true" multiValued="true"/>
>    <field name="instString" type="autosuggest" indexed="true" stored="true"  
> termVectors="true" multiValued="true"/>
> 
>    <field name="instSubword" type="subword" indexed="true" stored="true" 
> multiValued="true"  termVectors="true"/>
>    <field name="instTight" type="textTight" indexed="true" stored="true" 
> multiValued="true"  termVectors="true"/>
> 
> 
> 
> I index institution.name only; the rest are copy fields of the same.
> 
> 
> Any help is appreciated.
> 
> Thanks
> Sundar
> 
Daniel Rosher
Developer
www.thehotonlinenetwork.com