Re: solr and approximate string matching

Shalin Shekhar Mangar Sun, 30 Aug 2009 12:32:37 -0700

On Fri, Aug 21, 2009 at 12:31 AM, Ryszard Szopa <ryszard.sz...@gmail.com> wrote:


>
> So, we have a database of movies and series, and as the data comes
> from many sources of varying reliability, we'd like to be able to do
> fuzzy string matching on the titles of episodes (the default matching
> mechanisms operate at the word level, which is not good enough for short
> strings, like titles). I had used n-gram approximate matching in the
> past, and I was very happy to find that Lucene (and Solr) supports
> something like this out of the box.
>
> I assumed that I need a special field type for this, so I added the
> following field-type to my schema.xml:
>
>   <fieldType
>       name="trigrams"
>       stored="true"
>       class="solr.StrField">
>     <analyzer type="index">
>       <tokenizer
>           class="solr.analysis.NGramTokenizerFactory"
>           minGramSize="3"
>           maxGramSize="5"
>           />
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldType>
>
> and changed the appropriate field in the schema to:
>
> <field name="title" type="trigrams" indexed="true" stored="true"
> multiValued="false" />
>
> However, this is not working as I expected. The query analysis looks
> correct, but I don't get any results, which makes me believe that
> something goes wrong at index time (i.e. the title is indexed like a
> default string field instead of a trigram field).
>

The best way to debug this kind of problem is to look at analysis.jsp
and/or use debugQuery=on on the query to see exactly how it is being parsed.
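
When reading the index-time output in analysis.jsp, it may help to know
roughly which tokens to expect from a 3-to-5 character n-gram analyzer. The
following is only a sketch in plain Python, not Lucene's implementation: the
function name char_ngrams is made up for illustration, and the order of the
grams may differ from what the real tokenizer emits.

```python
def char_ngrams(text, min_size=3, max_size=5):
    """Return lowercased character n-grams of every size in [min_size, max_size].

    Hypothetical helper for illustration only; it approximates the effect of
    an n-gram tokenizer followed by a lowercase filter, nothing more.
    """
    lowered = text.lower()
    grams = []
    for size in range(min_size, max_size + 1):
        # Slide a window of the current size across the string.
        for start in range(len(lowered) - size + 1):
            grams.append(lowered[start:start + size])
    return grams


print(char_ngrams("Lost"))
```

If analysis.jsp shows the whole title as a single token at index time rather
than a list of grams like this, the index analyzer is probably not being
applied to the field.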

Can you post the output of your query with debugQuery=on?
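
For example, a request along these lines should return the debug section. This
is a rough sketch only: the host, port, and /solr/select path are assumptions
about a default local install, the query term is made up, and the helper name
build_debug_query_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed location of the select handler; adjust to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_query_url(field, text, base_url=SOLR_SELECT_URL):
    """Build a select URL that asks Solr to include debug output."""
    params = {
        "q": f'{field}:"{text}"',  # phrase query against a single field
        "debugQuery": "on",        # adds the parsed query and scoring details
    }
    return f"{base_url}?{urlencode(params)}"


url = build_debug_query_url("title", "matrix")
print(url)
```

Opening the printed URL in a browser (or fetching it with curl) should show
how the query was parsed next to the normal results.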

-- 
Regards,
Shalin Shekhar Mangar.
