On Fri, Aug 21, 2009 at 12:31 AM, Ryszard Szopa <ryszard.sz...@gmail.com> wrote:
> So, we have a database of movies and series, and as the data comes
> from many sources of varying reliability, we'd like to be able to do
> fuzzy string matching on the titles of episodes (the default matching
> mechanisms operate on word levels, which is not good enough for short
> strings, like titles). I had used n-grams approximate matching in the
> past, and I was very happy to find that Lucene (and Solr) supports
> something like this out of the box.
>
> I assumed that I need a special field type for this, so I added the
> following field-type to my schema.xml:
>
>    <fieldType
>       name="trigrams"
>       stored="true"
>       class="solr.StrField">
>      <analyzer type="index">
>        <tokenizer
>           class="solr.analysis.NGramTokenizerFactory"
>           minGramSize="3"
>           maxGramSize="5"
>           />
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
> and changed the appropriate field in the schema to:
>
>    <field name="title" type="trigrams" indexed="true" stored="true"
>           multiValued="false" />
>
> However, this is not working as I expected. The query analysis looks
> correct, but I don't get any results, which makes me believe that
> something happens at index time (i.e. the title is indexed like a
> default string field instead of trigram field).
>

The best way to debug this kind of problem is to look at analysis.jsp
and/or add debugQuery=on to the query to see exactly how it is being
parsed.

Can you post the output of your query with debugQuery=on?

--
Regards,
Shalin Shekhar Mangar.
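
For reference, a minimal sketch of how a request with debugQuery=on might be put together. The base URL (http://localhost:8983/solr/select) and the example query (title:pilot) are assumptions for illustration only; neither appears in the thread, so substitute your own host, handler path, and search term.

```python
from urllib.parse import urlencode


def build_debug_query_url(base_url: str, query: str) -> str:
    """Return a select URL that asks Solr to include debug output.

    base_url is assumed to point at the select handler of the Solr
    instance; query is the raw value of the q parameter.
    """
    # urlencode percent-escapes the query, so "title:pilot" becomes
    # "title%3Apilot" in the final URL.
    params = {"q": query, "debugQuery": "on"}
    return base_url + "?" + urlencode(params)


# Hypothetical host and query, for illustration only.
url = build_debug_query_url("http://localhost:8983/solr/select", "title:pilot")
print(url)
```

Opening the resulting URL in a browser (or fetching it with any HTTP client) should return the normal response plus a debug section showing how the query was parsed, which is the output requested above.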