Nikolas, thanks a lot for that. I've just given it a quick test and it definitely seems to work for the examples I gave.
Thanks again,
Scott

From: Nikolas Tautenhahn [via Lucene]
Sent: Monday, August 23, 2010 3:14 PM
To: Scottie
Subject: Re: Tokenising on Each Letter

Hi Scottie,

> Could you elaborate about N gram for me, based on my schema?

Just a quick reply:

<fieldType name="textNGram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" side="front" minGramSize="2" maxGramSize="30"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

This will produce edge n-grams from 2 up to 30 characters; for more info, check http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory

Be sure to adjust those sizes (minGramSize/maxGramSize) so that maxGramSize is big enough to keep the whole original serial number/model number, and minGramSize is not so small that you fill your index with useless information.
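To make the sizing advice concrete, here is a minimal Python sketch (not Solr code) of what a front-anchored edge n-gram filter with minGramSize=2 and maxGramSize=30 emits for a single token; the sample serial number "ab1234x" is made up for illustration:

```python
def edge_ngrams(token, min_gram=2, max_gram=30):
    """Return front-anchored n-grams of token, in the spirit of
    solr.EdgeNGramFilterFactory with side="front"."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

# Each prefix of the serial number becomes an indexed term, so a
# partial query like "ab12" matches one of the stored grams.
print(edge_ngrams("ab1234x"))
# → ['ab', 'ab1', 'ab12', 'ab123', 'ab1234', 'ab1234x']
```

This is why maxGramSize must cover your longest serial number (so the full token survives as a gram) while a too-small minGramSize floods the index with short, low-value prefixes.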
Best regards,
Nikolas Tautenhahn