Maybe it got lost, I tend to babble on... But use a copyField directive to fill a second field whose analysis chain doesn't have the EdgeNGramFilterFactory in it, and get your suggestions from _that_ field rather than the one you use currently. You can still search, etc. on the field you have now; just get your suggestions from the copied field.
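Here is a rough sketch of what that could look like in schema.xml. The names suggest_whole and text_parts_whole are made up for illustration, and I'm assuming a plain whitespace + lowercase chain is enough for part numbers, so adjust it to match whatever else you need from your current chain:

```xml
<!-- Hypothetical field and type names; only the copyField mechanism and the
     factory classes themselves are standard Solr. -->
<fieldType name="text_parts_whole" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- No EdgeNGramFilterFactory here, so only whole tokens reach the index. -->
  </analyzer>
</fieldType>

<field name="suggest_whole" type="text_parts_whole" indexed="true"
       stored="false" multiValued="true"/>

<!-- copyField copies the raw input value, before the analysis of "suggest" runs. -->
<copyField source="suggest" dest="suggest_whole"/>
```

After a full re-index, pointing terms.fl at suggest_whole with the same terms.prefix should give you back only complete part numbers. If you want to keep the n-gram suggestions as well, I believe you can pass terms.fl once per field in the same request and treat any term that also shows up under suggest_whole as a complete value, but try it against your data first.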
Best
Erick

On Tue, Jul 16, 2013 at 8:39 AM, Lochschmied, Alexander <alexander.lochschm...@vishay.com> wrote:
> Thanks Erick, that is what I suspected. We are very happy with the four
> suggestions in the example (and all the others), but we would like to know
> which of them represents a full part number.
> Can you elaborate a little more on how that could be achieved?
>
> Best regards,
> Alexander
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, July 16, 2013 14:09
> To: solr-user@lucene.apache.org
> Subject: Re: About Suggestions
>
> Garbage in, garbage out <G>....
>
> Your indexing analysis chain is breaking up the tokens via the
> EdgeNGramFilter and _putting those values in the index_.
> Then the TermsComponent is looking _only_ at the tokens in the index and
> giving you back exactly what you're asking for.
>
> So no, there's no way with that analysis chain to get only complete terms; at
> that level the fact that a term was part of a larger input token has been
> lost. In fact, if you were to enter something like terms.prefix=1n1 you'd
> likely see all your n-grams that start with 1n1 etc.
>
> So use a copyField and put these in a separate field that has only whole
> tokens, or just take the EdgeNGramFilter out of your current definition. If
> the latter, blow away your index and re-index from scratch.
>
> Best
> Erick
>
> On Tue, Jul 16, 2013 at 4:48 AM, Lochschmied, Alexander
> <alexander.lochschm...@vishay.com> wrote:
>> Hi Erick and everybody else!
>>
>> Thanks for trying to help. Here is the example:
>>
>> .../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187
>>
>> returns
>>
>> <int name="1n1187">1</int>
>> <int name="1n1187a">1</int>
>> <int name="1n1187r">1</int>
>> <int name="1n1187ra">1</int>
>>
>> This list contains 3 complete part numbers but the third item (1n1187r) is
>> not a complete part number.
>> Is there a way to make terms tell if a term represents a complete value?
>> (My guess is that this gets lost after ngram but I'm still hoping
>> something can be done.)
>>
>> More config details:
>>
>> <field name="suggest" type="text_parts" indexed="true" stored="true"
>>        required="false" multiValued="true"/>
>>
>> and
>>
>> <fieldType name="text_parts" class="solr.TextField"
>>            positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords.txt" enablePositionIncrements="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>>             maxGramSize="20" side="front"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords.txt" enablePositionIncrements="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> Thanks,
>> Alexander
>>
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Saturday, July 13, 2013 19:58
>> To: solr-user@lucene.apache.org
>> Subject: Re: About Suggestions
>>
>> Not quite sure what you mean here, a couple of examples would help.
>>
>> But since the term is using keyword tokenizer, then each thing you get back
>> is a complete term, by definition. So I'm not quite sure what you're asking
>> here.....
>>
>> Best
>> Erick
>>
>> On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander
>> <alexander.lochschm...@vishay.com> wrote:
>>> Hi Solr people!
>>>
>>> We need to suggest part numbers in alphabetical order, adding up to four
>>> characters to the already entered part number prefix. That works quite well
>>> with the terms component acting on a multivalued field with keyword
>>> tokenizer and edge nGram filter. I am mentioning "part numbers" to indicate
>>> that each item in the multivalued field is a string without whitespace and
>>> where special characters like dashes cannot be seen as separators.
>>>
>>> Is there a way to know if the term (the suggestion) represents such a
>>> complete part number (without doing another query for each suggestion)?
>>>
>>> Since we are using SolrJ, what we would need is something like
>>> boolean Term.isRepresentingCompleteFieldValue()
>>>
>>> Thanks,
>>> Alexander