Garbage in, garbage out <G>....

Your indexing analysis chain is breaking up the tokens via the
EdgeNGramFilterFactory and _putting those values in the index_.
Then the TermsComponent is looking _only_ at the tokens in
the index and giving you back exactly what you're asking for.

So no, there's no way with that analysis chain to get only complete
terms. At that level, the fact that a term was part of a larger
input token has been lost. In fact, if you were to enter something
like terms.prefix=1n1 you'd likely see every edge n-gram that starts
with 1n1, etc.

So use a copyField and put these in a separate field that has
only whole tokens, or just remove the EdgeNGramFilterFactory from
your current definition. If the latter, blow away your index and re-index
from scratch.
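
A rough sketch of the copyField option, in case it helps. The names
"suggest_whole" and "text_parts_whole" are made up for illustration;
the analysis chain is just your index-time chain with the edge n-gram
filter left out, so adjust it to whatever you actually need:

        <!-- hypothetical type: same chain as text_parts, minus EdgeNGramFilterFactory -->
        <fieldType name="text_parts_whole" class="solr.TextField"
                   positionIncrementGap="100">
                <analyzer>
                        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                        <filter class="solr.StopFilterFactory" ignoreCase="true"
                                words="stopwords.txt" enablePositionIncrements="true"/>
                        <filter class="solr.LowerCaseFilterFactory"/>
                </analyzer>
        </fieldType>

        <!-- hypothetical field holding only whole tokens -->
        <field name="suggest_whole" type="text_parts_whole" indexed="true"
               stored="false" required="false" multiValued="true"/>

        <!-- copy the raw input of "suggest" into the whole-token field -->
        <copyField source="suggest" dest="suggest_whole"/>

After re-indexing, pointing terms.fl at suggest_whole instead of suggest
should give you back only complete tokens for a given terms.prefix.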

Best
Erick

On Tue, Jul 16, 2013 at 4:48 AM, Lochschmied, Alexander
<alexander.lochschm...@vishay.com> wrote:
> Hi Eric and everybody else!
>
> Thanks for trying to help. Here is the example:
>
> .../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187
>
> returns
>
> <int name="1n1187">1</int>
> <int name="1n1187a">1</int>
> <int name="1n1187r">1</int>
> <int name="1n1187ra">1</int>
>
> This list contains 3 complete part numbers but the third item (1n1187r) is 
> not a complete part number. Is there a way to make terms tell if a term 
> represents a complete value?
> (My guess is that this gets lost after ngram but I'm still hoping something 
> can be done.)
>
> More config details:
>
> <field name="suggest" type="text_parts" indexed="true" stored="true" 
> required="false" multiValued="true"/>
>
> and
>
> <fieldType name="text_parts" class="solr.TextField" 
> positionIncrementGap="100">
>         <analyzer type="index">
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.StopFilterFactory" ignoreCase="true" 
> words="stopwords.txt" enablePositionIncrements="true"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" 
> maxGramSize="20" side="front"/>
>         </analyzer>
>         <analyzer type="query">
>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>                 <filter class="solr.StopFilterFactory" ignoreCase="true" 
> words="stopwords.txt" enablePositionIncrements="true"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>         </analyzer>
> </fieldType>
>
> Thanks,
> Alexander
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, July 13, 2013 19:58
> To: solr-user@lucene.apache.org
> Subject: Re: About Suggestions
>
> Not quite sure what you mean here, a couple of examples would help.
>
> But since the term is using keyword tokenizer, then each thing you get back 
> is a complete term, by definition. So I'm not quite sure what you're asking 
> here.....
>
> Best
> Erick
>
> On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander 
> <alexander.lochschm...@vishay.com> wrote:
>> Hi Solr people!
>>
>> We need to suggest part numbers in alphabetical order adding up to four 
>> characters to the already entered part number prefix. That works quite well 
>> with terms component acting on a multivalued field with keyword tokenizer 
>> and edge nGram filter. I am mentioning "part numbers" to indicate that each 
>> item in the multivalued field is a string without whitespace and where 
>> special characters like dashes cannot be seen as separators.
>>
>> Is there a way to know if the term (the suggestion) represents such a 
>> complete part number (without doing another query for each suggestion)?
>>
>> Since we are using SolrJ, what we would need is something like
>>         boolean Term.isRepresentingCompleteFieldValue()
>>
>> Thanks,
>> Alexander
