Re: About Suggestions

Erick Erickson Tue, 16 Jul 2013 17:52:12 -0700

Maybe it got lost; I tend to babble on... But use a copyField directive
to copy into a field that doesn't have the EdgeNGramFilterFactory in its chain,
and get your suggestions from _that_ field rather than the one you
use currently. You can still search etc. on the field you have now;
just get your suggestions from the copied field.
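A minimal sketch of that setup for a Solr 4.x schema.xml, assuming the `suggest` field and `text_parts` chain quoted further down. The names `suggest_whole` and `text_parts_whole` are hypothetical. The new type reuses the tokenizer and filters of the existing `text_parts` query analyzer, so each whitespace-separated part number should be indexed as one lowercased term with no n-grams:

```xml
<!-- Hypothetical whole-token type: same chain as the text_parts query
     analyzer, without EdgeNGramFilterFactory, so only complete
     (lowercased) part numbers end up in the index.
     A single <analyzer> with no type attribute is used for both
     indexing and querying. -->
<fieldType name="text_parts_whole" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical destination field. It only needs indexed terms for the
     TermsComponent, so it does not have to be stored. -->
<field name="suggest_whole" type="text_parts_whole" indexed="true" stored="false" multiValued="true"/>

<!-- copyField copies the raw input of "suggest" (before analysis), so
     suggest_whole gets its own, n-gram-free analysis. -->
<copyField source="suggest" dest="suggest_whole"/>
```

The copy only happens at index time, so a re-index is needed before `suggest_whole` has any terms. After that, the same terms request with `terms.fl=suggest_whole` should return only complete part numbers. Because `terms.fl` can be given more than once, a single request with `terms.fl=suggest&terms.fl=suggest_whole` should return both term lists, and any suggestion from `suggest` that also appears under `suggest_whole` is a complete part number.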


Best
Erick

On Tue, Jul 16, 2013 at 8:39 AM, Lochschmied, Alexander
<alexander.lochschm...@vishay.com> wrote:
> Thanks Erick, that is what I suspected. We are very happy with the four
> suggestions in the example (and all the others), but we would like to know
> which of them represents a full part number.
> Can you elaborate a little on how that could be achieved?
>
> Best regards,
> Alexander
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, July 16, 2013 14:09
> To: solr-user@lucene.apache.org
> Subject: Re: About Suggestions
>
> Garbage in, garbage out <G>....
>
> Your indexing analysis chain is breaking up the tokens via the
> EdgeNGramFilterFactory and _putting those values in the index_.
> The TermsComponent then looks _only_ at the tokens in the index and
> gives you back exactly what you're asking for.
>
> So no, there's no way with that analysis chain to get only complete terms. At
> that level, the fact that a term was part of a larger input token has been
> lost. In fact, if you were to enter something like terms.prefix=1n1, you'd
> likely see every n-gram of three or more characters that starts with 1n1.
>
> So use a copyField and put these in a separate field that has only whole
> tokens, or just take the EdgeNGramFilterFactory out of your current definition. If
> the latter, blow away your index and re-index from scratch.
>
> Best
> Erick
>
> On Tue, Jul 16, 2013 at 4:48 AM, Lochschmied, Alexander 
> <alexander.lochschm...@vishay.com> wrote:
>> Hi Erick and everybody else!
>>
>> Thanks for trying to help. Here is the example:
>>
>> .../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187
>>
>> returns
>>
>> <int name="1n1187">1</int>
>> <int name="1n1187a">1</int>
>> <int name="1n1187r">1</int>
>> <int name="1n1187ra">1</int>
>>
>> This list contains three complete part numbers, but the third item (1n1187r) is
>> not a complete part number. Is there a way to make the TermsComponent tell whether
>> a term represents a complete value?
>> (My guess is that this information gets lost in the n-gram filter, but I'm still
>> hoping something can be done.)
>>
>> More config details:
>>
>> <field name="suggest" type="text_parts" indexed="true" stored="true" required="false" multiValued="true"/>
>>
>> and
>>
>> <fieldType name="text_parts" class="solr.TextField" positionIncrementGap="100">
>>         <analyzer type="index">
>>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>                 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>>                 <filter class="solr.LowerCaseFilterFactory"/>
>>                 <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
>>         </analyzer>
>>         <analyzer type="query">
>>                 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>                 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>>                 <filter class="solr.LowerCaseFilterFactory"/>
>>         </analyzer>
>> </fieldType>
>>
>> Thanks,
>> Alexander
>>
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Saturday, July 13, 2013 19:58
>> To: solr-user@lucene.apache.org
>> Subject: Re: About Suggestions
>>
>> Not quite sure what you mean here; a couple of examples would help.
>>
>> But since the field uses the keyword tokenizer, each thing you get back
>> is a complete term by definition. So I'm not quite sure what you're asking
>> here...
>>
>> Best
>> Erick
>>
>> On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander 
>> <alexander.lochschm...@vishay.com> wrote:
>>> Hi Solr people!
>>>
>>> We need to suggest part numbers in alphabetical order, adding up to four
>>> characters to the part number prefix already entered. That works quite well
>>> with the TermsComponent acting on a multivalued field with the keyword tokenizer
>>> and an edge n-gram filter. I mention "part numbers" to indicate that each
>>> item in the multivalued field is a string without whitespace, in which
>>> special characters like dashes cannot be treated as separators.
>>>
>>> Is there a way to know whether a term (a suggestion) represents such a
>>> complete part number (without doing another query for each suggestion)?
>>>
>>> Since we are using SolrJ, what we would need is something like
>>>         boolean Term.isRepresentingCompleteFieldValue()
>>>
>>> Thanks,
>>> Alexander
