Well, a lot of this is working but not all.

Consider the company name Shooters Inc

My ngram field is able to match queries to the name for shoot and hoot and so 
on. This works.

However consider the company name

Location Scotland

If I query scot I get one result back - but it's for a company called Prescott 
Inc

I looked at the analyzer and realised that the NGramTokenizer was generating 
substrings from the start (left) of the *whole phrase*

location scotland

Because my max was set to 15 it was not generating a token for scot

So I figured I would change to a whitespace tokenizer first and then apply the 
ngram as a filter.
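
For illustration, the whitespace-then-ngram idea can be sketched in plain Python. This is not Solr code: the function names `ngram_filter` and `analyze` are made up for the example, and the 4/15 gram sizes are taken from the filter settings shown in the analysis output further down. It only mimics what that output suggests, namely that each whitespace-separated token is n-grammed on its own, shortest grams first.

```python
def ngram_filter(tokens, min_gram=4, max_gram=15):
    """Emit every substring of length min_gram..max_gram for each token,
    shortest grams first (the order seen on the analysis page)."""
    grams = []
    for token in tokens:
        # Never ask for grams longer than the token itself.
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            for start in range(len(token) - size + 1):
                grams.append(token[start:start + size])
    return grams


def analyze(text, min_gram=4, max_gram=15):
    # Split on whitespace first, then n-gram each token separately.
    return ngram_filter(text.split(), min_gram, max_gram)


grams = analyze("location scotland")
print(len(grams))        # 30
print("scot" in grams)   # True
```

Because each word is grammed separately, a gram never spans the space between location and scotland, and scot appears as a token of its own.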

This now looks like it is generating scot in the tokens, as shown below:

Index Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}

term position   1       2
term text       location        scotland
term type       word    word
source start,end        0,8     9,17
payload         
org.apache.solr.analysis.NGramFilterFactory {maxGramSize=15, minGramSize=4}

term position   term text   source start,end
1               loca        0,4
2               ocat        1,5
3               cati        2,6
4               atio        3,7
5               tion        4,8
6               locat       0,5
7               ocati       1,6
8               catio       2,7
9               ation       3,8
10              locati      0,6
11              ocatio      1,7
12              cation      2,8
13              locatio     0,7
14              ocation     1,8
15              location    0,8
16              scot        9,13
17              cotl        10,14
18              otla        11,15
19              tlan        12,16
20              land        13,17
21              scotl       9,14
22              cotla       10,15
23              otlan       11,16
24              tland       12,17
25              scotla      9,15
26              cotlan      10,16
27              otland      11,17
28              scotlan     9,16
29              cotland     10,17
30              scotland    9,17

(term type is word for all 30 terms; payload is empty)

Query Analyzer

scot
scot

BUT it still returns no results for scot, although it does continue to return
the Prescott result.

So ngramming is working but it is not working when the query is something far 
to the right of the indexed value. 
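
To make the expectation concrete, here is a rough Python sketch of the matching being assumed. It is not Solr code: `index_grams` and `matches` are hypothetical helper names, and it assumes lowercasing on both sides with 4 to 15 character grams built only at index time, while the query term is looked up as-is.

```python
def index_grams(name, min_gram=4, max_gram=15):
    """Index side: lowercase, split on whitespace, n-gram each token."""
    grams = set()
    for token in name.lower().split():
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            for start in range(len(token) - size + 1):
                grams.add(token[start:start + size])
    return grams


def matches(query, name):
    # Query side: lowercased but not n-grammed, so the whole query term
    # has to equal one of the grams stored for the name.
    return query.lower() in index_grams(name)


for name in ["Prescott Inc", "Location Scotland", "Shooters Inc"]:
    print(name, matches("scot", name))
```

Under these assumptions scot should hit both Prescott Inc and Location Scotland, which is why getting only the Prescott result is puzzling.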

Is this another user-error or have I missed something else here?

Cheers


On Oct 8, 2010, at 9:02 AM, Allistair Crossley wrote:

> Oh my. I am basically being a total monkey. Every time I was changing my 
> schema.xml to try new things out I was then reindexing our staging server's 
> index instead of my local dev index so no changes were occurring locally.
> 
> Dear me. 
> 
> This is working now, surprise.
> 
> On Oct 8, 2010, at 8:53 AM, Markus Jelsma wrote:
> 
>> How come your query analyser spits out grams? It isn't configured to do so,
>> or you posted an older field definition. Anyway, do you actually search on
>> your new field?
>> 
>> On Friday, October 08, 2010 02:46:08 pm Allistair Crossley wrote:
>>> Hi,
>>> 
>>> Yep, I was just looking at the analyzer jsp. The ngrams *do* exist as
>>> expected, so it's not my configuration that is at fault (he says)
>>> 
>>> Index Analyzer
>>> sh  ho  oo  ot  te  er  sho  hoo  oot  ote  ter  shoo  hoot  oote  oter
>>> shoot  hoote  ooter  shoote  hooter
>>> 
>>> Query Analyzer
>>> 
>>> sh  ho  oo  ot  te  er  sho  hoo  oot  ote  ter  shoo  hoot  oote  oter
>>> shoot  hoote  ooter  shoote  hooter
>>> 
>>> 
>>> Yet, searching either
>>> 
>>> /solr/select?q=hoot
>>> 
>>> or
>>> 
>>> /solr/select?q=name:hoot
>>> 
>>> does not yield results.
>>> 
>>> When searching for shooter I see 2 results with names:
>>> 
>>> 1. <str name="name">Shooters International Inc</str>
>>> 2. <str name="name">Hong Kong Shooter</str>
>>> 
>>> Yours, puzzled :)
>>> 
>>> On Oct 8, 2010, at 8:38 AM, Jan Høydahl / Cominvent wrote:
>>>> Hi,
>>>> 
>>>> The first thing I would try is to go to the analysis page, enter your
>>>> test data, and report back what each analysis stage prints out:
>>>> http://localhost:8983/solr/admin/analysis.jsp
>>>> 
>>>> --
>>>> Jan Høydahl, search solution architect
>>>> Cominvent AS - www.cominvent.com
>>>> 
>>>> On 8. okt. 2010, at 14.19, Allistair Crossley wrote:
>>>>> Morning all,
>>>>> 
>>>>> I would like to ngram a company name field in our index. I have read
>>>>> about the costs of doing so in the great David Smiley Solr 1.4 book and
>>>>> just to get started I have followed his example in setting up an ngram
>>>>> field type as follows:
>>>>> 
>>>>> <fieldType name="text_substring" class="solr.TextField"
>>>>>            positionIncrementGap="100" stored="false" multiValued="true">
>>>>>   <analyzer type="index">
>>>>>     <tokenizer class="solr.StandardTokenizerFactory" />
>>>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>>>     <filter class="solr.NGramFilterFactory" minGramSize="4"
>>>>>             maxGramSize="15" />
>>>>>   </analyzer>
>>>>>   <analyzer type="query">
>>>>>     <tokenizer class="solr.StandardTokenizerFactory" />
>>>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>>>   </analyzer>
>>>>> </fieldType>
>>>>> 
>>>>> I have restarted/reindexed everything but I still cannot search
>>>>> 
>>>>> hoot
>>>>> 
>>>>> and get back the company named Shooter. Searching shooter is fine.
>>>>> 
>>>>> I have followed other examples on the internet regarding an ngram field
>>>>> type. Some examples seem to use an index analyzer that has an ngram
>>>>> tokenizer rather than filter if this makes a difference. But in all
>>>>> cases I am not seeing the expected result, just 0 results.
>>>>> 
>>>>> Is there anything else I should be considering here? I feel like I must
>>>>> be very close, it doesn't seem complicated but yet it's not working
>>>>> like everything else I have done with solr to date :)
>>>>> 
>>>>> Any guidance appreciated,
>>>>> 
>>>>> Allistair
>> 
>> -- 
>> Markus Jelsma - CTO - Openindex
>> http://www.linkedin.com/in/markus17
>> 050-8536600 / 06-50258350
> 
