Well, a lot of this is working, but not all of it.

Consider the company name

  Shooters Inc

My ngram field is able to match queries such as shoot and hoot against this name. This works.

However, consider the company name

  Location Scotland

If I query scot I get one result back, but it's for a company called

  Prescott Inc

I looked at the analyzer and realised that the NGramTokenizer was generating substrings from the start (left) of the *whole phrase*

  location scotland

Because my max was set to 15, it was not generating a token for scot.

So I figured I would change to a whitespace tokenizer first and then apply the ngram as a filter. This now looks like it is generating scot in the tokens, as shown below:

Index Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}

  term position  term text  term type  source start,end  payload
  1              location   word       0,8
  2              scotland   word       9,17

org.apache.solr.analysis.NGramFilterFactory {maxGramSize=15, minGramSize=4}

  term position  term text  term type  source start,end  payload
  1              loca       word       0,4
  2              ocat       word       1,5
  3              cati       word       2,6
  4              atio       word       3,7
  5              tion       word       4,8
  6              locat      word       0,5
  7              ocati      word       1,6
  8              catio      word       2,7
  9              ation      word       3,8
  10             locati     word       0,6
  11             ocatio     word       1,7
  12             cation     word       2,8
  13             locatio    word       0,7
  14             ocation    word       1,8
  15             location   word       0,8
  16             scot       word       9,13
  17             cotl       word       10,14
  18             otla       word       11,15
  19             tlan       word       12,16
  20             land       word       13,17
  21             scotl      word       9,14
  22             cotla      word       10,15
  23             otlan      word       11,16
  24             tland      word       12,17
  25             scotla     word       9,15
  26             cotlan     word       10,16
  27             otland     word       11,17
  28             scotlan    word       9,16
  29             cotland    word       10,17
  30             scotland   word       9,17

Query Analyzer

  scot
  scot

BUT it still returns no results for scot, though it does continue to return the Prescott result.

So ngramming is working, but it is not working when the query is something far to the right in the indexed value.

Is this another user error, or have I missed something else here?

Cheers

On Oct 8, 2010, at 9:02 AM, Allistair Crossley wrote:

> Oh my. I am basically being a total monkey.
> Every time I was changing my schema.xml to try new things out I was then
> reindexing our staging server's index instead of my local dev index so no
> changes were occurring locally.
>
> Dear me.
>
> This is working now, surprise.
>
> On Oct 8, 2010, at 8:53 AM, Markus Jelsma wrote:
>
>> How come your query analyser spits out grams? It isn't configured to do so
>> or you posted an older field definition. Anyway, do you actually search on
>> your new field?
>>
>> On Friday, October 08, 2010 02:46:08 pm Allistair Crossley wrote:
>>> Hi,
>>>
>>> Yep, I was just looking at the analyzer jsp. The ngrams *do* exist as
>>> expected, so it's not my configuration that is at fault (he says)
>>>
>>> Index Analyzer
>>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>>>
>>> Query Analyzer
>>>
>>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>>> sh ho oo ot te er sho hoo oot ote ter shoo hoot oote oter shoot hoote ooter shoote hooter
>>>
>>> Yet, searching either
>>>
>>> /solr/select?q=hoot
>>>
>>> or
>>>
>>> /solr/select?q=name:hoot
>>>
>>> does not yield results.
>>>
>>> When searching for shooter I see 2 results with names:
>>>
>>> 1. <str name="name">Shooters International Inc</str>
>>> 2. <str name="name">Hong Kong Shooter</str>
>>>
>>> Yours, puzzled :)
>>>
>>> On Oct 8, 2010, at 8:38 AM, Jan Høydahl / Cominvent wrote:
>>>> Hi,
>>>>
>>>> The first thing I would try is to go to the analysis page, enter your
>>>> test data, and report back what each analysis stage prints out:
>>>> http://localhost:8983/solr/admin/analysis.jsp
>>>>
>>>> --
>>>> Jan Høydahl, search solution architect
>>>> Cominvent AS - www.cominvent.com
>>>>
>>>> On 8. okt. 2010, at 14.19, Allistair Crossley wrote:
>>>>> Morning all,
>>>>>
>>>>> I would like to ngram a company name field in our index. I have read
>>>>> about the costs of doing so in the great David Smiley Solr 1.4 book and
>>>>> just to get started I have followed his example in setting up an ngram
>>>>> field type as follows:
>>>>>
>>>>> <fieldType name="text_substring" class="solr.TextField"
>>>>>            positionIncrementGap="100" stored="false" multiValued="true">
>>>>>   <analyzer type="index">
>>>>>     <tokenizer class="solr.StandardTokenizerFactory" />
>>>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>>>     <filter class="solr.NGramFilterFactory"
>>>>>             minGramSize="4"
>>>>>             maxGramSize="15" />
>>>>>   </analyzer>
>>>>>   <analyzer type="query">
>>>>>     <tokenizer class="solr.StandardTokenizerFactory" />
>>>>>     <filter class="solr.LowerCaseFilterFactory" />
>>>>>   </analyzer>
>>>>> </fieldType>
>>>>>
>>>>> I have restarted/reindexed everything but I still cannot search
>>>>>
>>>>> hoot
>>>>>
>>>>> and get back the company named Shooter. searching shooter is fine.
>>>>>
>>>>> I have followed other examples on the internet regards an ngram field
>>>>> type. Some examples seem to use an index analyzer that has an ngram
>>>>> tokenizer rather than filter if this makes a difference. But in all
>>>>> cases I am not seeing the expected result, just 0 results.
>>>>>
>>>>> Is there anything else I should be considering here? I feel like I must
>>>>> be very close, it doesn't seem complicated but yet it's not working
>>>>> like everything else I have done with solr to date :)
>>>>>
>>>>> Any guidance appreciated,
>>>>>
>>>>> Allistair
>>
>> --
>> Markus Jelsma - CTO - Openindex
>> http://www.linkedin.com/in/markus17
>> 050-8536600 / 06-50258350
>
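
A side note on the analysis output at the top of this message: the 30 grams listed for location scotland can be reproduced outside Solr as a sanity check. What follows is a minimal Python sketch, not Solr's implementation. The function names ngrams and analyze are made up for illustration, and the sketch assumes only what the analysis output shows: split on whitespace, then emit grams of length 4 to 15 per token, ordered by length and then by start offset.

```python
def ngrams(token, min_size, max_size):
    """Return the substrings of token with lengths min_size..max_size.

    Grams are ordered by length first, then by start offset, which is the
    order the NGramFilterFactory stage shows in the analysis output.
    """
    grams = []
    # Never ask for grams longer than the token itself.
    for size in range(min_size, min(max_size, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def analyze(text, min_size=4, max_size=15):
    """Hypothetical stand-in for the index chain: whitespace split, then n-gram each token."""
    grams = []
    for token in text.split():
        grams.extend(ngrams(token, min_size, max_size))
    return grams


index_terms = analyze("location scotland")
print(len(index_terms))       # number of index-side terms
print("scot" in index_terms)  # is the query term among them?
```

If this agrees with the analysis page (30 terms, with scot among them), it suggests the gram generation itself is not the missing piece, though it says nothing about why the query still fails to match.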