That was a complicated answer, but ultimately the right one. Thank you very much.
2014-01-30 Jack Krupansky <j...@basetechnology.com>:

> The word delimiter filter will turn 26KA into two tokens, as if you had
> written "26 KA" without the quotes. The autoGeneratePhraseQueries option
> will cause the multiple terms to be treated as if they actually were
> enclosed within quotes; otherwise they will be treated as separate,
> unquoted terms. If you do enclose "26KA" in quotes in your query, then
> autoGeneratePhraseQueries is not relevant.
>
> Ah... maybe the problem is that you have preserveOriginal="true" in your
> query analyzer. Do you have your default query operator set to "AND"? If
> so, it would treat "26KA" as "26" AND "KA" AND "26KA", which requires
> "26KA" (without the trailing dot) to be in the index.
>
> It seems counter-intuitive, but the attributes of the index and query word
> delimiter filters need to be slightly asymmetric.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Thomas Michael Engelke
> Sent: Thursday, January 30, 2014 2:16 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Not finding part of fulltext field when word ends in dot
>
> I'm not sure I got my problem across. If I understand the snippet of
> documentation right, autoGeneratePhraseQueries only affects queries that
> result in multiple tokens, which mine does not. Also, the version is
> 3.6.0.1, and we're not planning on upgrading to any 4.x version.
>
> 2014-01-29 Jack Krupansky <j...@basetechnology.com>
>
>> You might want to add autoGeneratePhraseQueries="true" to your field
>> type, but I don't think that would cause a break when going from 3.6 to
>> 4.x. The default for that attribute changed in Solr 3.5. What release was
>> your data indexed using? There may have been some subtle word delimiter
>> filter changes between 3.x and 4.x.
>>
>> Read:
>> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%3CC0551C512C863540BC59694A118452AA0764A434@ITS-EMBX-03.adsroot.itcs.umich.edu%3E
>>
>> -----Original Message-----
>> From: Thomas Michael Engelke
>> Sent: Wednesday, January 29, 2014 11:16 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Not finding part of fulltext field when word ends in dot
>>
>> The fieldType definition is a tad on the longer side:
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             catenateWords="1"
>>             catenateNumbers="1"
>>             generateNumberParts="1"
>>             splitOnCaseChange="1"
>>             generateWordParts="1"
>>             catenateAll="0"
>>             preserveOriginal="1"
>>             splitOnNumerics="0"
>>     />
>>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="german/synonyms.txt" ignoreCase="true" expand="true"/>
>>     <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
>>             dictionary="german/german-common-nouns.txt"
>>             minWordSize="5"
>>             minSubwordSize="4"
>>             maxSubwordSize="15"
>>             onlyLongestMatch="true"
>>     />
>>
>>     <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
>>     <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="german/protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             catenateWords="0"
>>             catenateNumbers="0"
>>             generateWordParts="1"
>>             splitOnCaseChange="1"
>>             generateNumberParts="1"
>>             catenateAll="0"
>>             preserveOriginal="1"
>>             splitOnNumerics="0"
>>     />
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.StopFilterFactory" words="german/stopwords.txt" ignoreCase="true" enablePositionIncrements="true"/>
>>     <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="german/protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> Thank you for taking a look.
>>
>> 2014-01-29 Jack Krupansky <j...@basetechnology.com>
>>
>>> What field type and analyzer/tokenizer are you using?
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message-----
>>> From: Thomas Michael Engelke
>>> Sent: Wednesday, January 29, 2014 10:45 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Not finding part of fulltext field when word ends in dot
>>>
>>> Hello everybody,
>>>
>>> We have a legacy Solr installation in version 3.6.0.1. One of the indices
>>> defines a field named "content" as a fulltext field where a product
>>> description will reside. One of the indexed records contains the
>>> following data (excerpt):
>>>
>>> z. B. in der Serie 26KA.
>>>
>>> I had the problem that searching for the value "26KA" didn't find
>>> anything. Using the analysis page of the administrative interface, with
>>> the full text as the field value and "26KA" as the query string, I can
>>> see how the search string is transformed by the filter factories in use.
>>> The WordDelimiterFilterFactory transforms "26KA." into "26KA", which is
>>> displayed like this (excerpt):
>>>
>>> 73   74    75      76
>>> in   der   Serie   26KA.
>>>                    26KA
>>>
>>> It seems that it stripped the dot from "26KA.". Using the option to
>>> highlight matches, an analysis search for "26KA" shows that the lower of
>>> the two entries matches (after reaching the LowerCaseFilterFactory).
>>> However, querying the index using the query interface doesn't show any
>>> matches.
>>>
>>> I discovered that adding an asterisk to the search seems to work, as does
>>> adding the dot. I am puzzled by this, as I thought that the second, added
>>> entry was the word actually indexed. I've tried looking up the definition
>>> of the administrative interface, but the documentation only describes it
>>> for the latest version, where the display is different and (at least in
>>> the sample) doesn't show such "duplication".
>>>
>>> Can anybody shed some light onto this?