Hello Andrea,

I think you are facing a rather common issue involving keyword tokenization and query parsing in Lucene: the query parser splits the input query on whitespace, and then each token is analysed according to your configuration. Queries containing whitespace therefore won't behave as expected, because each token is analysed separately. Consequently, the catenated version of the reference cannot be generated.

You could try surrounding your query with double quotes, or escaping the space characters in your query with a backslash, so that the whole sequence reaches the analyser as a single string and the catenation occurs.

You should be aware that this approach has a drawback: you will probably not be able to combine the search for Mag. 778 G 69 with other words in other fields unless you are able to identify which spaces are to be escaped. For example, if the input query is:

Awesome Mag. 778 G 69

you would want to transform it to:

Awesome Mag.\ 778\ G\ 69 // spaces are escaped in the reference only

or

Awesome "Mag. 778 G 69" // only the reference is turned into a phrase query
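To make the mechanism concrete, here is a rough Python model of what happens. It is not Lucene code: `analyze`, `parse_and_analyze` and `escape_whitespace` are hypothetical names, the analyser model assumes ASCII input and only approximates WordDelimiterFilterFactory with catenateAll="1", and the parser model only handles backslash-escaped whitespace and double-quoted phrases.

```python
import re

# Hypothetical, simplified stand-in for the query parser's pre-analysis split:
# a chunk is either a double-quoted phrase or a run of non-whitespace
# characters in which backslash-escaped whitespace does not end the chunk.
CHUNK = re.compile(r'"[^"]*"|(?:\\\s|\S)+')


def analyze(text: str) -> list[str]:
    """Rough model of the 'mytype' chain (ASCII only, an approximation).

    KeywordTokenizer keeps the whole input as one token, LowerCaseFilter
    lowercases it, and WordDelimiterFilter with catenateAll=1 (no word or
    number parts) emits only the concatenation of its alphanumeric parts.
    """
    parts = re.findall(r"[a-z0-9]+", text.lower())
    return ["".join(parts)] if parts else []


def parse_and_analyze(query: str) -> list[list[str]]:
    """Split the query into chunks, then analyse each chunk separately."""
    clauses = []
    for chunk in CHUNK.findall(query):
        if len(chunk) >= 2 and chunk.startswith('"') and chunk.endswith('"'):
            text = chunk[1:-1]  # phrase: the whole quoted text reaches the analyser
        else:
            text = re.sub(r"\\(\s)", r"\1", chunk)  # unescape "\ " back to " "
        clauses.append(analyze(text))
    return clauses


def escape_whitespace(reference: str) -> str:
    """Backslash-escape every whitespace character in an already identified reference."""
    return re.sub(r"(\s)", r"\\\1", reference)


if __name__ == "__main__":
    # Unescaped: four separate clauses, none of which equals mag778g69.
    print(parse_and_analyze("Mag. 778 G 69"))
    # Escaped reference or phrase: the reference is analysed as one string.
    print(parse_and_analyze("Awesome " + escape_whitespace("Mag. 778 G 69")))
    print(parse_and_analyze('Awesome "Mag. 778 G 69"'))
```

The first call yields `[['mag'], ['778'], ['g'], ['69']]`, which mirrors the four required clauses in the debug output quoted below; with mm=100% all of them must match, yet only mag778g69 is in the index. The other two calls both yield `[['awesome'], ['mag778g69']]`. In that debug output the whole-string mag778g69 clause does appear, but only as the optional pf boost, so it cannot make the document match on its own.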
Do you get the point? Look at the differences between what you tried and the following examples, which should all do what you want:

http://localhost:8983/solr/collection1/select?q=%22Mag.%20778%20G%2069%22&debugQuery=on&qf=text%20myfield&defType=dismax

OR

http://localhost:8983/solr/collection1/select?q=myfield:Mag.\%20778\%20G\%2069&debugQuery=on

OR

http://localhost:8983/solr/collection1/select?q=Mag.\%20778\%20G\%2069&debugQuery=on&qf=text%20myfield&defType=edismax

I hope this helps,

Tanguy

On Aug 12, 2013, at 11:13 AM, Andrea Gazzarini <andrea.gazzar...@gmail.com> wrote:

> Hi all,
> I have a field (among others) in my schema defined like this:
>
> <fieldtype name="mytype" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="0"
>             generateNumberParts="0"
>             catenateWords="0"
>             catenateNumbers="0"
>             catenateAll="1"
>             splitOnCaseChange="0" />
>   </analyzer>
> </fieldtype>
>
> <field name="myfield" type="mytype" indexed="true"/>
>
> Basically, at both index and query time the field value is normalized like this:
>
> Mag. 778 G 69 => mag778g69
>
> Now, in my solrconfig I'm using a search handler like this:
>
> <requestHandler ....>
>   ...
>   <str name="defType">dismax</str>
>   ...
>   <str name="mm">100%</str>
>   <str name="qf">myfield^3000</str>
>   <str name="pf">myfield^30000</str>
>
> </requestHandler>
>
> What I'm expecting is that if I index a document with the value "Mag. 778 G 69" for my field, I will be able to get this document by querying:
>
> 1. Mag. 778 G 69
> 2. mag 778 g69
> 3. mag778g69
>
> But that doesn't work: I'm able to get the document if and only if I use the "normalized" form: mag778g69
>
> After a little bit of debugging, I see that, even though I used a KeywordTokenizer in my field type declaration, Solr is doing something like this:
>
> +((DisjunctionMaxQuery((myfield:mag^3000.0)~0.1)
>    DisjunctionMaxQuery((myfield:778^3000.0)~0.1)
>    DisjunctionMaxQuery((myfield:g^3000.0)~0.1)
>    DisjunctionMaxQuery((myfield:69^3000.0)~0.1))~4)
>   DisjunctionMaxQuery((myfield:mag778g69^30000.0)~0.1)
>
> That is, it is tokenizing the original query string (mag + 778 + g + 69), and obviously querying the field for separate tokens doesn't match anything (at least this is what I think).
>
> Could anybody please explain this to me?
>
> Thanks in advance
> Andrea