We are trying to get edismax to handle collocations that we map to a single token. To do so we need to manipulate the "chunks" (as Hoss referred to them in http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/) generated by the dismax parser. We have numerous collocations (multi-word expressions whose meaning does not follow directly from their constituent words). For example, at index time "real estate" is mapped to "real_estate" so that it does not collide with searches for "estate" or "real value". So we need the "chunks" to reflect this mapping of multi-word phrases to a single token, which is done during indexing via the synonym filter.
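For concreteness, the index-time mapping is done with an analyzer roughly along these lines (the field type name, tokenizer choice, and "collocations.txt" file name are just illustrative; the file contains lines like "real estate => real_estate"):

    <fieldType name="text_collocations" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- collocations.txt contains explicit mappings such as:
             real estate => real_estate -->
        <filter class="solr.SynonymFilterFactory" synonyms="collocations.txt"
                ignoreCase="true" expand="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>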

In an ideal world, we would simply specify a queryAnalyzerFieldType to be used to pre-process the query string before it is divided into "chunks" (similar to what is done with the SpellCheck component).
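For reference, the SpellCheck component's hook is the kind of thing we are wishing for in edismax; in solrconfig.xml it looks roughly like this (field type and field names are the stock example ones):

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <!-- the analyzer of this field type is applied to the raw query string
           before the spellchecker works on it -->
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="spellcheckIndexDir">spellchecker</str>
      </lst>
    </searchComponent>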

But our impression thus far is that we are off the reservation and will need to hack away at org.apache.solr.search.ExtendedDismaxQParser.splitIntoClauses(String, boolean).
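If we do go that route, the plan would presumably be to package the modified parser as its own query parser plugin and register it in solrconfig.xml, roughly like this (the class and parser names are hypothetical):

    <!-- hypothetical: com.example.CollocationEDisMaxQParserPlugin would be a
         modified copy of the stock plugin whose parser applies the collocation
         mapping in splitIntoClauses(String, boolean) -->
    <queryParser name="col_edismax" class="com.example.CollocationEDisMaxQParserPlugin"/>

Queries would then select it with defType=col_edismax.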

Two questions:

1. Is it correct that the only pre-processing dismax does on the query string is stopword handling?

2. Is it reasonable to limit the customization to splitIntoClauses(String, boolean) to handle this?

Regards,

Christopher
