Hi Alistair,

It seems there are many ways to skin the cat, so I'll describe the approach I used with Solr 3.6 :-)

* I use a patched DictionaryCompoundWordTokenFilterFactory in the "index" phase, so the German compound noun "Leinenhose" (linen trousers) is indexed alongside "Leinen" and "Hose". Afterwards the three tokens go through stemming.

* One hint which might be useful: I only split words that I consider proper German compound nouns, i.e. capitalized ones. For example, if your indexed text contains the token "schwarzkleid", I would NOT split it, since it is not a properly capitalized noun; that would be "Schwarzkleid". Please note that even "Schwarzkleid" is not a proper German noun anyway :-)

* I use a custom splitting dictionary of about 7,000 entries, many of them customer-specific.
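To make the index-time behaviour concrete, here is a rough Python sketch of dictionary-based splitting that keeps the original token and adds every dictionary word found inside it. It is only loosely modelled on a dictionary compound filter and is neither the Lucene code nor the patched factory; the function name `decompound`, the tiny dictionary, the size limits and the "only split capitalized tokens" check (mirroring the hint above) are all assumptions for illustration.

```python
# Hypothetical sketch of index-time dictionary decompounding.
# Not Lucene code: names, dictionary and size limits are assumptions.

def decompound(token, dictionary, min_word_size=5, min_subword_size=2, max_subword_size=15):
    """Return the original token followed by every dictionary subword found in it."""
    tokens = [token]
    # Assumed rule from the hint: only split capitalized (proper) nouns.
    if len(token) < min_word_size or not token[:1].isupper():
        return tokens
    lower = token.lower()
    # Brute-force substring lookup against the dictionary.
    for start in range(len(lower) - min_subword_size + 1):
        longest = min(max_subword_size, len(lower) - start)
        for size in range(min_subword_size, longest + 1):
            candidate = lower[start:start + size]
            if candidate in dictionary:
                tokens.append(candidate)
    return tokens

dictionary = {"leinen", "hose"}
print(decompound("Leinenhose", dictionary))  # ['Leinenhose', 'leinen', 'hose']
print(decompound("leinenhose", dictionary))  # ['leinenhose'] (lowercase, not split)
```

In a real analyzer chain these tokens would then be lowercased and stemmed, as described above.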

I do not use DictionaryCompoundWordTokenFilterFactory in the "query" phase of the field, so all of the following queries match the indexed word "Leinenhose":

* "leinenhosen"
* "leinenhose"
* "leinen hose"
* "leinen hosen"
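Why these four queries match can be illustrated with a toy model: the index holds the compound plus its parts, and the query side only lowercases and stems. The sketch below assumes AND semantics between query terms, and `toy_stem` is a made-up stand-in that just strips a trailing "n" after "e"; it is not Solr's German stemmer.

```python
# Hypothetical illustration: index holds compound + parts, query side is not split.
# toy_stem is a made-up rule, NOT Solr's German stemming.

def toy_stem(word):
    """Lowercase and strip a plural-like trailing 'n' after 'e' (toy rule only)."""
    word = word.lower()
    if len(word) > 4 and word.endswith("en"):
        return word[:-1]
    return word

# Tokens emitted at index time for "Leinenhose": original plus dictionary parts.
indexed_terms = {toy_stem(t) for t in ["Leinenhose", "leinen", "hose"]}

def matches(query):
    """Assumed AND semantics: every whitespace-separated term must be indexed."""
    return all(toy_stem(term) in indexed_terms for term in query.split())

for q in ["leinenhosen", "leinenhose", "leinen hose", "leinen hosen"]:
    print(q, matches(q))  # each prints True
```

Single-word queries hit the stored compound, while two-word queries hit the split parts, so no query-time splitting is needed.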

Cheers,

Siegfried Goeschl



On 22.04.14 12:13, Alistair wrote:
I've managed to solve this (in a rather hacky way) by using filter
queries and the edismax query parser.

I added the following parameters to my solrconfig.xml:

     <str name="defType">edismax</str>
     <str name="mm">75%</str>
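For reference, when mm is a positive percentage, Solr takes that percentage of the optional clauses and rounds down. The Python sketch below covers only that simple case (integer and conditional mm forms are ignored), and how many optional clauses a split keyword actually produces depends on the analyzer and parser settings, so treat the clause counts as assumptions.

```python
import math

def min_should_match(optional_clauses, percent=75):
    """Positive-percentage mm: percent of the optional clauses, rounded down."""
    return math.floor(optional_clauses * percent / 100)

for n in range(1, 5):
    print(n, "optional clauses ->", min_should_match(n), "required")
```

So with mm=75%, a keyword that ends up as two optional clauses (e.g. the two halves of a compound) needs only one of them to match.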

Then, when searching for multiple keywords (for example "schwarzkleid wenz",
where Wenz is a German brand name), I use the first keyword as the query and
add every following keyword as a filter query. My final query looks
something like this:


fl=id&sort=popular+desc&indent=on&q=keywords:'schwarzkleide'+&wt=json&fq={!edismax}+keywords:'wenz'&fq=deleted:0

My compound splitter filter splits "schwarzkleide" correctly, and the query
is parsed by edismax with mm=75%. Then the filter queries are added; the
keyword filter queries are also parsed by edismax. The result is all the
black dresses from Wenz.
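The request construction described above can be sketched in Python with the standard library. The helper name `build_params` is hypothetical; defType and mm are omitted because they come from the solrconfig.xml defaults, and the field names (keywords, popular, deleted) are taken from the example query. Unlike the example, the sketch drops the single quotes around terms, since single quotes are not phrase delimiters in Lucene query syntax.

```python
from urllib.parse import urlencode

def build_params(keywords, field="keywords"):
    """Hypothetical helper: first keyword -> q, each further keyword -> edismax fq."""
    first, *rest = keywords.split()
    params = [
        ("q", f"{field}:{first}"),
        ("fl", "id"),
        ("sort", "popular desc"),
        ("wt", "json"),
        ("fq", "deleted:0"),
    ]
    for kw in rest:
        # Local-params prefix makes Solr parse this filter query with edismax.
        params.append(("fq", f"{{!edismax}}{field}:{kw}"))
    return params

# A list of pairs lets urlencode emit repeated fq parameters.
print(urlencode(build_params("schwarzkleid wenz")))
```

Filter queries are cached independently of the main query and do not affect scoring, which is one reason to move the brand keyword into fq.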

If anybody has a better solution than the one I've posted, I would be more
than happy to read up on it: I'm quite new to Solr, and to be honest I think
my approach is a bit convoluted.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Having-trouble-with-German-compound-words-in-Solr-4-7-tp4131964p4132478.html
Sent from the Solr - User mailing list archive at Nabble.com.