Given an input of "Windjacke" (probably "wind jacket" in English), I'd
like the code that prepares the data for the index (tokenizer, etc.) to
understand that this is a "Jacke" ("jacket"), so that a query for "Jacke"
would include the "Windjacke" document in its result set.

It appears to me that such an analysis requires a dictionary-backed
approach, which doesn't have to be perfect at all; a list of the most
common 2000 words would probably do the job well enough to be
reasonably useful.

Do you know of any implementation techniques or working implementations
for this kind of lexical analysis of German-language data? (Or other
languages, for that matter?) What are they, and where can I find them?

I'm sure there is something out there (commercial or free), because I've
seen plenty of engines that grok German and the way it builds words.

Failing that, what are the proper terms to refer to these techniques, so
I can search for them more successfully?

Michael