AW: Lexical analysis tools for German language data

Michael Ludwig Thu, 12 Apr 2012 03:18:51 -0700

> Given an input of "Windjacke" (probably "wind jacket" in English),
> I'd like the code that prepares the data for the index (tokenizer
> etc) to understand that this is a "Jacke" ("jacket") so that a
> query for "Jacke" would include the "Windjacke" document in its
> result set.
> 
> It appears to me that such an analysis requires a dictionary-
> backed approach, which doesn't have to be perfect at all; a list
> of the most common 2000 words would probably do the job and fulfil
> a criterion of reasonable usefulness.


A simple approach would be a word list combined with a regular
expression. There will, however, be nuts and bolts to take care of.
Perhaps a more sophisticated, well-tested approach is known to you.
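
To make the word list plus regular expression idea concrete, here is a
rough sketch in Python. It is only an illustration, not a tested
solution: the word list HEAD_WORDS and the function index_terms are
made-up names, and the list is a tiny stand-in for the 2000 or so
common words mentioned above. The regular expression checks whether a
token ends in a known word and, if so, adds that word as an extra
index term.

```python
import re

# Hypothetical mini word list; a real one might hold the ~2000 most
# common nouns. These entries are assumptions for illustration only.
HEAD_WORDS = ["jacke", "hose", "schuh", "mantel"]

# Matches a token that ends in a known word and has at least one
# character in front of it (so "Jacke" alone is not split).
_HEAD_RE = re.compile(
    r"^\w+?(" + "|".join(re.escape(w) for w in HEAD_WORDS) + r")$",
    re.IGNORECASE,
)


def index_terms(token):
    """Return the lowercased token plus the known word it ends in, if any."""
    terms = [token.lower()]
    match = _HEAD_RE.match(token)
    if match:
        terms.append(match.group(1).lower())
    return terms


print(index_terms("Windjacke"))  # ['windjacke', 'jacke']
print(index_terms("Jacke"))      # ['jacke']
```

With this, a query for "Jacke" would also hit the "Windjacke" document.
The nuts and bolts remain, though: a plain suffix match will also fire
on words that merely happen to end in a listed word, so the list needs
to be chosen with some care.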

Michael
