> From: Tomas Zerolo
> > > There can be transformations or inflections, like the "s" in
> > > "Weihnachtsbaum" (Weihnachten/Baum).
> >
> > I remember from my linguistics studies that the terminus technicus
> > for these is "Fugenmorphem" (interstitial or joint morpheme) [...]
>
> IANAL (I am not a l
On Apr 12, 2012, at 9:00 AM, Paul Libbrecht wrote:
> More or less, Fahrrad is generally abbreviated as Rad.
> (even though Rad can mean wheel and bike)
A synonym could handle this, since "fahren" would not be a good match. It is a
judgement call, but this seems more like an equivalence "Fahrrad =
On Thursday 12 April 2012 18:00:14 Paul Libbrecht wrote:
> On 12 Apr 2012, at 17:46, Michael Ludwig wrote:
> >> Some compounds probably should not be decompounded, like "Fahrrad"
> >> (fahren/Rad). With a dictionary-based stemmer, you might decide to
> >> avoid decompounding for words in the dic
On Apr 12, 2012, at 8:46 AM, Michael Ludwig wrote:
> I remember from my linguistics studies that the terminus technicus for
> these is "Fugenmorphem" (interstitial or joint morpheme).
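To make the linking-element idea concrete, here is a minimal sketch of a two-part splitter that tolerates a Fugenmorphem between the parts. It is only an illustration under stated assumptions: the tiny LEXICON, the LINKING list, and the function name split_compound are all made up for this example and are not what any Solr or Lucene filter does internally.

```python
# Hypothetical mini-lexicon; a real system would load a full dictionary.
LEXICON = {"weihnacht", "baum", "wind", "jacke", "kinder"}

# A few common German linking elements; "" means "no linking element".
LINKING = ("", "s", "es", "n", "en", "er", "e")


def split_compound(word, lexicon=LEXICON):
    """Return (modifier, head) for a two-part compound, or None.

    The head must be a lexicon word; the modifier must be a lexicon
    word once an optional linking element is stripped from its end.
    """
    w = word.lower()
    for i in range(2, len(w) - 1):
        left, head = w[:i], w[i:]
        if head not in lexicon:
            continue
        for link in LINKING:
            if not left.endswith(link):
                continue
            stem = left[: len(left) - len(link)]
            if stem in lexicon:
                return stem, head
    return None


print(split_compound("Weihnachtsbaum"))
print(split_compound("Windjacke"))
```

With this toy lexicon the first call strips the "s" and yields the parts "weihnacht" and "baum"; a word with no known head simply returns None, so it would be indexed unchanged.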
That is some excellent linguistic jargon. I'll file that with "hapax legomenon".
If you don't highlight, you ca
On 12 Apr 2012, at 17:46, Michael Ludwig wrote:
>> Some compounds probably should not be decompounded, like "Fahrrad"
>> (fahren/Rad). With a dictionary-based stemmer, you might decide to
>> avoid decompounding for words in the dictionary.
>
> Good point.
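A rough sketch of that suggestion, assuming a hand-maintained list of words that must stay whole. PROTECTED, PARTS, and index_tokens are hypothetical names chosen for this example; this is not an existing filter's API.

```python
# Hypothetical data: whole words that should never be split, and the
# parts the splitter is otherwise allowed to recognise.
PROTECTED = {"fahrrad"}
PARTS = {"fahr", "rad", "wind", "jacke"}


def index_tokens(word):
    """Return the tokens to index for one input word.

    Protected words are emitted whole. Other words are emitted whole
    plus their two parts if they split cleanly into known parts.
    """
    w = word.lower()
    if w in PROTECTED:
        return [w]
    for i in range(1, len(w)):
        if w[:i] in PARTS and w[i:] in PARTS:
            return [w, w[:i], w[i:]]
    return [w]


print(index_tokens("Fahrrad"))
print(index_tokens("Windjacke"))
```

Here "Fahrrad" stays a single token even though "fahr" and "rad" are both known parts, while "Windjacke" is also indexed under "wind" and "jacke".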
More or less, Fahrrad is generally ab
> From: Walter Underwood
> German noun decompounding is a little more complicated than it might
> seem.
>
> There can be transformations or inflections, like the "s" in
> "Weihnachtsbaum" (Weihnachten/Baum).
I remember from my linguistics studies that the terminus technicus for
these is "Fugenmorph
> From: Markus Jelsma
> We've done a lot of tests with the HyphenationCompoundWordTokenFilter
> using an FOP XML file generated from TeX for the Dutch language and
> have seen decent results. A bonus was that now some tokens can be
> stemmed properly because not all compounds are listed in the
> dic
> From: Valeriy Felberg
> If you want the query "jacke" to match a document containing the word
> "windjacke" or "kinderjacke", you could use a custom update processor.
> This processor could search the indexed text for words matching the
> pattern ".*jacke" and inject the word "jacke" into an addi
> Given an input of "Windjacke" (probably "wind jacket" in English),
> I'd like the code that prepares the data for the index (tokenizer
> etc) to understand that this is a "Jacke" ("jacket") so that a
> query for "Jacke" would include the "Windjacke" document in its
> result set.
>
> It appears t