[XXE] Spellchecking derived words

Hussein Shafie Wed, 21 Jan 2009 17:25:37 +0100

Rudolf de Grijs wrote:
> 
> What I did notice is that for the Dutch language there is a list with
> base words and a list with derived words.


There is no concept of derived words in our spell-checker. Our
spell-checker is designed to efficiently crunch flat word lists. Note
that a word list may be really huge (*millions* of words).



> If I would check the German word
> /Anmeldungsgegenstand/ with the default included dictionary then this
> word is not recognized. But the words /Anmeldung/ and /Gegenstand/ are
> known.

That's right. You need to add words like Anmeldungsgegenstand to your
word list, because there is an "s" between Anmeldung and Gegenstand.

Note that you don't have to do that for ``true compound words''. German
example: in "Obama beginnt mit Krisengesprächen", Krisengesprächen is
accepted by our spell-checker even though it has not been explicitly
added to the German word list from which the corresponding dictionary
comes. However, the German word list does contain Krisen and
Gesprächen.
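
To illustrate the behavior described above, here is a minimal Python
sketch. It is not our spell-checker's actual implementation: the
function name is_accepted, the min_part_len parameter and the tiny word
list are all hypothetical, chosen only to show how a flat word list plus
a "two known words glued together" rule treats the examples in this
thread.

```python
def is_accepted(word, word_list, min_part_len=3):
    """Accept a word if it is listed, or if it splits into two listed words.

    Hypothetical sketch: min_part_len is an assumed lower bound on the
    length of each half, so that very short fragments do not match.
    """
    w = word.lower()
    known = {entry.lower() for entry in word_list}
    if w in known:
        return True
    # Try every split point that leaves both halves long enough.
    for i in range(min_part_len, len(w) - min_part_len + 1):
        if w[:i] in known and w[i:] in known:
            return True
    return False


# Illustrative word list only; a real list may hold millions of words.
GERMAN_WORDS = ["Anmeldung", "Gegenstand", "Krisen", "Gesprächen"]

print(is_accepted("Krisengesprächen", GERMAN_WORDS))      # True
print(is_accepted("Anmeldungsgegenstand", GERMAN_WORDS))  # False
print(is_accepted("Anmeldunggegenstand", GERMAN_WORDS))   # True
```

The second call fails because the linking "s" leaves no split into two
listed words, and the third call shows the downside of such a rule: the
misspelling without the "s" is accepted.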



> So I do get the feeling that a list is required for those (most
> frequently used) derived words (like I have included for the Dutch
> dictionary). My question is: is my assumption correct or is there some
> algorithm that can perform the spellcheck on these derived words?

There is no such algorithm. Feel free to add all the derived words you
want. But there is no need to add ``true compound words'' or words
starting with common prefixes (e.g. auto). See "-prefixes
word_list" in http://www.xmlmind.com/_dictbuilder/doc/using_builder.html.

See also "%compoundmin length" in
http://www.xmlmind.com/_dictbuilder/doc/hints_file.html



---
PS: For the reasons explained before, our spell-checker does not detect
any error for Anmeldunggegenstand (without the "s"). The German language
is very, very difficult to spell-check. The fact that our spell-checker
has severe flaws in the case of German is one of the reasons why we have
retired our spell-checker as a commercial product.
