A long time ago I prototyped a word uncompounder for Dutch. Though it worked, it was far from elegant and supported only Dutch.
Earlier this week I found a more elegant solution, able to uncompound words like 'langetermijnplanning' into 'lange termijn planning'.

In Dutch there are 4 possible compounding insertions: none (word+word), an s (word+s+word), a dash (word+-+word) and the combination (word+s-+word). In theory, the number of parts in a compound is unlimited. Generally, uncompounding works well with parts of at least 5 chars; shorter parts lead to wrongly uncompounded words. Some shorter parts are still safe to use, though (e.g. jazz).

Now my question: what about other languages?

- Is your language compounding or not?
  * Are there special situations when compounding, like letters changing at the concatenation point?
- Which concatenation insertions are there for your language?
- Which part of the compound is semantically the essence of the word? (langetermijnplanning, long term plan, is mostly a plan; term and long are specifiers)

When I know a bit more, I could try to adjust the prototype code to support multiple languages by design.

Thanks in advance,
Ruud

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
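To make the splitting rules concrete, here is a minimal Python sketch of a dictionary-based uncompounder that follows the constraints described for Dutch: the four linking insertions (none, s, dash, s-dash), a minimum part length of 5 chars, and a whitelist of shorter parts that are known to be safe. This is not the prototype code mentioned in the post. The function `uncompound`, the constants `LINKERS` and `MIN_PART_LEN`, the `short_safe` parameter and the tiny lexicon are hypothetical, and the sketch assumes the lexicon is simply a set of known lowercase words.

```python
from functools import lru_cache

# Hypothetical Dutch linking insertions, tried longest first.
LINKERS = ("s-", "-", "s", "")
# Parts shorter than this are rejected unless whitelisted.
MIN_PART_LEN = 5


def uncompound(word, lexicon, short_safe=frozenset()):
    """Split `word` into known parts, or return None if no split is found.

    Linking insertions (s, -, s-) are dropped from the result.
    """
    word = word.lower()

    def acceptable(part):
        # A part must be a known word and long enough (or whitelisted).
        return part in lexicon and (len(part) >= MIN_PART_LEN or part in short_safe)

    @lru_cache(maxsize=None)
    def split(start):
        # Return a tuple of parts covering word[start:], or None.
        rest = word[start:]
        if acceptable(rest):
            return (rest,)
        # Prefer the longest head first, backtracking if the tail fails.
        for end in range(len(word) - 1, start, -1):
            head = word[start:end]
            if not acceptable(head):
                continue
            for linker in LINKERS:
                if word.startswith(linker, end):
                    tail = split(end + len(linker))
                    if tail is not None:
                        return (head,) + tail
        return None

    result = split(0)
    return list(result) if result else None


lexicon = {"lange", "termijn", "planning", "arbeid", "markt", "jazz", "muziek"}
print(uncompound("langetermijnplanning", lexicon))  # ['lange', 'termijn', 'planning']
print(uncompound("arbeidsmarkt", lexicon))          # ['arbeid', 'markt']
print(uncompound("jazzmuziek", lexicon))            # None: 'jazz' is too short
print(uncompound("jazzmuziek", lexicon, short_safe={"jazz"}))  # ['jazz', 'muziek']
```

Supporting another language would mostly mean swapping in its own linking insertions, minimum length and whitelist. Letter changes at the concatenation point, and picking the semantic head of the compound, are not handled by this sketch.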