FW: extracting words

2001-02-11 Thread Mike Lischke
Yes, we have had it for a long time; no, nobody has solved it entirely; and yes, this approach is wrong. Breaking a string into words may require a thorough understanding of the vocabulary and grammar of the language, and even that may not be enough. But how can we then ever have a

Re: FW: extracting words

2001-02-11 Thread Tex Texin
If you are willing to give up precision, then you can use heuristics. The grossest heuristics are not really word breaking at all, but give users who do not know the language a compatible way of working with the text. For example, some software has extended its western European language

[OT] RE: FW: extracting words

2001-02-11 Thread Thomas Chan
On Sun, 11 Feb 2001, Mike Lischke wrote: If you are willing to give up precision, then you can use heuristics. It's ugly but perhaps ok for a simple editor. You can improve the precision with better heuristics and more data, so you get to decide how much is good enough... So using

Re: [OT] RE: FW: extracting words

2001-02-11 Thread Jungshik Shin
On Sun, 11 Feb 2001, Thomas Chan wrote: On Sun, 11 Feb 2001, Mike Lischke wrote: If you are willing to give up precision, then you can use heuristics. It's ugly but perhaps ok for a simple editor. You can improve the precision with better heuristics and more data, so you get to
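
To make the "gross heuristic" idea from this thread concrete, here is a minimal sketch in Python. It is not taken from any of the posts: the function names `is_han` and `heuristic_words`, and the rule itself (runs of letters and digits form a word, while each Han ideograph is treated as a word on its own), are assumptions chosen for illustration. As the posters note, this is not real word breaking; it only gives an editor something predictable to select or step over.

```python
from typing import List


def is_han(ch: str) -> bool:
    """Rough test for a Han ideograph (hypothetical helper).

    Covers only the CJK Unified Ideographs block and Extension A;
    other blocks are deliberately ignored to keep the sketch small.
    """
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def heuristic_words(text: str) -> List[str]:
    """Split text with a crude, language-agnostic heuristic.

    Assumed rules (not from the thread):
      * each Han ideograph is emitted as its own "word";
      * any other run of letters/digits is one word;
      * everything else (spaces, punctuation) is a separator.
    """
    words: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Emit the pending run of letters/digits, if any.
        if current:
            words.append("".join(current))
            current.clear()

    for ch in text:
        if is_han(ch):
            flush()
            words.append(ch)
        elif ch.isalnum():
            current.append(ch)
        else:
            flush()
    flush()
    return words


if __name__ == "__main__":
    print(heuristic_words("Hello, world!"))
    print(heuristic_words("我爱 Unicode"))
```

The per-ideograph rule is imprecise by design, since most Chinese words span more than one character. Improving on it means adding a dictionary or statistical data, which is the trade-off between precision and effort that the thread describes.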