RE: extracting words

jarkko . hietaniemi Mon, 12 Feb 2001 08:38:59 -0800
> > - line break (wrapping lines on the screen)
> > - word break (for selection)
> > - word/root extraction (for search)
> 
> I recognize that the second and third case are really 
> difficult to handle.

Root extraction is decidedly non-trivial and a highly language-specific
problem, even more so than word breaking: it's a messy linguistic problem
instead of a clean algorithmic problem.
To start with, the choice of the term "extraction" shows that one has not
understood the problem in all its g(l)ory: a more appropriate term would be
"finding", or maybe, "reducing" the root.
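
To make the point concrete, here is a minimal sketch of the naive "extraction"
view: chop a known suffix off the end of the word. The suffix list and the
names SUFFIXES and naive_root are hypothetical, English-only, and purely
illustrative; this is not a real stemming algorithm.

```python
# Hypothetical, English-only suffix list; order matters (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_root(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem letters.

    This only "extracts" a prefix of the word; it never changes the stem.
    """
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= min_stem:
            return lower[:-len(suffix)]
    return lower


print(naive_root("walking"))  # walk
print(naive_root("running"))  # runn (the doubled consonant is left behind)
print(naive_root("ran"))      # ran  (no suffix to strip, so "run" is never found)
```

The last two cases are why "finding" fits better than "extraction": the root
is not always a substring of the word, and the rules that recover it differ
from language to language.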

Also, I would add

- "syllablization" (is that a word?) as a third problem (for breaking words
more nicely into lines); it would rank in difficulty somewhere between word
breaking and root extraction.

> But for word wrapping I assume line 
> breaking is sufficient. But when I don't have spaces to use 
> for wrapping and/or don't know whether the actual text part 
> uses spaces at all (what about exotic languages like Ogham or 
> Anglo-saxon?) then how can I go to implement word wrapping? 
> Simply do it character by character?
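
The fallback described in the question can be sketched as follows: break at
the last space that fits, and only when there is none, break after a fixed
number of characters. The function name wrap and the fixed width in characters
are assumptions made for illustration. Python slices strings by code point, so
a hard break here may still split a base character from its combining marks.

```python
def wrap(text, width):
    """Greedy wrap: prefer the last space that fits, else break mid-run."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    while len(text) > width:
        # Look for a space at or before position `width`.
        cut = text.rfind(" ", 0, width + 1)
        if cut <= 0:
            # No usable space: fall back to a hard character break.
            lines.append(text[:width])
            text = text[width:]
        else:
            # Break at the space and drop it from the output.
            lines.append(text[:cut])
            text = text[cut + 1:]
    lines.append(text)
    return lines


print(wrap("aaa bbb ccc", 7))  # ['aaa bbb', 'ccc']
print(wrap("abcdefghij", 4))   # ['abcd', 'efgh', 'ij']
```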