Asmus Freytag is the one to talk to; he can look into this.

Mark

----- Original Message -----
From: "Jungshik Shin" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Monday, February 12, 2001 13:33
Subject: Korean linebreaking and UTR14 (was Re: extracting words)

> On Sun, 11 Feb 2001, Mark Davis wrote:
>
> MD> Please read TUS Chapter 5 and the Linebreak TR before proceeding, as I
> MD> recommended in my last message. The Unicode standard is online, as is the
> MD> TR. Both can be found by going to www.unicode.org, and selecting the right
> MD> topic. The TR in particular discusses the recommended approach to line break
> MD> in great detail.
>
> As I wrote when TUS 3.0 came out, I cannot help wondering where the idea
> behind the following passage in the TR on line breaking (and behind what
> is written about it in Chap. 5 of TUS 3.0) came from.
>
> UTR14> Korean may alternately use a space-based (style 1) instead of the
> UTR14> style 2 context analysis.
>
> UTR14> 1. Korean uses either implicit breaking around
> UTR14> Hangul and ideographs or uses spaces. Reference [1] shows
> UTR14> how this can be elegantly handled by the second or third
> UTR14> method. Only the intersection of ID/ID, AL/ID and ID/AL
> UTR14> are affected. For alphabetic style line breaking, breaks
> UTR14> for these four cases require space, for ideographic style
> UTR14> line breaking, these four cases don't require spaces.
>
> where style 1 and style 2 are defined as
>
> UTR14> 1. Western (spaces and hyphens are used to determine breaks)
> UTR14> 2. East Asian (lines can break anywhere, unless prohibited)
>
> Let me make it clear that virtually NO books published in Korean use
> the space-based (style 1) line breaking rule. The style 2 line breaking
> rule is *exclusively* used for modern Korean text, no matter what some
> broken word processors for Korean offer as an alternative to style 2 and
> what some web browsers do (e.g. Netscape 4.x; Mozilla fixed this problem).
>
> I'm very alarmed to find that this 'misinformation' has crept into TUS
> and UTR14 (now UAX #14). It would be nice if somebody in charge could
> get this straightened out.
>
> Regards,
>
> Jungshik Shin
>
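
As a rough illustration of the distinction in the UTR14 passage quoted above, the following Python sketch lists break opportunities under the two styles. It is only a sketch under stated assumptions, not the UAX #14 algorithm: the names `classify` and `break_opportunities` are hypothetical, only three classes (ID, AL, SP) are modelled, Hangul syllables and CJK unified ideographs are both mapped to ID, and punctuation, hyphens and all prohibited-break rules are ignored.

```python
def classify(ch):
    """Map a character to a simplified line-break class (assumption:
    only SP, ID and AL exist in this sketch)."""
    if ch == " ":
        return "SP"
    cp = ord(ch)
    # Hangul syllables (U+AC00..U+D7A3) and CJK unified ideographs
    # (U+4E00..U+9FFF) are both treated as ID here.
    if 0xAC00 <= cp <= 0xD7A3 or 0x4E00 <= cp <= 0x9FFF:
        return "ID"
    return "AL"


def break_opportunities(text, style):
    """Return the indices i at which a line break is allowed before text[i].

    style 1: breaks only after spaces (space-based).
    style 2: additionally, breaks between ID/ID, AL/ID and ID/AL pairs
             without requiring a space.
    """
    breaks = []
    for i in range(1, len(text)):
        prev, cur = classify(text[i - 1]), classify(text[i])
        if cur == "SP":
            continue  # never break directly before a space
        if prev == "SP":
            breaks.append(i)  # a break after a space is allowed in both styles
        elif style == 2 and "ID" in (prev, cur):
            breaks.append(i)  # pair involving ID: no space needed in style 2
    return breaks
```

For the string "한국어 text", this sketch gives `[4]` under style 1 (only after the space) and `[1, 2, 4]` under style 2 (between every pair of Hangul syllables as well), which is the behaviour the message above says Korean publishing actually uses.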