> -----Original Message-----
> From: Vladimir Ivanov [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, March 11, 2003 6:22 AM
> To: Magda Danish (Unicode)
> Subject: ZWNJ & Persian Collation
>
>
> Dear Magda,
>
> Excuse me for bothering you again, but my message was rejected
> by some server on its way to [EMAIL PROTECTED]. May I ask you to
> publish my question below? Thank you,
> Vladimir.
>
>
>
> Sorting Persian words with a utility based on a tailored Allkeys
> Table, version 3.1.1 (http://www.unicode.org/reports/tr10/#AllKeys),
> I’ve encountered a problem that affects the lexicographical order of
> the words in a dictionary.
>
> To my mind, ZWNJ (ZERO WIDTH NON-JOINER, U+200C; also found among
> MS Word Special Characters/No-width Optional Break) was invented to
> prevent the joining of Arabic letters within a word.
>
> It is used in Persian to show the morphemic boundary in compound words
> like خانهداری xānedāri ‘household’. The latter consists of the word
> خانه xāne ‘house’ + the verb stem دار dār ‘hold’ + the suffix ی ‘i’.
> It can be transliterated as xāne + ZWNJ + dāri. There are thousands of
> words with a similar structure in Persian, Dari, Tajik and neighboring
> languages.
>
> Clearly, there are letters on both sides of ZWNJ within the word
> boundaries; placing ZWNJ at the edge of a word doesn’t make sense in
> Persian. From this point of view, ZWNJ should be treated as a special
> character rather than a delimiter.
>
> But in the Allkeys Table it is placed on line #68, well before other
> popular delimiters:
>
> HORIZONTAL TABULATION line #192,
> LINE FEED line #193,
> CARRIAGE RETURN line #196,
> SPACE line #197, etc.
>
> Such an ordering gives wrong sorting results for Persian dictionaries:
> compound words like خانهداری xānedāri ‘household’ appear in the list
> before their components like خانه xāne ‘house’.
>
> I’ve solved this problem for myself by placing ZWNJ somewhere after the
> delimiters, but what are the theoretical reasons for putting it before
> them? In order to get what? In what languages?
>
> Is it a Persian-specific problem or a global one? Are there languages
> where ZWNJ marks a word boundary?
>
> By the way, the sorting algorithm built into MS Windows puts compound
> words with ZWNJ AFTER their simple components. So in this respect it
> acts on principles different from those of the Allkeys Table.
>
>
>
> Thank you,
>
> Vladimir Ivanov
>
>
Please make sure to copy Vladimir [EMAIL PROTECTED] on your reply.
Thanks,
Magda
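The ordering problem Vladimir describes can be illustrated with a toy, primary-level-only comparison. This is a minimal sketch, not the UCA and not the real Allkeys 3.1.1 weights: every weight value below is hypothetical. It also assumes (a guess at how compounds could end up before their components) that each dictionary line is a headword followed by a TAB and a gloss, and that ZWNJ, TAB and SPACE all carry non-ignorable primary weights. Under those assumptions the order of خانه and its compound depends only on whether ZWNJ weighs less or more than TAB.

```python
# Toy illustration only: hypothetical primary weights, NOT the real
# Allkeys 3.1.1 values, and only a single (primary) collation level.
ZWNJ = "\u200c"
TAB = "\t"

# Persian letters used in the example, in alphabetical order:
# alef, khe, dal, re, noon, heh, farsi yeh.
LETTERS = "\u0627\u062e\u062f\u0631\u0646\u0647\u06cc"


def make_weights(zwnj_before_delimiters: bool) -> dict[str, int]:
    """Build a hypothetical weight table with ZWNJ before or after TAB/SPACE."""
    if zwnj_before_delimiters:
        order = [ZWNJ, TAB, " "]  # roughly the placement reported in the message
    else:
        order = [TAB, " ", ZWNJ]  # the workaround: ZWNJ after the delimiters
    weights = {ch: i for i, ch in enumerate(order, start=1)}
    weights.update({ch: i for i, ch in enumerate(LETTERS, start=100)})
    return weights


def sort_entries(entries: list[str], zwnj_before_delimiters: bool) -> list[str]:
    """Sort lines by comparing their per-character weights left to right."""
    weights = make_weights(zwnj_before_delimiters)
    # Characters missing from the table (e.g. the Latin glosses) sort after
    # everything above, ordered by code point.
    return sorted(entries, key=lambda s: [weights.get(ch, 1000 + ord(ch)) for ch in s])


XANE = "\u062e\u0627\u0646\u0647"                    # xāne 'house'
XANEDARI = XANE + ZWNJ + "\u062f\u0627\u0631\u06cc"  # xāne + ZWNJ + dāri 'household'
ENTRIES = [XANE + TAB + "house", XANEDARI + TAB + "household"]

for flag in (True, False):
    print(flag, [line.split(TAB)[1] for line in sort_entries(ENTRIES, flag)])
# True ['household', 'house']
# False ['house', 'household']
```

In the first run the ZWNJ inside xānedāri is compared against the TAB that follows xāne and wins because it has the lower weight, so the compound sorts first; moving ZWNJ after the delimiters reverses this. A full UCA implementation would also involve secondary and tertiary levels and variable-weighting options, which this sketch omits.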