Re: [cldr-dev] Re: Questions on Chinese collation, stroke

Stephan Stiller Fri, 22 Jun 2012 19:49:39 -0700

Dear Matt,

I think those tasks would take quite a bit of work, because (1) the three orders you are mentioning are all mathematically underspecified and (2) they're partial orders even when considering only what you'd normally consider the respective target domains (certain subsets of CJKV).

I'm sure many or most people reading this know this, but the question is which committee would get rid of the underspecification (also, according to what principles?), fine-tune the respective target domains, and so on. (Perhaps the IICore people have done parts of the footwork already?)


Stephan

On 6/22/2012 5:05 PM, Matt Ma wrote:

Entered ticket #4949 for Simplified Chinese, stroke order.

Thanks,
Matt

On Fri, Jun 22, 2012 at 12:55 PM, Mark Davis ☕ <m...@macchiato.com> wrote:

There are no current plans to do that. If you want to present a case for
adding additional collation sequences to CLDR, please start the process by
filing a bug at http://unicode.org/cldr/trac/newticket

________________________________
Mark

— Il meglio è l’inimico del bene —



On Fri, Jun 22, 2012 at 11:05 AM, Matt Ma <matt.ma.um...@gmail.com> wrote:

Thanks all for the clarification. Are there any plans to provide the
following collations in CLDR?

  1. Simplified Chinese, stroke order, based on 现代汉语通用字笔顺规范 (PRC-China
modern Chinese commonly used characters standard stroke orders,
mentioned in http://en.wikipedia.org/wiki/Stroke_order).

  2. Simplified Chinese, radical order

  3. Traditional Chinese, radical order

Thanks,
Matt

On Sat, Jun 9, 2012 at 1:02 AM, Katsuhiko Momoi <katmo...@gmail.com>
wrote:

Unihan-6.2.0d1/Unihan_DictionaryLikeData.txt is lacking the Traditional
Chinese stroke count. Currently it only lists:

U+8303 kTotalStrokes 8

I filed a ticket for a review:

http://unicode.org/cldr/trac/ticket/4898

(I understand that we are supposed to list the Traditional stroke count
after the Simplified one, delimited by a space.)

As a general observation, I glanced through a number of kTotalStrokes
entries for strokes 8 and 9. I did not find a single entry that listed 2
stroke counts. This seems odd, as there should be other stroke count
differences between Simplified and Traditional Chinese. I suspect that
this is an area needing more than one correction; it would be better to
do a systematic review.
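A systematic review like the one Kat suggests could start by listing every kTotalStrokes entry that actually carries two values. The sketch below assumes the tab-separated Unihan layout "U+XXXX<TAB>kTotalStrokes<TAB>value" with '#' comment lines; the function name two_valued_entries is hypothetical.

```python
def two_valued_entries(lines, counts=("8", "9")):
    """Return (code point, values) for kTotalStrokes entries with two values
    whose first (Simplified) value is one of `counts`.

    Assumes tab-separated Unihan lines; comment lines start with '#'.
    """
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != "kTotalStrokes":
            continue
        values = fields[2].split(" ")
        if len(values) == 2 and values[0] in counts:
            hits.append((fields[0], values))
    return hits
```

Run over the beta Unihan_DictionaryLikeData.txt, an empty result for strokes 8 and 9 would match Kat's observation that no two-valued entries exist there.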

- Kat

On Fri, Jun 8, 2012 at 3:44 PM, Mark Davis ☕ <m...@macchiato.com> wrote:

It can supply the data for both, if they differ. That's done with two
fields.

However, in this case there is only one value; if that's incorrect for
this character someone should file feedback.
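For a consumer of the data, "two fields" means choosing a value per locale. A minimal sketch, assuming the UAX #38 convention that the first value is preferred for zh-Hans, the second for zh-Hant, and a single value applies to both; the function name total_strokes is hypothetical.

```python
def total_strokes(value: str, locale: str) -> int:
    """Pick the stroke count from a raw kTotalStrokes value for a locale.

    Assumes: one value -> used for both scripts;
    two values -> first for zh-Hans, second for zh-Hant.
    """
    parts = value.split()
    if len(parts) == 1:
        return int(parts[0])
    if len(parts) == 2:
        return int(parts[1]) if locale == "zh-Hant" else int(parts[0])
    raise ValueError(f"unexpected kTotalStrokes value: {value!r}")
```

With the current single value "8", total_strokes("8", "zh-Hant") returns 8, which is exactly why U+8303 cannot get a separate Traditional count until a second value is added.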

________________________________
Mark

— Il meglio è l’inimico del bene —



On Fri, Jun 8, 2012 at 2:41 PM, Claire Ho (賀靜蘭) <clair...@google.com>
wrote:

I checked TR38: according to the description of kTotalStrokes, it provides
stroke count data for both Simplified Chinese and Traditional Chinese.
So I have no concern.

Thanks!
Claire.


On Fri, Jun 8, 2012 at 2:33 PM, Claire Ho (賀靜蘭) <clair...@google.com>
wrote:

Hi Mark

There you find the line:
U+8303 kTotalStrokes 8

In Traditional Chinese, U+8303 has 9 strokes as Matt mentioned in the
email.

The radical "++" (艹) is counted as 4 strokes. I think several other
radicals have the same issue: different stroke counts between Simplified
Chinese and Traditional Chinese.

Claire.


On Thu, Jun 7, 2012 at 5:54 PM, Mark Davis ☕ <m...@macchiato.com>
wrote:

On Thu, Jun 7, 2012 at 4:28 PM, Matt Ma <matt.ma.um...@gmail.com>
wrote:

Hi,

I have two questions regarding the collation sequence defined in
zh.xml, CLDR 21.0

1. Why is U+8303 (范) counted as 9 strokes instead of 8 for
<collation type="stroke">? As a reference, U+59DA (姚) is counted as
9 strokes but sorted before U+8303 (范).
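Stroke count alone only partitions characters into groups, so any stroke collation needs a tie-breaker within a group. The sketch below uses code point order as that tie-breaker purely as an assumption (it is not a claim about what CLDR 21.0 does); stroke_sort is a hypothetical name.

```python
def stroke_sort(chars, strokes):
    """Sort characters by stroke count, breaking ties by code point.

    `strokes` maps each character to its stroke count; the code point
    tie-breaker is an illustrative assumption only.
    """
    return sorted(chars, key=lambda ch: (strokes[ch], ord(ch)))
```

If 范 has 8 strokes, it sorts before 姚 (9). If both are treated as 9, code point order puts 姚 (U+59DA) before 范 (U+8303), which would be consistent with the ordering Matt observed.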


CLDR now gets the stroke collation data from the kTotalStrokes
property. The values for that are in the file
Unihan/Unihan_DictionaryLikeData.txt in the Unicode Character Database.

There you find the line:

U+8303 kTotalStrokes 8
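To check such values locally, the entries can be loaded into a character-keyed table. A minimal sketch, assuming the file is tab-separated as "U+XXXX<TAB>property<TAB>value" with '#' comment lines (the quoted line above shows the same three fields); load_property is a hypothetical helper.

```python
def load_property(lines, prop):
    """Map each character to its raw value for one Unihan property.

    Assumes well-formed tab-separated lines "U+XXXX<TAB>name<TAB>value";
    lines starting with '#' are comments.
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        code_point, name, value = line.split("\t", 2)
        if name == prop:
            # "U+8303" -> chr(0x8303), i.e. 范
            table[chr(int(code_point[2:], 16))] = value
    return table
```

For example, with open("Unihan_DictionaryLikeData.txt", encoding="utf-8") as f: strokes = load_property(f, "kTotalStrokes") would give strokes["范"] == "8" for the beta data quoted here.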

If that is in error, or if there is any other error in the kTotalStrokes
data, then please report the correct value according to
http://www.unicode.org/review/pri230/ so that it can be fixed.

As a related matter, CLDR now gets the pinyin collation data from
the kMandarin property. The values for that are in the file
Unihan/Unihan_Readings.txt in the Unicode Character Database. So if
any of those are in error, they should also be reported as per
http://www.unicode.org/review/pri230/ .

The beta data is in ftp://www.unicode.org/Public/6.2.0/ucd/. It is
currently in ftp://www.unicode.org/Public/6.2.0/ucd/Unihan-6.2.0d1.zip,
but as the beta proceeds, the d1 might change to d2, d3, ...


2. Does the collation type "stroke" apply to both Simplified and
Traditional Chinese? I do not see anything defined in zh_Hant.xml
under "stroke".


Let me look at that.


Thanks,
Matt



--
Katsuhiko Momoi <katmo...@gmail.com>
