Re: length of text by different languages

2003-03-08 Thread Jon Babcock
Yung-Fong Tang wrote: Ram Viswanadha wrote: There is also some information at http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_unicode.html#Test_Results Not sure if this is what you are looking for. Thanks. Not really. I am not looking into the ratio caused by encoding.

Re: length of text by different languages

2003-03-08 Thread Jon Babcock
Correction. I just checked my old Japanese moji (character)-to-English calculations and I think 1.8-2.8 to 1 is a more realistic ratio than the 2.3-3.2 I mentioned. (Comparing this to the 1.4-1.8 to 1 that I use for Chinese would indicate that Chinese is slightly more efficient than Japanese.)

Re: length of text by different languages

2003-03-07 Thread Yung-Fong Tang
Ram Viswanadha wrote: There is also some information at http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_unicode.html#Test_Results Not sure if this is what you are looking for. Thanks. Not really. I am not looking into the

Re: length of text by different languages

2003-03-06 Thread Doug Ewell
Yung-Fong Tang ftang at netscape dot com wrote: I remember there was some study showing that although UTF-8 encodes each Japanese/Chinese character in 3 bytes, Japanese/Chinese usually use LESS characters in writing to communicate information than alphabet-based languages. Can anyone point

Re: length of text by different languages

2003-03-06 Thread Jon Babcock
Yung-Fong Tang wrote: I remember there was some study showing that although UTF-8 encodes each Japanese/Chinese character in 3 bytes, Japanese/Chinese usually use LESS characters in writing to communicate information than alphabet-based languages. For my commercial Japanese-to-English translation

Re: length of text by different languages

2003-03-06 Thread Yung-Fong Tang
Francois Yergeau wrote: [EMAIL PROTECTED] wrote: I remember there was some study showing that although UTF-8 encodes each Japanese/Chinese character in 3 bytes, Japanese/Chinese usually use LESS characters in writing to communicate information than alphabet-based languages.

Re: length of text by different languages

2003-03-06 Thread Yung-Fong Tang
Francois Yergeau wrote: http://www.unicode.org/iuc/iuc9/Friday2.html#b3 Reuters Compression Scheme for Unicode (RCSU) Misha Wolf Unfortunately, no information about German or Japanese. :( It only has Chinese, Farsi, Urdu, Russian, Arabic, Hindi, Korean, Creole, Thai, French, Czech,

Re: length of text by different languages

2003-03-06 Thread Yung-Fong Tang
Thanks, everyone. But I want to point out that punctuation and the space character itself should also be considered in your future calculations. Japanese, Chinese, and Thai do not use spaces between words, while Latin-based scripts (or Greek, Korean, Cyrillic, Arabic, Armenian, Georgian, etc.) do use spaces, and when used to estimate size,

Re: length of text by different languages

2003-03-06 Thread Ram Viswanadha
Cc: [EMAIL PROTECTED] Sent: Thursday, March 06, 2003 2:33 PM Subject: Re: length of text by different languages Francois Yergeau wrote: [EMAIL PROTECTED] wrote: I remember there was some study showing that although UTF-8 encodes each Japanese/Chinese character in 3 bytes

length of text by different languages

2003-03-05 Thread Yung-Fong Tang
I remember there was some study showing that although UTF-8 encodes each Japanese/Chinese character in 3 bytes, Japanese/Chinese usually use LESS characters in writing to communicate information than alphabet-based languages. Can anyone point me to such research? Martin, do you have some paper
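
For readers who want to see the 3-bytes-per-character point concretely, here is a minimal Python sketch that compares code point counts with UTF-8 byte counts. The helper name utf8_stats and the sample greetings are illustrative assumptions, not taken from the thread, and three short strings say nothing statistically about the ratios discussed here.

```python
def utf8_stats(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# Hypothetical sample strings: rough greetings, not a parallel corpus.
samples = {
    "English": "hello",
    "Japanese": "こんにちは",
    "Chinese": "你好",
}

for name, text in samples.items():
    chars, nbytes = utf8_stats(text)
    print(f"{name}: {chars} code points, {nbytes} UTF-8 bytes")
```

Running the same function over real translated text, with spaces and punctuation included, would be one way to get at the size question raised in this thread.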

RE: length of text by different languages

2003-03-05 Thread Francois Yergeau
[EMAIL PROTECTED] wrote: I remember there was some study showing that although UTF-8 encodes each Japanese/Chinese character in 3 bytes, Japanese/Chinese usually use LESS characters in writing to communicate information than alphabet-based languages. Can anyone point me to such research?