Yung-Fong Tang wrote:
Ram Viswanadha wrote:
There is also some information at
http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_unicode.html#Test_Results
Not sure if this is what you are looking for.
Thanks, but not really. I am not looking into the ratio caused by encoding.
Correction.
I just checked my old Japanese moji(character)-to-English
calculations and I think 1.8-2.8 to 1 is a more realistic ratio
than the 2.3-3.2 I mentioned. (Comparing this to the 1.4-1.8 to
1 that I use for Chinese would indicate that Chinese is slightly
more efficient than Japanese.)
Ram Viswanadha wrote:
There is also some information at
http://oss.software.ibm.com/icu/docs/papers/binary_ordered_compression_for_unicode.html#Test_Results
Not sure if this is what you are looking for.
Thanks, but not really. I am not looking into the
Yung-Fong Tang ftang at netscape dot com wrote:
I remember there was a study showing that although UTF-8 encodes each
Japanese/Chinese character in 3 bytes, Japanese/Chinese usually uses
FEWER characters in writing to communicate information than
alphabet-based languages.
Can anyone point
Yung-Fong Tang wrote:
I remember there was a study showing that although UTF-8 encodes each
Japanese/Chinese character in 3 bytes, Japanese/Chinese usually uses
FEWER characters in writing to communicate information than
alphabet-based languages.
For my commercial Japanese-to-English translation
Francois Yergeau wrote:
[EMAIL PROTECTED] wrote:
I remember there was a study showing that although UTF-8 encodes each
Japanese/Chinese character in 3 bytes, Japanese/Chinese usually uses
FEWER characters in writing to communicate information than
alphabet-based languages.
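The 3-bytes-per-character premise in the quoted question is easy to check for a given text. What follows is only a minimal sketch in Python: the helper name `utf8_stats` and the two sample strings are illustrative assumptions, not taken from the thread or from any of the studies mentioned. It counts code points and UTF-8 bytes so the two measures can be compared side by side.

```python
def utf8_stats(text):
    """Return (code point count, UTF-8 byte count) for text."""
    return len(text), len(text.encode("utf-8"))


# Illustrative sample strings (assumptions, not data from the thread).
samples = {
    "English": "Tokyo",
    "Japanese": "東京",
}

for language, text in samples.items():
    chars, size = utf8_stats(text)
    print(f"{language}: {chars} characters, {size} bytes in UTF-8")
```

A single place name says nothing about average ratios between languages; the same helper would need to be run over parallel translations of real text, spaces and punctuation included, to estimate those.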
Francois Yergeau wrote:
http://www.unicode.org/iuc/iuc9/Friday2.html#b3
Reuters Compression Scheme for Unicode (RCSU)
Misha Wolf
Unfortunately, there is no information about German or Japanese. :(
It only has Chinese, Farsi, Urdu, Russian, Arabic, Hindi, Korean,
Creole, Thai, French, Czech,
Thanks, everyone. But I want to point out that punctuation and the space
itself should also be considered in your future calculations. Japanese,
Chinese, and Thai do not use a space between words, while Latin-based
scripts (or Greek, Korean, Cyrillic, Arabic, Armenian, Georgian, etc.) do
use a space, and when used for estimating size,
Cc: [EMAIL PROTECTED]
Sent: Thursday, March 06, 2003 2:33 PM
Subject: Re: length of text by different languages
Francois Yergeau wrote:
[EMAIL PROTECTED] wrote:
I remember there was a study showing that although UTF-8 encodes each
Japanese/Chinese character in 3 bytes
I remember there was a study showing that although UTF-8 encodes each
Japanese/Chinese character in 3 bytes, Japanese/Chinese usually uses
FEWER characters in writing to communicate information than
alphabet-based languages.
Can anyone point me to such research? Martin, do you have some paper
[EMAIL PROTECTED] wrote:
I remember there was a study showing that although UTF-8 encodes each
Japanese/Chinese character in 3 bytes, Japanese/Chinese usually uses
FEWER characters in writing to communicate information than
alphabet-based languages.
Can anyone point me to such research?